Hello,
This does seem to be a bug. In the short term (I will not fix this in 2023) you can call:
forecast_obs = forecast_env.reset().copy()
forecast_obs._obs_env = forecast_obs._obs_env.copy()
EDIT: after further investigation, I now believe this is not a good idea. See my next post for more information.
Thanks, this worked.
Hello,
Upon further investigation, I am not sure this is a bug in grid2op (though it sure looks like one).
What happens here is that when the "forecast environment" is deleted, all the resources it holds are closed. This is normal, pythonic behaviour. But as grid2op might use some external libraries (coded in lower-level languages such as C++, or even running remotely), I chose to dive a bit deeper into the "ownership" of the attributes. That is a fancy way of saying that everything belongs to the original environment: if you delete it, it deletes everything it possesses (unless you force copies, of course, which I no longer think is the best solution now that I have looked at it). This concept of "the environment owns the objects and might delete them regardless of what is done elsewhere" is not really pythonic, I fully agree. I'll document it somewhere.
If I understood your use case correctly, you want to store the forecasted observations.
Maybe you can use https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.get_forecast_arrays or https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.get_forecasted_inj
If you still want to be able to run powerflows with what you have, you can also use the Simulator class, which is basically a super light environment that owns its simulator, see https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.get_simulator
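For instance, a minimal (untested) sketch, assuming obs is a regular observation obtained from env.reset() or env.step() as in the snippets later in this thread:

# store only the forecast data (plain numpy arrays), not the forecast environment
load_p, load_q, prod_p, prod_v, maintenance = obs.get_forecast_arrays()

# or get a lightweight Simulator that owns its own copy of the grid solver
simulator = obs.get_simulator()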
If you detail your needs a bit more, I might be able to help you find the most efficient way to do it :-)
And, if you want minimal changes to the code you sent me, you can of course still keep a reference to each forecast env you create (to avoid their deletion) with:
import grid2op
from lightsim2grid import LightSimBackend
env_name = "l2rpn_wcci_2022"
env = grid2op.make(env_name, backend=LightSimBackend())
_ = env.reset()
class Manager:
    obs_list = []
    for_env_list = []

def foo():
    obs = env.reset()
    forecast_env = obs.get_forecast_env()
    forecast_obs = forecast_env.reset()
    Manager.obs_list.append(forecast_obs)
    Manager.for_env_list.append(forecast_env)
    print(f"forecast_obs: {forecast_obs}")
    print(f"[FOO] Manager.obs_list[0]._obs_env: {Manager.obs_list[0]._obs_env}")
    print(f"[FOO] Manager.obs_list[0]._obs_env.backend is None: {Manager.obs_list[0]._obs_env.backend is None}")

if __name__ == '__main__':
    foo()
    print(f"Manager.obs_list[0]._obs_env: {Manager.obs_list[0]._obs_env}")
    print(f"Manager.obs_list[0]._obs_env.backend is None: {Manager.obs_list[0]._obs_env.backend is None}")
(I am not sure saving data as class attributes in this way is a good thing to do in general [just like global variables]; it might lead to really hard-to-spot and hard-to-debug behaviour. I would advise against it.)
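If you prefer to avoid the class-attribute pattern, an untested sketch of an alternative (using the same env as above) is to return the references from the function so the caller keeps both objects alive:

def foo():
    obs = env.reset()
    forecast_env = obs.get_forecast_env()
    forecast_obs = forecast_env.reset()
    # returning both keeps a reference to the forecast env, so it is not garbage collected
    return forecast_env, forecast_obs

if __name__ == '__main__':
    forecast_env, forecast_obs = foo()
    # the inner _obs_env is still usable because its owner (forecast_env) is still referenced
    print(forecast_obs._obs_env.backend is None)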
So I need to store the forecasted observations to use forecast_obs.get_env_from_external_forecasts later on in other functions, and I need to make simulations from arbitrary states. The Simulator class doesn't allow this, as the returned observations can't be used later on to create another Simulator (as far as I know). Also, I believe the Simulator.predict method doesn't compute the reward of a (state, action) pair.
Hello,
A Simulator can be copied and you can generate another "observation" from it if you want. That's how it's designed to work, if I remember correctly.
Though I agree, you don't get any reward from the simulator. Only an environment can compute it. That being said, you can compute something like a reward from the observation you got with the simulator.
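For example (an untested sketch; my_action is just a placeholder for one of your actions):

simulator = obs.get_simulator()
sim_after = simulator.predict(act=my_action)   # a new Simulator holding the simulated state
new_obs = sim_after.current_obs                # the simulated "observation"
sim_branch = sim_after.copy()                  # copy it if you want to branch from this state later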
So the issue is that I want to perform a prediction from the observation I get from Simulator.predict, but calling sim_stressed.current_obs.get_simulator() (where sim_stressed = obs.predict()) returns the error:
grid2op.Exceptions.observationExceptions.BaseObservationError: Grid2OpException BaseObservationError "Impossible to build a simulator is the observation space does not support it. This can be the case if the observation is loaded from disk or if the backend cannot be copied for example."
I'm not sure you need the observation for this.
You get a new Simulator (with its observation) each time you call simulator.predict.
So basically your workflow could look like:
simulator_init = obs.get_simulator()

# to simulate action "whatever" on the current step
simulator1 = simulator_init.predict(act=whatever)
simulator2 = simulator_init.predict(act=whatever2)

# to pass to the next forecast step
forecast_horizon = 1
load_p, load_q, prod_p, prod_v, maintenance = obs.get_forecast_arrays()
simulator1_h1 = simulator_init.predict(act=whatever, new_gen_p=prod_p[forecast_horizon, :], new_load_p=load_p[forecast_horizon, :])  # etc. for the other forecast arrays
This, if I understood correctly, should do what you want.
Sorry for the late reply.
So what you say would work for my use case, other than the fact that I can't compute rewards and termination (whether done=True) without having access to an environment.
I don't know your use case, you did not really describe it. Or if you did, I missed it.
What I'm saying is that, from what I understand, maybe the Simulator would work.
You have access to a done flag there (not as an output of the predict function but as an attribute of the Simulator object). And, as far as I know, every reward in grid2op can be computed using the observation (and maybe past observations). I never encountered a reward that could not be computed from the actual state of the grid.
Not sure if it answers your non-question. But now you have maybe more information to decide what you want to use and how.
Because at the end, if it works and you're happy about the solution then you can use whatever you want 😊
So I want to be able to perform rollouts with Grid2op environments in an MCTS style, where I can start a rollout of arbitrary length from an arbitrary state I have visited before and get the next state, reward and done flag after every action. I would also like to inject predictions of the load and generator values from an external source when performing an action. This is why I use a forecast environment, as it allows me to do all of this.
I inspected the Simulator object and I couldn't see any attribute that corresponds to the done flag (I only saw backend, converged, current_obs and error), and the reward classes derived from BaseReward need an Environment object to be passed to the BaseReward.__call__ method (https://grid2op.readthedocs.io/en/latest/reward.html#grid2op.Reward.BaseReward.__call__). For the latter, are these the correct reward functions to look at?
Hello,
So I want to be able to perform rollouts with Grid2op environments in an MCTS style, where I can start a rollout of arbitrary length from an arbitrary state I have visited before and get the next state, reward and done flag after every action. I would also like to inject predictions of the load and generator values from an external source when performing an action. This is why I use a forecast environment, as it allows me to do all of this.
Thanks for describing your use case to me. It's clearer now.
Some people have already successfully applied MCTS to grid2op, so I'm confident it's doable.
And if you find that ForecastedEnv
works for you then no issue at all with me :-)
I inspected the Simulator object and I couldn't see any attribute that corresponds to the done flag (I only saw backend, converged, current_obs and error)
converged should match the done flag.
The main difference for the simulator is that it will not check the validity of the action. It will do the action even though "in reality" this action would be illegal.
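Roughly (a sketch, with my_action as a placeholder):

sim_next = simulator.predict(act=my_action)
done = not sim_next.converged   # non-convergence plays the role of done=True
if not done:
    next_obs = sim_next.current_obs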
reward classes derived from BaseReward need an Environment object to be passed to the BaseReward.__call__ method (https://grid2op.readthedocs.io/en/latest/reward.html#grid2op.Reward.BaseReward.__call__). For the latter, are these the correct reward functions to look at?
Yes, these are.
Notice I did not say "currently the reward is computed without the environment". I said that for the vast majority of rewards, you can in fact just use an observation.
For example, for EconomicReward you have:
def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
    if has_error or is_illegal or is_ambiguous:
        res = self.reward_min
    else:
        # compute the cost of the grid
        res = dt_float((env.get_obs(_do_copy=False).prod_p * env.gen_cost_per_MW).sum() * env.delta_time_seconds / 3600.0)
        # we want to minimize the cost by maximizing the reward so let's take the opposite
        res *= dt_float(-1.0)
        # to be sure it's positive, add the highest possible cost
        res += self.worst_cost
        res = np.interp(
            res, [dt_float(0.0), self.worst_cost], [self.reward_min, self.reward_max]
        )
    return dt_float(res)
This reward literally calls env.get_obs(), so it does not use anything from the environment that would not be available in the observation.
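So you could compute the same kind of quantity from a (simulated) observation alone. A sketch of what I mean (economic_score and the fixed 5-minute step duration are mine, not part of grid2op; gen_cost_per_MW is, if I remember correctly, a grid-level attribute also carried by the observation class):

def economic_score(obs, step_duration_h=5.0 / 60.0):
    # cost of the current dispatch, mirroring EconomicReward but using only the observation
    cost = float((obs.prod_p * obs.gen_cost_per_MW).sum() * step_duration_h)
    # lower cost -> higher score
    return -cost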
Another example is the EpisodeDurationReward, for which you have the implementation:
def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
    if is_done:
        res = env.nb_time_step
        if np.isfinite(self.total_time_steps):
            res /= self.total_time_steps
    else:
        res = self.reward_min
    return res
Here it uses env.nb_time_step, which is the same as https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.current_step, and also self.total_time_steps, which is initialized a bit above (in the actual code, not in the piece shown here) with https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.max_step
So again, no issue using an observation in this case. And I could continue for most rewards available in grid2op.
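For example, a sketch using only the observation (episode_duration_score is my name for it, not a grid2op function):

import numpy as np

def episode_duration_score(obs):
    # mirrors EpisodeDurationReward: fraction of the episode survived so far
    if np.isfinite(obs.max_step) and obs.max_step > 0:
        return obs.current_step / obs.max_step
    return obs.current_step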
Thanks for clarifying my questions.
However, I don't understand why the reward functions require an Environment as input. If the reward only depends on the observation, wouldn't it make more sense to just have an observation as input instead of an environment?
Because in the RL community, in theory you can design rewards that use parts of the environment not shown in the observation. If you look at Markov Decision Processes (MDP) and especially Partially Observable Markov Decision Processes (POMDP), you will notice that the reward kernel takes as input the state of the environment (and not the observation) and the action.
Besides, there are settings in which it would not make real sense to use the observation, for example if you consider noisy observations, or a multi-agent setting.
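In other words (standard (PO)MDP notation, nothing grid2op-specific): the reward is a function of the hidden state and the action, while the agent only receives an observation drawn from that state:

R : \mathcal{S} \times \mathcal{A} \to \mathbb{R}, \qquad r_t = R(s_t, a_t), \qquad o_t \sim \Omega(\cdot \mid s_t)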
I see. I'll try and modify the reward function for my purpose, as I want to use reward functions that only use the observation.
Let me know if you encounter any issues :-)
Environment
1.9.7
Springdale Open Enterprise Linux 9.2 (Parma)
Bug description
Appending an Observation object returned from forecast_obs = obs.get_forecast_env().reset() to a list class attribute in a function results in forecast_obs being garbage collected when the function is exited. I believe that this is due to the fact that __del__ is being called on forecast_obs._obs_env when the function foo below is exited.
The error still persists if we remove LightSimBackend (i.e. use the default backend), change the environment, or copy forecast_obs before adding it to the list.
How to reproduce
Command line
Code snippet
Current output
Expected output
The backend should still remain even after we exit foo.