Hello,
This does seem to be a bug. In the short term (I will not fix this in 2023) you can call:
forecast_obs = forecast_env.reset().copy()
forecast_obs._obs_env = forecast_obs._obs_env.copy()
EDIT: after further investigation, I now believe this is not a good idea. See my next post for more information.
Thanks, this worked.
Hello,
Upon further investigation, I am not sure this is a bug in grid2op (though it sure looks like one).
What happens here is that when the "forecast environment" is deleted, all the resources it holds are closed. This is normal, pythonic behaviour. But as grid2op might use some external libraries (coded in lower-level languages such as C++, or even running remotely), I chose to dive a bit deeper into the "ownership" of the attributes. That is a fancy way of saying that everything belongs to the original environment: if you delete it, it deletes everything it possesses (unless you force copies, of course, which I no longer think is the best solution now that I have looked at it). This concept of "the environment owns the objects and might delete them regardless of what is done elsewhere" is not really pythonic, I fully agree. I'll document it somewhere.
If I understood your use case correctly, you want to store the forecasted observations.
Maybe you can use https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.get_forecast_arrays or https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.get_forecasted_inj
If you still want to be able to run powerflows with what you have, you can also use the Simulator class, which is basically a super light environment that owns its simulator, see https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.get_simulator
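For instance, a minimal (untested) sketch, assuming obs is a regular observation obtained from env.reset() or env.step() as in the snippets later in this thread:

# store only the forecast data (plain numpy arrays), not the forecast environment
load_p, load_q, prod_p, prod_v, maintenance = obs.get_forecast_arrays()

# or get a lightweight Simulator that owns its own copy of the grid solver
simulator = obs.get_simulator()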
If you detail your needs a bit more, I might be able to help you find the most efficient way to do it :-)
And, if you want minimal changes to the code you sent me, you can of course still keep a reference to each forecast env you create (to avoid their deletion) with:
import grid2op
from lightsim2grid import LightSimBackend
env_name = "l2rpn_wcci_2022"
env = grid2op.make(env_name, backend=LightSimBackend())
_ = env.reset()
class Manager:
    obs_list = []
    for_env_list = []

def foo():
    obs = env.reset()
    forecast_env = obs.get_forecast_env()
    forecast_obs = forecast_env.reset()
    Manager.obs_list.append(forecast_obs)
    Manager.for_env_list.append(forecast_env)
    print(f"forecast_obs: {forecast_obs}")
    print(f"[FOO] Manager.obs_list[0]._obs_env: {Manager.obs_list[0]._obs_env}")
    print(f"[FOO] Manager.obs_list[0]._obs_env.backend is None: {Manager.obs_list[0]._obs_env.backend is None}")

if __name__ == '__main__':
    foo()
    print(f"Manager.obs_list[0]._obs_env: {Manager.obs_list[0]._obs_env}")
    print(f"Manager.obs_list[0]._obs_env.backend is None: {Manager.obs_list[0]._obs_env.backend is None}")
(I am not sure saving data as class attributes in this way is a good thing to do in general [just like global variables]; it might lead to really hard-to-spot and hard-to-debug behaviour. I would advise against it.)
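If you prefer to avoid the class-attribute pattern, an untested sketch of an alternative (using the same env as above) is to return the references from the function so the caller keeps both objects alive:

def foo():
    obs = env.reset()
    forecast_env = obs.get_forecast_env()
    forecast_obs = forecast_env.reset()
    # returning both keeps a reference to the forecast env, so it is not garbage collected
    return forecast_env, forecast_obs

if __name__ == '__main__':
    forecast_env, forecast_obs = foo()
    # the inner _obs_env is still usable because its owner (forecast_env) is still referenced
    print(forecast_obs._obs_env.backend is None)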
So I need to store the forecasted observations to use forecast_obs.get_env_from_external_forecasts later on in other functions, and I need to make simulations from arbitrary states. The Simulator class doesn't allow this, as the returned observations can't be used later on to create another Simulator (as far as I know). Also, I believe the Simulator.predict method doesn't compute the reward of a (state, action) pair.
Hello,
A Simulator can be copied and you can generate another "observation" from it if you want. That's how it's designed to work, if I remember correctly.
Though I agree, you don't get any reward from the simulator. Only an environment can compute it. That being said, you can compute something like a reward from the observation you got with the simulator.
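For example (an untested sketch; my_action is just a placeholder for one of your actions):

simulator = obs.get_simulator()
sim_after = simulator.predict(act=my_action)   # a new Simulator holding the simulated state
new_obs = sim_after.current_obs                # the simulated "observation"
sim_branch = sim_after.copy()                  # copy it if you want to branch from this state later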
So the issue is that I want to perform a prediction from the observation I get from Simulator.predict, but calling sim_stressed.current_obs.get_simulator() (where sim_stressed = obs.predict()) returns the error:
grid2op.Exceptions.observationExceptions.BaseObservationError: Grid2OpException BaseObservationError "Impossible to build a simulator is the observation space does not support it. This can be the case if the observation is loaded from disk or if the backend cannot be copied for example."
I'm not sure you need the observation for this.
You get a new Simulator (with its observation) each time you call simulator.predict.
So basically your workflow could look like:
simulator_init = obs.get_simulator()

# to simulate action "whatever" on the current step
simulator1 = simulator_init.predict(act=whatever)
simulator2 = simulator_init.predict(act=whatever2)

# to pass to the next forecast step
forecast_horizon = 1
load_p, load_q, prod_p, prod_v, maintenance = obs.get_forecast_arrays()
simulator1_h1 = simulator_init.predict(act=whatever, new_gen_p=prod_p[forecast_horizon, :], new_load_p=load_p[forecast_horizon, :])  # etc. for the other forecast arrays
This, if I understood correctly, should do what you want.
Sorry for the late reply.
So what you say would work for my use case, other than the fact that I can't compute rewards and termination (whether done=True) without having access to an environment.
I don't know your use case, you did not really describe it. Or if you did, I missed it.
What I'm saying is that, from what I understand, maybe the Simulator would work.
You have access to a done flag there (not as an output of the predict function but as an attribute of the Simulator object). And, as far as I know, every reward in grid2op can be computed using the observation (and maybe past observations). I never encountered a reward that could not be computed from the actual state of the grid.
Not sure if it answers your non-question. But now you have maybe more information to decide what you want to use and how.
Because at the end, if it works and you're happy about the solution then you can use whatever you want 😊
So I want to be able to perform rollouts with Grid2op environments in an MCTS style, where I can start a rollout of arbitrary length from an arbitrary state I have visited before and get the next state, reward and done flag after every action. I would also like to inject predictions of the load and generator values from an external source when performing an action. This is why I use a forecast environment, as it allows me to do all of this.
I inspected the Simulator object and I couldn't see any attribute that corresponds to the done flag (I only saw backend, converged, current_obs and error), and the reward classes derived from BaseReward need an Environment object to be passed to the BaseReward.__call__ method (https://grid2op.readthedocs.io/en/latest/reward.html#grid2op.Reward.BaseReward.__call__). For the latter, are these the correct reward functions to look at?
Hello,
So I want to be able to perform rollouts with Grid2op environments in an MCTS style, where I can start a rollout of arbitrary length from an arbitrary state I have visited before and get the next state, reward and done flag after every action. I would also like to inject predictions of the load and generator values from an external source when performing an action. This is why I use a forecast environment, as it allows me to do all of this.
Thanks for describing your use case to me. It's clearer now.
Some people have already successfully applied MCTS to grid2op, so I'm confident it's doable.
And if you find that ForecastedEnv
works for you then no issue at all with me :-)
I inspected the Simulator object and I couldn't see any attribute that corresponds to the done flag (I only saw backend, converged, current_obs and error)
converged should match the done flag.
The main difference for the simulator is that it will not check the validity of the action. It will do the action even though "in reality" this action would be illegal.
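Roughly (a sketch, with my_action as a placeholder):

sim_next = simulator.predict(act=my_action)
done = not sim_next.converged   # non-convergence plays the role of done=True
if not done:
    next_obs = sim_next.current_obs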
reward classes derived from BaseReward need an Environment object to be passed to the BaseReward.__call__ method (https://grid2op.readthedocs.io/en/latest/reward.html#grid2op.Reward.BaseReward.__call__). For the latter, are these the correct reward functions to look at?
Yes, these are.
Notice I did not say "currently the reward is computed without the environment". I said that for the vast majority of rewards, you can in fact just use an observation.
For example, for EconomicReward you have:
def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
    if has_error or is_illegal or is_ambiguous:
        res = self.reward_min
    else:
        # compute the cost of the grid
        res = dt_float((env.get_obs(_do_copy=False).prod_p * env.gen_cost_per_MW).sum() * env.delta_time_seconds / 3600.0)
        # we want to minimize the cost by maximizing the reward so let's take the opposite
        res *= dt_float(-1.0)
        # to be sure it's positive, add the highest possible cost
        res += self.worst_cost
        res = np.interp(
            res, [dt_float(0.0), self.worst_cost], [self.reward_min, self.reward_max]
        )
    return dt_float(res)
This reward literally calls env.get_obs(), so it does not use anything from the environment that would not be available in the observation.
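So you could compute the same kind of quantity from a (simulated) observation alone. A sketch of what I mean (economic_score and the fixed 5-minute step duration are mine, not part of grid2op; gen_cost_per_MW is, if I remember correctly, a grid-level attribute also carried by the observation class):

def economic_score(obs, step_duration_h=5.0 / 60.0):
    # cost of the current dispatch, mirroring EconomicReward but using only the observation
    cost = float((obs.prod_p * obs.gen_cost_per_MW).sum() * step_duration_h)
    # lower cost -> higher score
    return -cost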
Another example is the EpisodeDurationReward, for which you have the implementation:
def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
    if is_done:
        res = env.nb_time_step
        if np.isfinite(self.total_time_steps):
            res /= self.total_time_steps
    else:
        res = self.reward_min
    return res
Here it uses env.nb_time_step, which is the same as https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.current_step, and also self.total_time_steps, which is initialized a bit above (in the actual code, not in the piece shown here) with https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.max_step
So again, no issue using an observation in this case. And I could continue for most rewards available in grid2op.
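For example, a sketch using only the observation (episode_duration_score is my name for it, not a grid2op function):

import numpy as np

def episode_duration_score(obs):
    # mirrors EpisodeDurationReward: fraction of the episode survived so far
    if np.isfinite(obs.max_step) and obs.max_step > 0:
        return obs.current_step / obs.max_step
    return obs.current_step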
Thanks for clarifying my questions.
However, I don't understand why the reward functions require an Environment as input. If the reward only depends on the observation, wouldn't it make more sense to just have an observation as input instead of an environment?
Because in the RL community, in theory you can design rewards that use parts of the environment not shown in the observation. If you look at Markov Decision Processes (MDP) and especially Partially Observable Markov Decision Processes (POMDP), you will notice that the reward kernel takes as input the state of the environment (and not the observation) and the action.
Besides, there are settings in which it would not make real sense to use the observation, for example if you consider noisy observations, or a multi-agent setting.
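In other words (standard (PO)MDP notation, nothing grid2op-specific): the reward is a function of the hidden state and the action, while the agent only receives an observation drawn from that state:

R : \mathcal{S} \times \mathcal{A} \to \mathbb{R}, \qquad r_t = R(s_t, a_t), \qquad o_t \sim \Omega(\cdot \mid s_t)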
I see. I'll try and modify the reward function for my purpose, as I want to use reward functions that only use the observation.
Let me know if you encounter any issues :-)
Environment
1.9.7
Springdale Open Enterprise Linux 9.2 (Parma)
Bug description
Appending an Observation object returned from forecast_obs = obs.get_forecast_env().reset() to a list class attribute in a function results in forecast_obs being garbage collected when the function is exited. I believe that this is due to the fact that __del__ is being called on forecast_obs._obs_env when the function foo below is exited.
The error still persists if we remove LightSimBackend (i.e. use the default backend), change the environment, or copy forecast_obs before adding it to the list.
How to reproduce
Command line
Code snippet
Current output
Expected output
The backend should still remain even after we exit foo.