belerico opened 4 months ago
@michele-milesi One problem is the obs normalization statistics: if one wants to test an algo trained with normalized obs, then they also need to apply the same statistics to the test env, I suppose. A simple solution would be to pass the same env used for training to the test function, but this does not cover the offline test. To handle that we should probably also save the RunningMeanStd. What do you think?
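A minimal sketch of how the statistics could be saved and restored for an offline test, assuming gymnasium's `NormalizeObservation` wrapper (whose `obs_rms` holds `mean`, `var` and `count`) is somewhere in the wrapper stack; `save_obs_rms` and `load_obs_rms` are hypothetical helper names, not existing functions in the repo:

```python
import pickle

import gymnasium as gym


def find_normalize_wrapper(env: gym.Env) -> gym.wrappers.NormalizeObservation:
    # Walk the wrapper stack; assumes NormalizeObservation is present.
    wrapper = env
    while not isinstance(wrapper, gym.wrappers.NormalizeObservation):
        wrapper = wrapper.env
    return wrapper


def save_obs_rms(env: gym.Env, path: str) -> None:
    # Dump the running statistics collected during training.
    rms = find_normalize_wrapper(env).obs_rms
    with open(path, "wb") as f:
        pickle.dump({"mean": rms.mean, "var": rms.var, "count": rms.count}, f)


def load_obs_rms(env: gym.Env, path: str) -> None:
    # Restore the training-time statistics into the test env, so that the
    # test observations are normalized exactly as during training.
    rms = find_normalize_wrapper(env).obs_rms
    with open(path, "rb") as f:
        stats = pickle.load(f)
    rms.mean, rms.var, rms.count = stats["mean"], stats["var"], stats["count"]
```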
Another issue: the obs and rewards normalization is done per-env, since the wrappers are created inside the `make_env` method, which is then called in the agent code by `SyncVectorEnv` or `AsyncVectorEnv`. Do we want to keep the normalization independent per env, or do we want to apply the normalization on the overall vector-env? For reference, a simplified sketch of the current per-env setup follows.
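For illustration only (not the repo's actual `make_env`): every thunk builds its own `NormalizeObservation`, so each sub-env of the vector env keeps independent statistics.

```python
import gymnasium as gym


def make_env(env_id: str, seed: int):
    # Simplified stand-in for the repo's make_env: every thunk creates its own
    # NormalizeObservation wrapper, hence its own RunningMeanStd statistics.
    def thunk():
        env = gym.make(env_id)
        env = gym.wrappers.NormalizeObservation(env)
        env.reset(seed=seed)
        return env

    return thunk


envs = gym.vector.SyncVectorEnv([make_env("CartPole-v1", seed=i) for i in range(4)])
# Each of the 4 sub-envs normalizes with its own statistics; the alternative is
# a single normalizer applied to the batched observations from envs.step().
```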
I was thinking of creating custom normalizers with a standard format (that works with numpy arrays), e.g., a class that must define 3 methods:

- `__call__()` or `normalize()` for applying the normalization.
- `state_dict()` to save the state of the normalizer.
- `load_state_dict()` to load the state of the normalizer (as for torch modules).

I propose to normalize the observations returned by the `env.step()` function: in this way, we do not have a normalizer for each environment, but one global normalizer (a rough sketch follows).
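A minimal sketch of what such a normalizer could look like, assuming it operates on the batched numpy observations returned by the vectorized `env.step()`; the class name `RunningNormalizer` and its update logic (a parallel running mean/variance, as in RunningMeanStd) are illustrative, not the actual implementation:

```python
from typing import Any, Dict

import numpy as np


class RunningNormalizer:
    """Hypothetical global normalizer with a torch-like state_dict interface."""

    def __init__(self, shape: tuple, epsilon: float = 1e-8):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon
        self.epsilon = epsilon

    def update(self, x: np.ndarray) -> None:
        # Parallel running mean/variance update over a batch of observations,
        # e.g. the (num_envs, *obs_shape) array returned by a vectorized step.
        batch_mean, batch_var, batch_count = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta = batch_mean - self.mean
        tot_count = self.count + batch_count
        new_mean = self.mean + delta * batch_count / tot_count
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta**2 * self.count * batch_count / tot_count
        )
        self.mean, self.var, self.count = new_mean, m2 / tot_count, tot_count

    def __call__(self, x: np.ndarray, update: bool = True) -> np.ndarray:
        # Set update=False at test time to normalize with frozen statistics.
        if update:
            self.update(x)
        return (x - self.mean) / np.sqrt(self.var + self.epsilon)

    def state_dict(self) -> Dict[str, Any]:
        return {"mean": self.mean, "var": self.var, "count": self.count}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.mean, self.var, self.count = state["mean"], state["var"], state["count"]
```

For Dict observation spaces one could keep one such normalizer per key, and its `state_dict()` could be saved with the rest of the checkpoint to cover the offline test mentioned above.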
@belerico what do you think?
Summary
This PR adds various env wrappers:
- Dict spaces