PKU-Alignment / omnisafe

JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
https://www.omnisafe.ai
Apache License 2.0

How to introduce our environment? #285

Closed charleswangyanan closed 7 months ago

charleswangyanan commented 1 year ago

Required prerequisites

Questions

Question 1: I want to use our own environment. I found omnisafe-main/tests/simple_env.py, so I attempted to modify simple_env into our own environment. However, when I run omnisafe-main/examples/train_policy.py, simple_env cannot be found. What should I use to train simple_env.py?

Question 2: I learned from issues #273, #263, #255 and the GitHub README.md. Could you give a specific environment example? Is it the same as omnisafe-main/tests/simple_env? Thank you very much!
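
For reference, here is a minimal sketch of a custom environment in the style of omnisafe-main/tests/simple_env.py. ExampleEnv, Example-v0, and all dimensions are placeholder names and values, and the exact set of required attributes and methods should be checked against the installed version's CMDP base class and tests/simple_env.py. The key point is that the module defining the environment must be imported (so that the env_register decorator has run) before training refers to its environment ID:

    from __future__ import annotations

    from typing import Any, ClassVar

    import torch
    from gymnasium.spaces import Box

    import omnisafe
    from omnisafe.envs.core import CMDP, env_register


    @env_register
    class ExampleEnv(CMDP):
        """Placeholder custom environment registered with OmniSafe."""

        _support_envs: ClassVar[list[str]] = ['Example-v0']  # IDs this class can build
        need_auto_reset_wrapper = True
        need_time_limit_wrapper = True

        def __init__(self, env_id: str, **kwargs: Any) -> None:
            super().__init__(env_id)
            self._count = 0
            self._num_envs = 1
            self._observation_space = Box(low=-1.0, high=1.0, shape=(3,))
            self._action_space = Box(low=-1.0, high=1.0, shape=(2,))

        def reset(self, seed=None, options=None):
            self._count = 0
            obs = torch.zeros(self._observation_space.shape, dtype=torch.float32)
            return obs, {}

        def step(self, action):
            self._count += 1
            obs = torch.as_tensor(self._observation_space.sample(), dtype=torch.float32)
            reward = torch.as_tensor(1.0)
            cost = torch.as_tensor(0.0)
            terminated = torch.as_tensor(False)
            truncated = torch.as_tensor(self._count >= 10)
            return obs, reward, cost, terminated, truncated, {}

        @property
        def max_episode_steps(self) -> int:
            return 10

        def set_seed(self, seed: int) -> None:
            torch.manual_seed(seed)

        def sample_action(self):
            return torch.as_tensor(self._action_space.sample(), dtype=torch.float32)

        def render(self):
            pass

        def close(self) -> None:
            pass


    if __name__ == '__main__':
        # ExampleEnv is defined (and therefore registered) in this module,
        # so training can refer to 'Example-v0' directly.
        agent = omnisafe.Agent('PPOLag', 'Example-v0')
        agent.learn()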

NoemiF99 commented 11 months ago

@NoemiF99 You can:

  • Register info in the step function of your customized environment file, for example {'number_of_collisions': 10} (10 being the supposed number of collisions).
  • Register a variable _ep_num_colli in omnisafe/adapter/onpolicy_adapter.py, then log it from info (since you logged it in step 1) in the methods _log_value, _log_metrics, and _reset_log, following what we do to log reward/cost.
  • Supposing you name the number-of-collisions key Metrics/EpColi, make the corresponding change in policy_gradient.py to log it just like we log Metrics/EpRet.
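
For concreteness, a rough sketch of those three steps is below. The names number_of_collisions, _ep_num_colli, and Metrics/EpColi follow the advice above, the method bodies are abbreviated, and the exact signatures should be verified against the installed onpolicy_adapter.py and policy_gradient.py:

    import torch

    # 1) In your environment's step(), expose the count through info
    #    (self._num_collisions is a placeholder attribute of your env):
    #        info = {'number_of_collisions': self._num_collisions}

    # 2) In omnisafe/adapter/onpolicy_adapter.py, accumulate and flush it next
    #    to reward/cost (existing bodies abbreviated with ...):
    def _reset_log(self, idx=None):
        ...
        self._ep_num_colli = torch.zeros(self._env.num_envs)

    def _log_value(self, reward, cost, info):
        ...
        self._ep_num_colli += info.get('number_of_collisions', 0)

    def _log_metrics(self, logger, idx):
        ...
        logger.store({'Metrics/EpColi': self._ep_num_colli[idx]})

    # 3) In policy_gradient.py's _init_log(), register the key alongside
    #    Metrics/EpRet:
    #        self._logger.register_key('Metrics/EpColi')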

Thank you so much for the support and advice you provided. They were very helpful and allowed me to input the data as I wanted. You have been very kind.

Gaiejj commented 11 months ago

@tjruan This is because OmniSafe's gaussian_learning_actor uses a Gaussian distribution, a common choice for continuous-action-space tasks in reinforcement learning. The ActionScale wrapper only scales the mean of the Gaussian distribution into the specified range; it does not ensure that actions sampled from the distribution strictly obey those bounds. For example, with a mean of 0.1, a sampled action could still be -0.1. If you need the executed action to stay within the action space, consider applying an additional clip operation to the action, as suggested in the official Gymnasium documentation.
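
For illustration, a minimal sketch of such a clip. The environment name is a placeholder, and OmniSafe applies its own wrappers during training, so this only matters when you step an environment manually with actions sampled from the policy:

    import gymnasium as gym
    import numpy as np

    env = gym.make('Pendulum-v1')  # placeholder continuous-action environment
    obs, info = env.reset(seed=0)

    # Suppose this action came from a Gaussian policy and may fall outside the bounds.
    action = np.array([2.5], dtype=np.float32)

    # Clip into the Box action space before stepping, as the Gymnasium docs suggest.
    clipped = np.clip(action, env.action_space.low, env.action_space.high)
    obs, reward, terminated, truncated, info = env.step(clipped)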

NoemiF99 commented 11 months ago

@NoemiF99 You can:

  • Register info in the step function of your customized environment file, for example {'number_of_collisions': 10} (10 being the supposed number of collisions).
  • Register a variable _ep_num_colli in omnisafe/adapter/onpolicy_adapter.py, then log it from info (since you logged it in step 1) in the methods _log_value, _log_metrics, and _reset_log, following what we do to log reward/cost.
  • Supposing you name the number-of-collisions key Metrics/EpColi, make the corresponding change in policy_gradient.py to log it just like we log Metrics/EpRet.

Hello, I would like to ask a question. I inserted the number of collisions in the _log_metrics, _log_value, and _reset_log functions in the onpolicy_adapter.py file and registered it within the _init_log function of policy_gradient.py like this: self._logger.register_key('Metrics/EpNumCollisions'). I don't understand why it records an average of the number of collisions instead of the integer value for each epoch. Which function should I modify so that the saved value is the integer count? Thank you very much in advance for your help.

Gaiejj commented 11 months ago

Setting steps_per_epoch to the same value as the episode length of your environment may help.
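
For example, a sketch of overriding this through custom_cfgs when launching training. The algorithm, environment ID, and numbers are placeholders, and the assumption here is that steps_per_epoch sits under algo_cfgs and total_steps/vector_env_nums under train_cfgs, as in the default YAML configs:

    import omnisafe

    # Match steps_per_epoch to the episode length of your environment so each
    # epoch contains a whole number of episodes (placeholder values below).
    custom_cfgs = {
        'train_cfgs': {'total_steps': 100000, 'vector_env_nums': 1},
        'algo_cfgs': {'steps_per_epoch': 1000},
    }

    agent = omnisafe.Agent('PPOLag', 'SafetyPointGoal1-v0', custom_cfgs=custom_cfgs)
    agent.learn()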

charleswangyanan commented 11 months ago

We need to implement decisions at two time scales, e.g., a 1-time-step decision and a 4-time-step decision, so we need to establish two sets of state, action, and reward functions. There is a coupling relationship between the variables of the two time scales, so it is more convenient to implement everything in one environment class.

How can we set up two state spaces? We tried the following example, but it is wrong.

    from gym.spaces import Box, Dict
    import numpy as np

    self._observation_space = Dict({
        'obs1': Box(low=0, high=1, shape=(5,), dtype=np.float32),
        'obs2': Box(low=0, high=1, shape=(12,), dtype=np.float32),
    })

The error is "AssertionError: Observation space must be Box". How can I set up two state spaces? Thank you.

Gaiejj commented 11 months ago

This issue pertains to multi-agent safety reinforcement learning, which is currently unsupported by OmniSafe. The following code base might be of assistance:
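
Separately from that reference, if a single policy acting on both time scales is acceptable, one common workaround for the Box assertion is to concatenate the two observations into one flat Box and split them again inside the environment. A minimal sketch under that assumption, reusing the 5- and 12-dimensional spaces from the snippet above:

    from __future__ import annotations

    import numpy as np
    from gymnasium.spaces import Box

    # One flat Box covering both observations (5 + 12 = 17 dimensions).
    observation_space = Box(low=0.0, high=1.0, shape=(17,), dtype=np.float32)

    def merge_obs(obs1: np.ndarray, obs2: np.ndarray) -> np.ndarray:
        """Concatenate the two per-time-scale observations into a single vector."""
        return np.concatenate([obs1, obs2]).astype(np.float32)

    def split_obs(obs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Recover the two observations from the flat vector inside the environment."""
        return obs[:5], obs[5:]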

NoemiF99 commented 10 months ago

Issue with Video Saving During Training

I am currently using your code and would like to bring to your attention an issue I am experiencing during training. Currently, the video saving message is displayed correctly in the terminal as follows:

##################################################
Saving the replay video to ./runs/PCPO-{Custom1-v0}/seed-000-2024-01-20-16-11-47/video/epoch-100, and the result to ./runs/PCPO-{Custom1-v0}/seed-000-2024-01-20-16-11-47/video/epoch-100/result.txt.
##################################################

However, despite this message, the video is not actually being saved in the designated folder. Instead, only the results are being saved, and I cannot identify the reason for this behavior.

I have checked that the folder structure is correct and there are no obvious errors, but the video is still not appearing. I would like to understand if there is any specific step that could be causing this issue or if there is something I could modify in the code to ensure the proper saving of the video.

Thank you in advance for your help and assistance. I am available to provide further details.

vrn-sn commented 10 months ago

However, despite this message, the video is not actually being saved in the designated folder. Instead, only the results are being saved, and I cannot identify the reason for this behavior.

@NoemiF99, my team was having a similar issue when running a video-saving script from inside a Docker container. It seemed to be failing silently, as you described. We used the xvfb utility to solve this: instead of running python3 my_file.py, we ran xvfb-run -a python3 my_file.py.

Gaiejj commented 7 months ago

Closed due to no further questions. You're welcome to reopen it if you happen to have more issues.