EmiliyanGospodinov / LILAC

Deep Reinforcement Learning amidst Continual Structured Non-Stationarity

Help needed #1

Open prinshul opened 1 month ago

prinshul commented 1 month ago

Could you please help me understand a few things?

  1. Why are wind_frc and _target_vel added to the observation in _get_obs? This will probably change the dimension of the observation space.

  2. Why is wind_frc multiplied by 0.01 in the observation in _get_obs, but not multiplied by 0.01 when it is assigned to action[-1]?

  3. The sine function also changes the sign of wind_frc, so action[-1] = wind_frc will change the direction of the torque as well. Is this desirable?

  4. Why is only the last action changed, which is the front foot? As I understand it, action[-1] = wind_frc only changes the action of the last foot in the action space.

  5. Is it allowed to change the action like this, given that the action range is [-1, 1]? wind_frc varies in the range [-10, 10].

EmiliyanGospodinov commented 20 hours ago

I'm sorry for the delayed answer; I had a lot on my plate. If you are using these environments, please stop: during my master's thesis I found some bugs in them, so I implemented new ones. If you are still interested in non-stationary environments, send me a private message and I can help you. Answers to your questions:

  1. Including the wind friction and/or the target velocity in the observation lets you analyze whether different RL agents can use this additional information to perform better (interestingly, this is not always the case). In some settings the environmental change is assumed to be known, so you can add it to the agent's input to create an oracle agent. Adding these values does change the observation dimension.

  2. This does not make sense to me either, so if you use the environments, just remove the factor 0.01.

  3. Yes, since the wind can act either along or against the agent's direction of movement. Those are the only two directions here, because the HalfCheetah moves along a 1D line rather than in a 2D space.

  4. In the LILAC-paper environment, the wind friction is defined by adding a motor that realizes the wind (please check the end of the half_cheetah_wind.xml model). The last action dimension therefore does not correspond to any joint of the robot, but to this additional motor, which applies torques to a particular joint (in this case rootx). I spoke with some of the MuJoCo developers, and they also found this implementation inappropriate.

  5. Yes, you can set the wind friction like that because it is implemented as a control-limited motor. The original actions are realized through actuators with different gears, which map actions from the range [-1, 1] to the appropriate torque range for each joint. Please check the half_cheetah_wind.xml model and the MuJoCo documentation for the different actuator types.
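
To make points 1 and 5 more concrete, here is a minimal sketch in plain Python. It is not the code from this repository. The helper names (`oracle_obs`, `motor_torque`), the base observation size of 17, and the gear and control-range values are assumptions for illustration only; the real values are in half_cheetah_wind.xml. The sketch assumes a motor-type actuator whose output is the control, clamped to its control range, multiplied by its gear.

```python
def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def motor_torque(ctrl, ctrl_range, gear):
    """Torque of a control-limited motor: gear * clamped control.

    Assumption: a plain motor actuator with no extra dynamics.
    """
    low, high = ctrl_range
    return gear * clip(ctrl, low, high)


def oracle_obs(base_obs, wind_frc=None, target_vel=None):
    """Append the known task variables to the observation (oracle agent).

    Every appended value increases the observation dimension by one.
    """
    obs = list(base_obs)
    if wind_frc is not None:
        obs.append(wind_frc)
    if target_vel is not None:
        obs.append(target_vel)
    return obs


# Hypothetical actuator settings, NOT read from half_cheetah_wind.xml.
JOINT_MOTOR = {"ctrl_range": (-1.0, 1.0), "gear": 120.0}
WIND_MOTOR = {"ctrl_range": (-10.0, 10.0), "gear": 1.0}

base_obs = [0.0] * 17  # assumed size of the plain HalfCheetah observation

print(len(oracle_obs(base_obs)))                                # 17
print(len(oracle_obs(base_obs, wind_frc=3.0, target_vel=1.5)))  # 19
print(motor_torque(0.5, **JOINT_MOTOR))                         # 60.0
print(motor_torque(-12.0, **WIND_MOTOR))                        # -10.0
```

Under these assumptions, a joint action in [-1, 1] is scaled up by its gear, while the wind control is used almost directly, so a range of [-10, 10] for the wind motor is consistent with the other actions staying in [-1, 1].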