ARISE-Initiative / robosuite

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
https://robosuite.ai

Regression in training a RL agent with v1.0 branch - rewards are significantly lower #92

Closed: gal-leibovich closed this issue 4 years ago

gal-leibovich commented 4 years ago

Hi,

Lately, I have started looking into the v1.0 branch. Overall it looks very good, and the refactoring makes it much easier to work with Robosuite. All the newly added robots are also very neat. Kudos! :-)

I did notice a regression in one of my tests, which trains a simple RL agent on the v1.0 branch, though. Specifically, it starts with commit da27201, which enforces a simulation reset for all environments here (by the way, it would make sense to move that call out of the if-else clause, since it is executed in both branches) - https://github.com/StanfordVL/robosuite/blob/92c398f321c1837eb86601859515a5721a7eb2d9/robosuite/environments/base.py#L158
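To illustrate what I mean by hoisting the call, here is a minimal, self-contained sketch; the function and variable names are made up for illustration and are not robosuite's actual code:

def reset_before(hard_reset, sim, rebuild_sim):
    # original structure: sim.reset() appears in both branches of the if-else
    if hard_reset:
        rebuild_sim()
        sim.reset()
    else:
        sim.reset()

def reset_after(hard_reset, sim, rebuild_sim):
    # equivalent structure: the common call is hoisted out of the conditional
    if hard_reset:
        rebuild_sim()
    sim.reset()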

It's worth mentioning that I can easily train an agent on master, on v1.0 (up to the commit just before that one), or on the v1_dr branch.

The regression in rewards can be reproduced by running the following snippet -

import numpy as np
import robosuite as suite
from robosuite.controllers import load_controller_config
from robosuite.utils.input_utils import *

env = suite.make(
    robots=['Panda'],
    controller_configs=load_controller_config(default_controller='IK_POSE'),
    env_name='Lift',
    has_renderer=False,
    ignore_done=True,
    use_camera_obs=False,
    control_freq=20,
    reward_shaping=True
)
low, high = env.action_spec

episode_rewards = []
for _ in range(10):
    env.reset()
    sum_rewards = 0
    for i in range(200):
        # step the environment with uniformly random actions
        action = np.random.uniform(low, high)
        obs, reward, done, _ = env.step(action)
        sum_rewards += reward
    episode_rewards.append(sum_rewards)
print(np.mean(episode_rewards))

Running this snippet on a commit from before the enforced simulation reset was added returns mean episode rewards of ~1, while after the change it returns ~0.2.

cremebrule commented 4 years ago

Thanks for bringing up this issue. Can you try again now? I'm running your test script, but I'm consistently getting average episode rewards of >1 (even up to 2-3).

The redundant self.sim.reset() call is there as a placeholder while we debug the mujoco viewer, which is currently preventing us from executing a hard reset as intended.
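For context, here is a rough sketch of the soft-reset vs. hard-reset distinction using plain mujoco_py; the model path and overall flow are illustrative only, not robosuite's actual reset code:

import mujoco_py

# build the model and simulation once ("model.xml" is a hypothetical MJCF file)
model = mujoco_py.load_model_from_path("model.xml")
sim = mujoco_py.MjSim(model)
viewer = mujoco_py.MjViewer(sim)

# soft reset: restore the existing MjSim to its initial state;
# the viewer keeps pointing at the same sim object, so it keeps working
sim.reset()

# hard reset: rebuild the model and create a brand-new MjSim;
# the old viewer still references the old sim, so it has to be torn down
# and recreated -- this is where the viewer issue mentioned above bites
model = mujoco_py.load_model_from_path("model.xml")
sim = mujoco_py.MjSim(model)
viewer = mujoco_py.MjViewer(sim)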

gal-leibovich commented 4 years ago

Hi @cremebrule,

Sure, I would be happy to try again, but as I wrote in that other issue, it seems that someone has deleted the v1.0 branch? Was this intentional?

yukezhu commented 4 years ago

Hi @gal-leibovich, we moved the v1.0 branch to private for access control prior to the public release. We are in active discussions to figure out the best protocol for sharing a pre-release with a limited set of external users, including you. Since you have access to a copy of the v1.0 branch, you are more than welcome to submit any further issues that you spot. We really appreciate your help with testing the v1.0 branch. Thanks!

gal-leibovich commented 4 years ago

Hi @yukezhu, thanks a lot for being so open and sharing!

As I'm actively working and developing on top of the v1.0 branch, I would be happy to continue testing it and submitting issues and/or PRs to help with development, as I really appreciate the great work that you are doing.

If possible, I would be very happy to get access to the pre-release repo so that I can continue getting the latest bug fixes, testing new commits as they are pushed, and submitting PRs/issues where appropriate.

cremebrule commented 4 years ago

Closing this for now -- once we get v1.0 released, feel free to open another issue if you're still seeing this regression!