conglu1997 / v-d4rl

Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations
MIT License

Visual Distraction Experiment #7

Closed gunnxx closed 1 year ago

gunnxx commented 1 year ago

Hi, I have two questions.

  1. How can I reproduce the experiments in Section 5.1, especially the part that uses different percentages of shifted data?
  2. I am a bit confused about the amount of data used. The first sentence of paragraph 5 in Section 5.1 says that cheetah-run medium-expert uses 1M datapoints, but aren't there only 200K datapoints for that task?

Anyway, thank you for the nice codebase!

conglu1997 commented 1 year ago

Hi, thank you for your questions!

On 1: I simply mixed the files of the original and distracted datasets in the correct proportion. For example, you could use conversion_scripts/split_hdf5_shards.py to split the hdf5s, or mix them in Python with some extra command-line arguments. On 2: these experiments investigate data scaling and therefore use additional data drawn from the same distribution; the expanded datasets are not part of the main benchmark.
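For anyone landing here later, the mixing step can be sketched in plain Python. This is only an illustration of the idea, not the v-d4rl code: `mix_datasets` and its arguments are hypothetical names, and the real datasets are hdf5 shards you would first load (e.g. with h5py) into transition lists.

```python
import random

def mix_datasets(clean, distracted, shift_fraction, seed=0):
    """Mix two equal-length transition lists so that `shift_fraction`
    of the result comes from the distracted data. Illustrative only:
    the real v-d4rl datasets are hdf5 shards, which you could load
    with h5py and mix the same way."""
    rng = random.Random(seed)
    n = len(clean)
    n_shift = round(n * shift_fraction)
    # sample without replacement from each source, then shuffle together
    mixed = rng.sample(clean, n - n_shift) + rng.sample(distracted, n_shift)
    rng.shuffle(mixed)
    return mixed

# toy example: 100 "transitions", 25% shifted
clean = [("clean", i) for i in range(100)]
distracted = [("distracted", i) for i in range(100)]
mixed = mix_datasets(clean, distracted, 0.25)
assert len(mixed) == 100
assert sum(tag == "distracted" for tag, _ in mixed) == 25
```

Fixing the seed keeps the mixed dataset reproducible across runs, which matters when comparing agents at different shift percentages.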

Let me know if you have any further questions!

gunnxx commented 1 year ago

I see. Are you able to share the code used to create the dataset? I want to try other types of distractions. I could build on top of pytorch_sac, but I am afraid I might miss some details and end up with a dataset that differs from v-d4rl.

conglu1997 commented 1 year ago

Hey! It's precisely the script at https://github.com/philipjball/SAC_PyTorch/blob/dmc_branch/gather_offline_data.py. It has options to choose the distractions, and you can choose the appropriate image size as well. With the same seed=0, it will produce the same base transitions.

gunnxx commented 1 year ago

Nice, thanks a lot!

gunnxx commented 1 year ago

Hey, sorry for asking so many questions ㅠㅠ but upon reading the code, the wrap_distracting function is not provided and does not exactly match any function in distracting_control.

I am lost in the details: the exact settings for the low, moderate, and high distraction levels are not provided. I inspected several episodes (low, moderate, and high attached below, respectively) and came to the conclusion that each level uses its own fixed background, inclination, and color. Is that right? If so, is it possible to get the exact setup (background, inclination, and color) to evaluate the offline RL agent? My understanding is that the offline RL agent was trained on the "distracted" dataset but evaluated on the "clean" environment.

(attachments: easy, medium, hard)

conglu1997 commented 1 year ago

Hi, please see the wrapper here: https://github.com/conglu1997/v-d4rl/blob/29d0960923b634a0b149d4312e18460d49fbeb08/envs/distracting_control/suite.py#L141. You are right: the distraction settings are defined in that function.

gunnxx commented 1 year ago

I see, thanks again!

gunnxx commented 1 year ago

Hi, did you happen to save the policy checkpoints used to generate the dataset?

conglu1997 commented 1 year ago

Unfortunately not, but you can retrain one with https://github.com/philipjball/SAC_PyTorch/blob/dmc_branch/train_agent.py.