TrentBrick / RewardConditionedUDRL

Open source code combining implementations of Upside Down Reinforcement Learning and Reward Conditioned Policies
MIT License

Issues when switching games #1

Open Baldins opened 3 years ago

Baldins commented 3 years ago

Hi!

First of all, wonderful code!

I want to use this as a base for a research project I am working on, so I started by studying your implementation. I have my own gym environment and would love to adapt the code to it.

But first, I am just trying to make it work with the other games you mention in gym_params.py. However, I am having some issues when I try to run the code with any game that is not the lunar lander -- it seems like some parameters used by trainer.py are missing from the other games' parameter sets. Should I just add those lines in env_params, or should I comment out some lines in trainer.py? I believe you already have the code working on all the environments, so I thought it would be safer to ask you directly rather than trying to fix it myself.

Thank you in advance for your help!

TrentBrick commented 3 years ago

Hi @Baldins, thanks for your question. I haven't tried running any of the code with the pendulum environment for a while, so I will take a look at this soon and get back to you!

Baldins commented 3 years ago

Thank you! :D

TrentBrick commented 3 years ago

OK, there were a few extra environment parameters that I had set for lunar lander but not for pendulum. I have added these for pendulum and the code all seems to run. Pull the commit I just pushed for these updates.

I don't currently have the time to check whether the algorithm actually learns anything useful for pendulum -- if you run it, please let me know. Also, I assumed the average pendulum rollout is 200 frames, but this should be checked and/or tuned.
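
A quick way to check that assumption (this is just a standalone gym sanity check, not part of the repo) is to roll out a random policy and count the frames:

    # Standalone sanity check: count frames in a random Pendulum-v0 rollout.
    # Pendulum-v0 is wrapped in a 200-step TimeLimit, so this should print 200.
    import gym

    env = gym.make("Pendulum-v0")
    env.reset()
    done, frames = False, 0
    while not done:
        _, _, done, _ = env.step(env.action_space.sample())
        frames += 1
    print(frames)
    env.close()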

Do take the time to understand the environment parameters, especially if you are using your own environment. And if the docs or configuration of these parameters could be improved, don't hesitate to submit a pull request!

Closing this issue for now, but if you test pendulum do reopen it :)

Baldins commented 3 years ago

It is giving me an error when I run it -- and it seems like that is now happening for the lunarlander environment too. Not sure exactly why, but I'll try to figure it out!

And sounds good! I am planning to use it on my own environment too, so I might reach out again in case I have some questions about the code!

Thank you!

TrentBrick commented 3 years ago

Huh, can you copy and paste the error message?

Baldins commented 3 years ago

So, this is what I am getting with lunar lander -- it's not really an error, but it stays stuck there, whereas yesterday I was seeing the pop-up simulation start:

            /home/lambda-rl/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:45: UserWarning: Checkpoint directory exp_dir/lunarlander/debug/UDRL/seed_25 exists and is not empty. With save_top_k=1, all files in this directory will be deleted when a checkpoint is saved!
              warnings.warn(*args, **kwargs)
            GPU available: True, used: False
            TPU available: False, using: 0 TPU cores
            /home/lambda-rl/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:45: UserWarning: GPU available but not used. Set the --gpus flag when calling the script.
              warnings.warn(*args, **kwargs)
            2021-03-10 18:19:30.518073: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
            2021-03-10 18:19:30.519393: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
            Invalid MIT-MAGIC-COOKIE-1 key
              | Name  | Type      | Params
            ------------------------------------
            0 | model | UpsdHyper | 113 K 
            /home/lambda-rl/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:45: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
              warnings.warn(*args, **kwargs)
            /home/lambda-rl/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:45: UserWarning: The validation_epoch_end should not return anything as of 9.1. To log, use self.log(...) or self.write(...) directly in the LightningModule
              warnings.warn(*args, **kwargs)
            /home/lambda-rl/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:45: UserWarning: The {log:dict keyword} was deprecated in 0.9.1 and will be removed in 1.0.0
            Please use self.log(...) inside the lightningModule instead.

            # log on a step or aggregate epoch metric to the logger and/or progress bar
            # (inside LightningModule)
            self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True)
              warnings.warn(*args, **kwargs)
            /home/lambda-rl/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:45: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
              warnings.warn(*args, **kwargs)

And for the pendulum I get KeyError: 'max_reward' -- so I guess I have to check the parameters for that!

TrentBrick commented 3 years ago

max_reward was not set for pendulum, so I have now added it. Guess I didn't let my code run long enough to hit this. Also, you'll want to check what value it should be set to.
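
For anyone hitting the same KeyError, the fix is just to make sure every game's entry in gym_params.py has all the keys that trainer.py reads. Something along these lines -- the key names and values below are only an illustrative sketch, not the repo's actual config:

    # Hypothetical sketch only -- check gym_params.py for the real key names.
    env_params = {
        "lunarlander": {
            "avg_episode_length": 200,  # assumed value
            "max_reward": 250,          # assumed value
        },
        "pendulum": {
            "avg_episode_length": 200,  # Pendulum-v0 episodes are capped at 200 steps
            "max_reward": 0,            # assumed: Pendulum-v0 per-step rewards are <= 0
        },
    }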

Regarding your error and the lander appearing... The lander should only appear during evaluation of the algorithm unless you have it saving recordings during training.

See this section of the readme.md:

In order to record episodes from epochs, use the flag --recording_epoch_interval. At each epoch of this interval, record_n_rollouts_per_epoch rollouts (set in the config dict, default = 1) will be saved out. However, to do this you either need to run a single seed on your local computer or have xvfb installed on your server (see below for in-depth instructions on how to re-install GPU drivers that incorporate xvfb). The alternative is to ensure model checkpointing is turned on and render your saved models after training using the --eval_agent flag, providing it with the path to the trained model.

Model checkpointing is on by default and will save the model that performs best at achieving the mean rollout reward.
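
For example, rendering a saved lunar lander model would look something like this (assuming --eval_agent takes the checkpoint path directly; the path below just mirrors the exp_dir structure from your log, and <checkpoint>.ckpt is a placeholder):

    python trainer.py --implementation UDRL --gamename lunarlander --exp_name debug --num_workers 1 --seed 25 --eval_agent exp_dir/lunarlander/debug/UDRL/seed_25/<checkpoint>.ckpt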

If you just run the following code what happens?

python trainer.py --implementation UDRL --gamename lunarlander --exp_name debug --num_workers 1 --seed 25

Baldins commented 3 years ago

Same thing! But I guess I might just have to wait a bit longer for the code to run! How long does training usually take to finish?

And thank you so much for your help!!

TrentBrick commented 3 years ago

Same thing with regard to what, specifically?

On my MacBook Pro, training would take approximately 4 or 5 hours for the lunar lander.

And I'm happy to help! I want to make sure this code base works and is highly accessible!