Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
https://sb3-contrib.readthedocs.io
MIT License

SIL #158

Open qgallouedec opened 1 year ago

qgallouedec commented 1 year ago

Self Imitation Learning

@emrul has implemented SAIL, see https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/139#issuecomment-1445114579

@emrul, is there an official implementation for those two? Do you match the results from the paper with your implementation?

emrul commented 1 year ago

Hi @qgallouedec - I haven't done much testing, but if there's no rush I'd love to work on this in my spare time. The official implementation appears to be here: https://github.com/google-research/google-research/tree/master/sail_rl

qgallouedec commented 1 year ago

There is no rush at all :)

richardjozsa commented 1 year ago

Hey everyone,

I have tried the code that @emrul pasted in the IQN PR comments, and it works.

The one thing I haven't got to work is wrapping the env in SubprocVecEnv. Just wanted to let you know. :)

emrul commented 1 year ago

Thanks @richardjozsa - that's interesting, because I exclusively use SubprocVecEnv for training and DummyVecEnv for evaluation. What happens when you use SubprocVecEnv?

richardjozsa commented 1 year ago

This is the error I got, but if it works for you then I will recheck. I use a custom env, so maybe that caused it.

Traceback (most recent call last):
RLTEST |   File "/usr/lib/python3.10/multiprocessing/forkserver.py", line 274, in main
RLTEST |     code = _serve_one(child_r, fds,
RLTEST |   File "/usr/lib/python3.10/multiprocessing/forkserver.py", line 313, in _serve_one
RLTEST |     code = spawn._main(child_r, parent_sentinel)
RLTEST |   File "/usr/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
RLTEST |     self = reduction.pickle.load(from_parent)
RLTEST |   File "/home/ftuser/.local/lib/python3.10/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 375, in __setstate__
RLTEST |     self.var = cloudpickle.loads(var)
RLTEST | ModuleNotFoundError: No module named 'base'

emrul commented 1 year ago

... that looks like an error while unpickling your env in the worker process. My modifications don't change the envs at all (the replay buffer holds the SAIL returns internally), so I don't think it is caused by my amendments.
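For reference, here is a minimal sketch of how this kind of failure can arise, using only the standard library. It assumes the usual pickle behaviour of storing a class from an importable module by reference (module name plus class name) rather than by value. The module name `rltest_base_env` and the class `CustomEnv` are hypothetical stand-ins for the custom env code, and deleting the module from `sys.modules` only simulates a freshly started worker that cannot import it. This is an illustration of the mechanism, not a diagnosis of the setup above.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the module that defines the custom env.
MODULE_NAME = "rltest_base_env"

# Build the module in memory and register it so pickling can find the class.
module = types.ModuleType(MODULE_NAME)
exec("class CustomEnv:\n    pass\n", module.__dict__)
sys.modules[MODULE_NAME] = module

# The payload records only a reference to "rltest_base_env.CustomEnv".
payload = pickle.dumps(module.CustomEnv())

# Simulate a new worker process in which the module is not importable.
del sys.modules[MODULE_NAME]

try:
    pickle.loads(payload)
    error = None
except ModuleNotFoundError as exc:
    # Unpickling tries to import the module by name and fails.
    error = str(exc)

print(error)
```

If the worker process can import the module that defines the env (for example because it is installed or on `sys.path`), unpickling succeeds.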

richardjozsa commented 1 year ago

My bad, sorry: the problem was in my environment, and it works fine. My only comment is that you have set the replay buffer to device="cpu". I guess that could be "auto". :)

emrul commented 1 year ago


Great, and yes - good catch on the device, I will correct that!
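As a rough sketch of what "auto" is meant to do: pick the GPU when one is available and fall back to the CPU otherwise. The helper `resolve_device` and its `cuda_available` flag below are hypothetical and written in plain Python for illustration only. To my knowledge, Stable-Baselines3 provides `stable_baselines3.common.utils.get_device` for this, which would be the natural thing to call in the actual fix.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Hypothetical helper: map a device request to a concrete device name."""
    if requested == "auto":
        # "auto" prefers the GPU and falls back to the CPU.
        return "cuda" if cuda_available else "cpu"
    if requested.startswith("cuda") and not cuda_available:
        # An explicit GPU request cannot be honoured without CUDA.
        return "cpu"
    return requested


cpu_only = resolve_device("auto", cuda_available=False)
with_gpu = resolve_device("auto", cuda_available=True)
pinned = resolve_device("cpu", cuda_available=True)
```

With a hard-coded "cpu" the buffer stays on the CPU even when a GPU is present, which is the case the comment above points out.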