Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License
Resume support #58

Closed guydav closed 4 years ago

guydav commented 4 years ago

Added preliminary support for resuming. Initial testing suggests it works, but I'd appreciate it if anyone else gets a chance to try it in their own setup.

I didn't add an explicit resume flag, although we could do that. Currently, the assumption is that if you provide the --memory-save-path argument, you want the memory saved there, by default after every testing round. If you provide the --model argument and do not provide the --evaluate flag, the assumption is that you want to resume, and that the file at --memory-save-path exists.
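The inference described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the flag names follow the comment, while `build_parser`, `wants_resume`, and `check_resume` are hypothetical helpers and the error handling is an assumption.

```python
import argparse
import os


def build_parser():
    # Flag names follow the comment above; defaults and help text are assumptions.
    parser = argparse.ArgumentParser(description="Resume-inference sketch")
    parser.add_argument("--model", type=str, default=None,
                        help="Path to saved weights")
    parser.add_argument("--memory-save-path", type=str, default=None,
                        help="Path used to save and reload the replay memory")
    parser.add_argument("--evaluate", action="store_true",
                        help="Evaluate only, no training")
    return parser


def wants_resume(args):
    """Resuming is implied by --model without --evaluate (hypothetical helper)."""
    return args.model is not None and not args.evaluate


def check_resume(args):
    """Return True when resuming; raise if the saved memory file is missing."""
    if not wants_resume(args):
        return False
    path = args.memory_save_path
    if path is None or not os.path.exists(path):
        raise ValueError("Resuming requires an existing --memory-save-path file")
    return True
```

An explicit resume flag would replace `wants_resume` with a direct check, at the cost of one more argument to keep consistent with the others.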

Another flag we could add is --T_start, akin to --T_max, to specify the step training resumes from, which would improve logging for resumed models. What do you think?

The decision to compress at all, and to use bz2 specifically, came from a quick benchmark I ran on some pickled memories I had. Compression shrinks them from ~2 GB to <100 MB; pickling with bz2 took around 2-3 minutes, while pickling without it took around 40 seconds.
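For anyone who wants to repeat that comparison on their own memories, here is a minimal sketch. The `save_memory` signature mirrors the call quoted later in this thread; `load_memory` and `benchmark` are assumed helpers, and the function bodies are an illustration rather than the code merged in this PR.

```python
import bz2
import os
import pickle
import time


def save_memory(memory, memory_path, disable_bzip=False):
    # Plain pickle when compression is disabled, bz2-compressed pickle otherwise.
    opener = open if disable_bzip else bz2.open
    with opener(memory_path, "wb") as f:
        pickle.dump(memory, f)


def load_memory(memory_path, disable_bzip=False):
    # Must be called with the same disable_bzip value used when saving.
    opener = open if disable_bzip else bz2.open
    with opener(memory_path, "rb") as f:
        return pickle.load(f)


def benchmark(memory, directory):
    """Return {label: (seconds, bytes)} for plain and bz2 saves (assumed helper)."""
    results = {}
    for label, disable_bzip in (("plain", True), ("bz2", False)):
        path = os.path.join(directory, "memory_" + label + ".pkl")
        start = time.perf_counter()
        save_memory(memory, path, disable_bzip)
        results[label] = (time.perf_counter() - start, os.path.getsize(path))
    return results
```

The trade-off is the one described above: bz2 costs extra wall-clock time per save in exchange for a much smaller file, so how often the memory is saved matters when choosing between the two.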

guydav commented 4 years ago

Oh, and I made another minor change that ensures that training ends with a test phase, which is what one might expect to happen but wouldn't happen by default now.

Kaixhin commented 4 years ago

LGTM! Thanks a lot!

guydav commented 4 years ago

Of course!

On Sep 16, 2019, at 12:52, Kai Arulkumaran notifications@github.com wrote:

LGTM! Thanks a lot!

zyzhang1130 commented 4 years ago

@guydav Hi, may I ask whether there is a way to use this resume support to train different policies at different iterations? The idea is to train one policy for some time and then switch to training another policy, while keeping the environment as modified by training the previous policy. I guess modifying the code is preferable to using flags for such an implementation, but the main idea should be very similar.

Thank you for replying.

guydav commented 4 years ago

@zyzhang1130 not trivially, but I'm sure you could modify the code to do it. Sounds like you'd want to keep separate weights and replay buffers for each policy, decide how you switch between them, when you do experience replay for each, etc.
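One way to picture that suggestion is a small container per policy plus a switching rule. This is only a sketch: `PolicySlot`, `active_policy`, and the round-robin schedule are hypothetical, and plain dicts and deques stand in for real network weights and the prioritised replay memory.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PolicySlot:
    """Weights and replay buffer owned by a single policy (hypothetical container)."""
    weights: dict = field(default_factory=dict)
    replay: deque = field(default_factory=lambda: deque(maxlen=1000))


def active_policy(step, names, switch_every):
    """Round-robin schedule: move to the next policy every `switch_every` steps."""
    return names[(step // switch_every) % len(names)]


def run(total_steps, names, switch_every):
    """Store each step's (placeholder) transition in the active policy's own buffer."""
    slots = {name: PolicySlot() for name in names}
    for step in range(total_steps):
        name = active_policy(step, names, switch_every)
        slots[name].replay.append(("transition", step))
    return slots
```

Saving and resuming would then mean one weights file and one memory file per slot, and the switching rule is the part that needs the most design thought.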

zyzhang1130 commented 4 years ago

Just to confirm, the buffer is saved by these few lines, right? (lines 171-173 of main.py)

    # If memory path provided, save it
    if args.memory is not None:
        save_memory(mem, args.memory, args.disable_bzip_memory)

guydav commented 4 years ago

Yes.

zyzhang1130 commented 4 years ago

noted with thanks.

zyzhang1130 commented 4 years ago

@guydav I tried to add automatic memory saving to my code using pickle save/load (like yours), but the save step threw the following error:

    Traceback (most recent call last):

      File "", line 1, in <module>
        runfile('/home/user/Documents/Zeyu/cups-rl4/main.py', wdir='/home/user/Documents/Zeyu/cups-rl4')

      File "/home/user/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
        execfile(filename, namespace)

      File "/home/user/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
        exec(compile(f.read(), filename, 'exec'), namespace)

      File "/home/user/Documents/Zeyu/cups-rl4/main.py", line 208, in <module>
        save_memory(mem, 'hidemem.pickle')

      File "/home/user/Documents/Zeyu/cups-rl4/main.py", line 122, in save_memory
        pickle.dump(memory, pickle_file)

    PicklingError: Can't pickle <class 'algorithms.rainbow.memory.Transition'>: attribute lookup Transition on algorithms.rainbow.memory failed

My save function is defined as follows:

    def save_memory(memory, memory_path):
        with open(memory_path, 'wb') as pickle_file:
            pickle.dump(memory, pickle_file)

and the call that raises the error is:

    save_memory(mem, 'hmem.pickle')

Have you encountered a similar issue before?

guydav commented 4 years ago

I don't know for sure. I can guess, but your guess is as good as mine. What do you think would happen?

On Wed, Feb 26, 2020 at 12:12 AM zyzhang1130 notifications@github.com wrote:

@guydav May I ask what will happen if I save only the weights but not the replay buffers, and what will happen if I use replay buffers originally collected for weights A with weights B?

zyzhang1130 commented 4 years ago

I tried storing it in .json format and again got an error, this time saying that mem is not serializable.

guydav commented 4 years ago

This is nowhere near enough information for me to (a) understand what you're trying to do, or (b) be able to actually attempt to help you.

On Wed, Feb 26, 2020 at 11:00 AM zyzhang1130 notifications@github.com wrote:

I tried storing it in .json format and again got an error, this time saying that mem is not serializable.

Kaixhin commented 4 years ago

@zyzhang1130 you have raised many issues on this repo within the last month, and have often asked questions that should be answerable by spending some time understanding and debugging the code yourself, or questions about modifications that are not immediately relevant to this codebase. Please be mindful that others and I are helping in our free time, so you must be more self-reliant and keep your enquiries relevant and to the point.

zyzhang1130 commented 4 years ago

@Kaixhin OK. @guydav I solved that problem by defining Transition outside of the ReplayMemory class (I'm using an older version of the Rainbow code).
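For readers who hit the same PicklingError, the behaviour can be reproduced in isolation. pickle stores a class by reference, as a module name plus a class name, so the class must be reachable as a top-level attribute of its module. In the sketch below the field names and the `BrokenReplayMemory` class are illustrative assumptions, not the repository's code.

```python
import pickle
from collections import namedtuple

# Module-level definition: pickle can look the class up by module and name.
Transition = namedtuple(
    "Transition", ("timestep", "state", "action", "reward", "nonterminal")
)


class BrokenReplayMemory:
    # Class-level definition: the type calls itself "NestedTransition", but the
    # module has no top-level attribute with that name, so the lookup fails.
    NestedTransition = namedtuple("NestedTransition", ("timestep", "state"))


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False on a lookup failure."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False
```

Calling `can_pickle(Transition(0, None, 1, 0.5, True))` succeeds, while the same call on a `BrokenReplayMemory.NestedTransition` instance fails with the "attribute lookup ... failed" error shown in the traceback.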