Eclectic-Sheep / sheeprl

Distributed Reinforcement Learning accelerated by Lightning Fabric
https://eclecticsheep.ai
Apache License 2.0

[Bug] Checkpointing the buffer #188

Closed. belerico closed this issue 5 months ago.

belerico commented 5 months ago

Hi @Disastorm, I've copied your new question here, so that we can keep the other issue closed:

@belerico Hey, just wondering how this buffer checkpointing works? I have:

buffer:
  size: 1000000
  checkpoint: True

With this set, resuming no longer does the pretraining buffer steps. However, I noticed the buffer files never get updated; their last-modified date is still from when the first training started. Is this a problem? The files I'm referring to are the .memmap files. I see that new ones aren't created for each run when checkpoint = True, so I assumed it would be using the ones from the previous run, but their modification date isn't changing at all. Is the buffer stored inside the checkpoint file itself? The checkpoint's file size still looks pretty similar to when running with checkpoint: False, I think.

Disastorm commented 5 months ago

Thanks. I'm guessing the buffer is perhaps not the same as the memmap buffer? It seems like an action history buffer or something like that. However, it does resume strangely when this is enabled: it ends up using the memmap buffer from the previous run (although those .memmap files never seem to be updated, so I'm not really sure what they're for). The first resumption seems to work fine, but upon the second cancellation the .memmap files from the previous run (the ones that are being used) look like they get deleted, so when trying to resume a second time there are no .memmap files and it results in an error. It's also a bit confusing what you end up having in that case.

In the case where you have buffer.checkpoint=False, each run just creates its own new memmap buffer and then does the pretraining steps to fill it up, and everything works fine.
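A minimal sketch of the prefill-versus-resume behavior described above, assuming hypothetical names (this is not sheeprl's actual code): when the buffer is restored from a checkpoint, the random-action pretraining steps that normally fill it are skipped.

    # Sketch only: hypothetical buffer/env API, not sheeprl's real training loop.
    def maybe_prefill(buffer, env, learning_starts, resumed_from_ckpt, buffer_in_ckpt):
        if resumed_from_ckpt and buffer_in_ckpt:
            return  # buffer restored from the checkpoint, nothing to prefill
        obs = env.reset()
        for _ in range(learning_starts):
            action = env.action_space.sample()
            next_obs, reward, done, info = env.step(action)
            buffer.add(obs, action, reward, next_obs, done)
            obs = env.reset() if done else next_obs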

michele-milesi commented 5 months ago

Hi @Disastorm, I will try to provide some clarity:

  1. The first run creates the memory-mapped buffer (the .memmap files in the memmap_buffer folder).
  2. The second run, resumed from 1., does not create any memmap files, because the buffer instantiated from the checkpoint references the files in the first run's memmap_buffer folder.
  3. The third run loads the checkpoint of the second one (i.e., 2.). The buffer stored in that checkpoint recursively references the memmap files in the log directory of the first run: the buffer of the second run is saved in the checkpoint, but it still points at the files of the first run. No other memmap files are created when resuming from a checkpoint (if buffer.checkpoint=True).

This means that the memmap_buffer folder of the first run must NOT be deleted.
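A minimal sketch of this referencing pattern, with hypothetical class and method names (not sheeprl's actual buffer API): the checkpoint stores only the path to the backing .memmap file, and resuming reopens that same file, which is why every later run keeps pointing at the first run's memmap_buffer folder.

    import numpy as np

    class TinyMemmapBuffer:
        # Hypothetical buffer backed by a numpy memmap; not sheeprl's actual class.
        def __init__(self, path, size, obs_dim, create=True):
            self.path, self.size, self.obs_dim = path, size, obs_dim
            mode = "w+" if create else "r+"  # "w+" creates the file, "r+" reopens it
            self.data = np.memmap(path, dtype=np.float32, mode=mode, shape=(size, obs_dim))

        def state_dict(self):
            # Only metadata goes into the checkpoint: the .memmap file is referenced
            # by path, not copied, so the first run's folder must not be deleted.
            return {"path": self.path, "size": self.size, "obs_dim": self.obs_dim}

        @classmethod
        def from_state_dict(cls, state):
            # On resume, reopen the existing file instead of creating a new one.
            return cls(state["path"], state["size"], state["obs_dim"], create=False)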

I also tested this and it works (on Linux); I can restart an experiment multiple times:

  1. First run
  2. Second run resumed from 1.
  3. Third run resumed from 2.
  4. Fourth run resumed from 3.

[Screenshot: multiple_resume_from]

This is the result of the test I carried out. In particular, what I did was the following:

  1. I used the command python sheeprl.py exp=dreamer_v3_100k_ms_pacman checkpoint.every=100 for the first run.
  2. Then I stopped the process (ctrl+C) when I saw that there was at least one checkpoint.
  3. I resumed the training with the command python sheeprl.py exp=dreamer_v3_100k_ms_pacman checkpoint.every=100 checkpoint.resume_from=/path/to/first/run/checkpoint.ckpt (you must include the .ckpt file in the path).
  4. I repeated points 2 and 3 by resuming from a checkpoint of the last run (from the second run, then from the third one).

I understand that the resume_from_checkpoint logic is convoluted and can be a bit confusing, sorry about that. I hope it is clearer now; we will try to make some changes to simplify this process.

Disastorm commented 5 months ago

Thanks, yeah, that's basically the same behavior I saw. It's just that something was triggering the auto-delete of the .memmap files from the first run. I don't really know what triggered it, but it seemed potentially related to when I ctrl-C'd the run; I'm not sure if I messed up a setting somewhere, or if it's related to running on Windows, etc. Anyway, I'm just using buffer.checkpoint = False for now, so you can close this issue, but I wanted to mention that there may be some trigger somewhere that auto-deletes the .memmap files from the first run when you ctrl-C one of the later runs.

belerico commented 5 months ago

Hey @Disastorm, we do indeed have a problem with memmapped arrays on Windows: can you try out this branch, please?

Disastorm commented 5 months ago

Oh I see, thanks. I'll try that out whenever I train a new model; for the one I'm currently running I already have buffer.checkpoint = False, so I can't try it on this one. Or are you saying that even with checkpoint = False the memmap is not working properly and I should use that branch regardless?

By the way, separate question: is there a way to set exploration in DreamerV3? Would I adjust ent_coef, or do I need to use one of the other options like the Plan2Explore configs? (I don't know what Plan2Explore is.)

belerico commented 5 months ago

> Oh I see, thanks. I'll try that out whenever I train a new model; for the one I'm currently running I already have buffer.checkpoint = False, so I can't try it on this one. Or are you saying that even with checkpoint = False the memmap is not working properly and I should use that branch regardless?

Nope, the memmap is working properly. The problem arises when you checkpoint the buffer and try to resume multiple times: in that particular case, the memmap buffer on Windows gets deleted. If you can try that new branch so we are sure it fixes your problem, then we can close the issue.
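For anyone hitting the same symptom, a hypothetical illustration of this class of bug (not sheeprl's actual code, and it may not match the real fix in that branch): if a buffer wrapper unconditionally deletes its backing file on cleanup, a resumed run that merely reopened the first run's file will remove that file on exit, and the next resume then finds nothing.

    import os
    import numpy as np

    class LeakyMemmap:
        # Hypothetical wrapper illustrating the failure mode; not sheeprl's code.
        def __init__(self, path, shape, create=True):
            self.path = path
            self._created = create  # whether this object created the backing file
            mode = "w+" if create else "r+"
            self.data = np.memmap(path, dtype=np.float32, mode=mode, shape=shape)

        def close(self):
            del self.data  # release the mapping so the file can be removed on Windows
            # BUG: deletes the file even when this object only reopened it.
            # A guarded cleanup (e.g. `if self._created: os.remove(self.path)`)
            # would keep the first run's files alive for later resumes.
            os.remove(self.path)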

> By the way, separate question: is there a way to set exploration in DreamerV3? Would I adjust ent_coef, or do I need to use one of the other options like the Plan2Explore configs? (I don't know what Plan2Explore is.)

I will open a new issue with this question, to keep things in order.

Disastorm commented 5 months ago

Confirmed, this is fixed.