alex-petrenko / sample-factory

High throughput synchronous and asynchronous reinforcement learning
https://samplefactory.dev
MIT License

[question] Callback and Debugging? #27

Closed jarlva closed 4 years ago

jarlva commented 4 years ago

What is the best way to add a callback routine?

- To collect all worker env custom metrics every N seconds and record them to tensorboard
- To evaluate custom metrics for early stopping (and end training, of course)

How do I debug (using VSCode)? Just set '--train_in_background_thread' to False?

alex-petrenko commented 4 years ago

The parameter --experiment_summaries_interval controls how often summaries are dumped to TB. Take a look at process_report() in appo.py. This is where the episodic metrics are aggregated to be later averaged and written to summary files.

Envs can add an 'episode_extra_stats' key containing an arbitrary dictionary of metrics, which will be automatically processed and saved to tensorboard as well. This key is only parsed on episode termination (done=True).

See https://github.com/alex-petrenko/sample-factory/blob/1a762b1ba47969474631d5abcbb84189b3f8e952/envs/quadrotors/wrappers/reward_shaping.py#L54 for example.
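As a rough illustration of the pattern in that file, here is a minimal sketch of a wrapper that attaches 'episode_extra_stats' to the info dict on termination. Only the 'episode_extra_stats' key itself is the real convention; the wrapper class and the metric it tracks are made up for the example:

```python
import gym


class ExtraStatsWrapper(gym.Wrapper):
    """Hypothetical wrapper: report custom per-episode metrics to Sample Factory."""

    def __init__(self, env):
        super().__init__(env)
        self._num_pickups = 0

    def reset(self, **kwargs):
        self._num_pickups = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if reward > 0:
            self._num_pickups += 1  # toy metric: count positive-reward steps
        if done:
            # everything in this dict is picked up and written to tensorboard
            info['episode_extra_stats'] = {'num_pickups': self._num_pickups}
        return obs, reward, done, info
```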

If your task requires completely custom metric aggregation, take a look at https://github.com/alex-petrenko/sample-factory/blob/1a762b1ba47969474631d5abcbb84189b3f8e952/envs/dmlab/dmlab_env.py#L247. Here we define functions that aggregate and process multiple metrics to submit new custom metrics to tensorboard.
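The general shape of such an aggregator, heavily simplified and with made-up names; how it actually gets registered is exactly what the linked dmlab_env.py (and algo_utils.py) show, so treat this only as an outline:

```python
import numpy as np


def aggregate_custom_summaries(policy_avg_stats, env_steps, summary_writer):
    """Hypothetical aggregator: derive one metric from several raw episodic stats."""
    successes = policy_avg_stats.get('level_solved', [])
    attempts = policy_avg_stats.get('level_attempted', [])
    if len(attempts) > 0:
        success_rate = float(np.sum(successes)) / max(1.0, float(np.sum(attempts)))
        summary_writer.add_scalar('zz_custom/success_rate', success_rate, env_steps)
```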

There isn't any API for using custom metrics for early stop, but it's very easy to implement. Take a look at https://github.com/alex-petrenko/sample-factory/blob/1a762b1ba47969474631d5abcbb84189b3f8e952/algorithms/appo/appo.py#L625
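A hedged sketch of one way such a check could look. Where the aggregated stats live and how training is actually terminated are exactly what the linked appo.py shows, so the helper below is only the custom-metric half of the condition, with hypothetical names:

```python
import numpy as np


def metric_target_reached(recent_values, threshold=5.0, window=100):
    """Hypothetical helper: True once the rolling average of a custom metric crosses a threshold."""
    recent = list(recent_values)[-window:]
    return len(recent) > 0 and float(np.mean(recent)) >= threshold

# The custom stats are aggregated in process_report(); wherever you keep them
# (e.g. a per-policy deque), OR this helper into the condition that decides
# whether training should end (around appo.py#L625).
```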

I am not familiar with VSCode in particular; I have mostly used PyCharm for debugging. When train_in_background_thread is set to False, I'm able to set breakpoints anywhere in the codebase, without restrictions (even though it is a multi-process system). Whether you can do it with VSCode or not depends on how the debugger is attached, but I don't see why it would be impossible in principle.

pdb/ipdb also works, but I suggest setting up your IDE to support breakpoints. With PyCharm it is straightforward.

jarlva commented 4 years ago

Thanks @alex-petrenko

jarlva commented 4 years ago

The debugger's console keeps spitting out lines while the debugger is halted at the breakpoint. The breakpoint is at appo.py, line 622.

The command line included: --train_in_background_thread=False --num_workers=1 --num_envs_per_worker=2 (can't set it to 1) --policy_workers_per_policy=1

I also get the red warning: train_in_background set to False on learner 0! This is slow, use only for testing!

What am I missing? How do I completely restrict execution to a single thread/instance so that it stops at the breakpoint?

alex-petrenko commented 4 years ago

Usually, when you attach a debugger and stop on a breakpoint in a multi-process system, all the other processes will still keep running. You only stop the process where the breakpoint was triggered.

In the case of Sample Factory this is usually not a problem. E.g. if you stop the learner, the actors will also stop very soon because they will run out of buffers to put their experience into. The only process that will keep running is the master process, the one that outputs the logs and writes to tensorboard. This process does not really do anything else, so it is not harmful.

I believe there might be a way to stop all processes synchronously, but I don't know how to do it, and I never really needed it in my debugging sessions. Also, this is not specific to Sample Factory, but will be the case with any async system.

On the other points: 1) You could not set --num_envs_per_worker to 1 because double-buffered sampling is enabled by default, and it requires at least two envs per worker. See the help string for this arg and the README for details.

2) The warning is pretty self-explanatory. train_in_background=False mode is slower and you should only use it for debugging, not for actual training.

jarlva commented 4 years ago

I noticed that the environments (a minimum of 2 instances with the above settings) keep running in the background for quite a while after the breakpoint is reached. Is there a way to limit the buffers or the learner/subprocesses to, say, 1 or 2 episodes (so that the aggregated data from the envs can be examined)? It would greatly simplify debugging and the development of new callback/early-stopping logic.

Thanks!

alex-petrenko commented 4 years ago

I guess this is a general question about debugging multi-process systems, rather than about this particular system. I can't think of an easy way to halt the execution of multiple processes at once.

This should not prevent you from examining the data collected by the envs. You can do that, for example, here; look at the training_data variable: https://github.com/alex-petrenko/sample-factory/blob/1a762b1ba47969474631d5abcbb84189b3f8e952/algorithms/appo/learner.py#L1033

If you set your batch size to a small number (e.g. 1-3 rollouts, so by default 32-96), the workers will stop soon after the breakpoint is hit, because they will run out of buffers to put their data in. I found this helpful during debugging. Shortening the rollout can help in debugging too.
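For example, something along these lines; the entry point and env name are placeholders for whatever you normally run, and the flag values are only meant to illustrate "small batch, short rollout":

```
python -m <your_train_script> --env=<your_env> --experiment=debug_run \
    --train_in_background_thread=False --num_workers=1 --num_envs_per_worker=2 \
    --batch_size=64 --rollout=16
```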

jarlva commented 4 years ago

Thanks again Alex!

jarlva commented 1 year ago

Hi @alex-petrenko , I'm in the process of adopting SF2. All the custom tensorboard stats are empty. I noticed that SF2 is missing EXTRA_EPISODIC_STATS_PROCESSING and sample_factory/algorithms/utils/algo_utils.py?

alex-petrenko commented 1 year ago

Hi @jarlva

The way to add custom summaries has changed and is now more straightforward (I think). DMLab heavily uses custom summaries; check out this file as an example: https://github.com/alex-petrenko/sample-factory/blob/9da68b57eecd73c3c884c1be2d938b46aa7a7f49/sf_examples/dmlab/train_dmlab.py#L47

In particular, here we register an extra episodic stat handler and also a new "AlgoObserver" object. I imagine that for most situations AlgoObserver should actually be enough. See the full interface here: https://github.com/alex-petrenko/sample-factory/blob/9da68b57eecd73c3c884c1be2d938b46aa7a7f49/sample_factory/algo/runners/runner.py#L50
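A rough sketch of what that registration might look like; the observer class and the metric it writes are made up, and method names such as extra_summaries and register_observer are taken from my reading of the linked files, so double-check them against runner.py and train_dmlab.py:

```python
from sample_factory.algo.runners.runner import AlgoObserver, Runner


class MyCustomObserver(AlgoObserver):
    """Hypothetical observer that pushes an extra scalar to tensorboard."""

    def extra_summaries(self, runner: Runner, policy_id: int, env_steps: int, writer) -> None:
        # the exact signature is defined in runner.py (see the link above)
        writer.add_scalar('custom/my_metric', 42.0, env_steps)


def register_custom_components(cfg, runner: Runner) -> None:
    # assumed registration hook; the DMLab example registers its observer similarly
    runner.register_observer(MyCustomObserver())
```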

alex-petrenko commented 1 year ago

Regarding the debugging questions earlier in this thread, debugging in SF2 can actually be a lot simpler because of --serial_mode=True.

This flag runs everything in one process, which makes stopping the execution trivial.
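For example (placeholders again for the entry point and env name):

```
python -m <your_train_script> --env=<your_env> --serial_mode=True --experiment=debug
```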