Closed amrzv closed 1 year ago
Looks like CUDA_VISIBLE_DEVICES contains something like 0,\n and we don't expect this.
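For anyone hitting the same thing, here is a minimal sketch of a more tolerant way to parse such a value. It is not the code from the fix commit; the helper name parse_visible_devices is hypothetical, and it assumes the \n above is a real newline character and that the variable holds integer indices only (CUDA also accepts GPU UUIDs, which this sketch does not handle).

```python
import os


def parse_visible_devices(raw):
    """Parse a CUDA_VISIBLE_DEVICES-style string into a list of GPU indices.

    Hypothetical helper for illustration only. Returns None when the
    variable is unset, and an empty list when it is set but names no devices.
    """
    if raw is None:
        return None
    indices = []
    for token in raw.split(","):
        token = token.strip()  # drops spaces and trailing newlines
        if not token:
            continue  # skip empty pieces such as the one after "0,"
        indices.append(int(token))  # raises ValueError on non-integer entries
    return indices


# Read the real environment variable; None means it is not set at all.
devices = parse_visible_devices(os.environ.get("CUDA_VISIBLE_DEVICES"))
```

With this approach a value such as "0,\n" yields [0] instead of failing on the empty trailing piece.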
Pushed a small fix, see if this helps :)
https://github.com/alex-petrenko/sample-factory/commit/7509cc4b37cbf5b7b541ea0dda49cbad1152c8fe
I forgot to mention that this error was raised when running in Colab without a GPU. Now, when running in Colab without a GPU, the ValueError above is gone, but another error is raised:
It seems that the device should be set manually in cfg beforehand:
cfg.device = 'gpu' if torch.cuda.is_available() else 'cpu'
However, when I add this line to cfg, restart the notebook and run again, I still observe device=='gpu' in the logging information:
I'm not sure if everything is supposed to work on CPU.
When running in Colab with a GPU, the whole notebook works without errors.
From the screenshot, it looks like the notebook is loading an old experiment where the cfg still has the device set to gpu. Could you try changing the experiment_name and running again with cfg.device=cpu? Alternatively, you can add the command line argument --restart_behavior=overwrite to the argv to wipe out the old experiment.
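As a rough sketch of the second option, the flag can be appended to whatever argv list the notebook already builds. The helper name with_overwrite is hypothetical, the example argument in base_argv is a placeholder rather than a real setting from the notebook, and the sketch assumes arguments are written in the --key=value form.

```python
def with_overwrite(argv):
    """Return a copy of argv that ends with --restart_behavior=overwrite.

    Hypothetical helper: any existing --restart_behavior=... entry is
    dropped first so the flag is not passed twice.
    """
    cleaned = [a for a in argv if not a.startswith("--restart_behavior=")]
    return cleaned + ["--restart_behavior=overwrite"]


# Placeholder argument, standing in for whatever the notebook passes.
base_argv = ["--some_existing_arg=value"]
argv = with_overwrite(base_argv)
```

The resulting list is then passed to the config parser in place of the original argv.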
I cleared the local dir, restarted the kernel and ran again. Now the notebook works without errors:
So, probably this line should be added, as in the screenshot:
cfg.device = 'gpu' if torch.cuda.is_available() else 'cpu'
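A slightly more defensive sketch of the same idea, in case it is useful. The SimpleNamespace object is only a stand-in for the real cfg, and the helper names select_device and cuda_is_available are hypothetical; the only real library call used is torch.cuda.is_available().

```python
from types import SimpleNamespace


def cuda_is_available():
    """Report whether PyTorch can see a CUDA device (False if torch is missing)."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def select_device(cuda_available):
    """Map CUDA availability to the device strings used in this thread."""
    return "gpu" if cuda_available else "cpu"


# Stand-in for the real cfg object, which may have been loaded with device="gpu".
cfg = SimpleNamespace(device="gpu")
cfg.device = select_device(cuda_is_available())
```

Note that this only takes effect if an old experiment with a saved cfg is not being reloaded, as discussed above.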
@andrewzhang505 thank you for fixing this!
@amrzv thank you for reporting the issue! Are you generally able to run serious multi-process configurations of Sample Factory in Colab notebooks? I have no experience with this myself, and I know that Jupyter notebooks had some issues with multiprocessing that prevented people from unlocking the full speed and power of the codebase.
I'd be happy to know if Colab solves this issue.
Also, certain types of environments work really well with a single-process synchronous configuration (such as Brax and IsaacGym). I can imagine interactive notebooks are great for these environments.
I would say that the Colab environment has the same limitations as Jupyter. So, as for multiprocessing, the behaviour and issues you mentioned are the same in Colab.
Hi. When running the samplefactory_hub_example.ipynb notebook in Colab, a ValueError is raised:
It seems that there is a bug in sample_factory.utils.gpu_utils.py.