0uroboro5 closed this issue 1 month ago
Thanks for the interest!
Unfortunately, I have never run into this error before, so I don't have much of a clue either. I'd suggest reducing the batch size to see if it helps.
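If the interruption turns out to be a memory limit (an assumption, since the error message itself is not shown here), logging available memory during training could help confirm it. Below is a minimal sketch, assuming a Linux environment such as WSL2 where `/proc/meminfo` exists. The function names are hypothetical and not part of this repo.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Field: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_memory_gib(path: str = "/proc/meminfo") -> float:
    """Return the kernel's MemAvailable estimate in GiB."""
    info = parse_meminfo(Path(path).read_text())
    return info["MemAvailable"] / (1024 ** 2)  # kB -> GiB


if __name__ == "__main__":
    # Call this periodically (e.g. once per episode) from the training loop.
    print(f"available memory: {available_memory_gib():.2f} GiB")
```

If the printed value drops steadily toward zero before the process is killed, host RAM is the likely bottleneck. On WSL2 the memory available to the VM can be capped through the `memory` setting in `.wslconfig`, and kernel messages from `dmesg` may show whether the out-of-memory killer was involved.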
For policy training, I don't recall exactly, but the reward can stay low even after long training because the task reward is very sparse. Since the default config does not use our MIMEx exploration module, I'd suggest sweeping a few exploration configs (by changing this setting to this).
Hope this helps!
I am trying out this very good work of yours on WSL2-based Ubuntu 22.04. The training task is `cartpole_swingup_sparse`. The problem I am having occurs when my training runs to here:
The program is interrupted by the system for this reason:
From searching the web, my guess is that this is a hardware resource limitation, but I have no clue which resource is running short. Is it possible to extract more useful information from the error message?
Another question: R is still 0 at `E: 172`. Is this normal? I'm also trying to find the `num_train_frames` setting in the config file. Are you using the default value?