Closed Nntraveler closed 3 years ago
Out team tried to train about 8 models in parallel. The command is:
python3 train.py --episodes 600 --metric_period 200 --steps 360 --thread 60
After about 30 episodes, an error occured in part of these processes:
----------------------------------------------------35/600 start_time_epoch = 0 max_time_epoch = 3600 report_log_rate = 10 warning_stop_time_log = 100 Traceback (most recent call last): File "train.py", line 968, in <module> scores_dir, File "train.py", line 512, in train observations, rewards, dones, infos = env.step(actions_) File "/usr/local/lib/python3.7/dist-packages/CBEngine/envs/CBEngine.py", line 168, in step self.eng.next_step() RuntimeError: Resource temporarily unavailable During handling of the above exception, another exception occurred: Traceback (most recent call last): File "train.py", line 981, in <module> result["success"] = True AssertionError
Before that, using --thread 360 produced the error. After switching to --thread 60, it still happens. Thanks a lot.
Out team tried to train about 8 models in parallel. The command is:
After about 30 episodes, an error occured in part of these processes:
Before that, using --thread 360 produced the error. After switching to --thread 60, it still happens. Thanks a lot.