Running +experiment=simple_dqn/cartpole crashes with the following error on Ubuntu:
[2022-08-29 12:55:38,295][cogment_verse.app][INFO] - Run starting...
[2022-08-29 12:55:38,328][cogment_verse.processes.run][INFO] - Starting run [pedantic_hofstadter_0] from [actors.simple_dqn.SimpleDQNTraining]
[2022-08-29 12:55:38,727][cogment_verse.mlflow_experiment_tracker][INFO] - Experiment with name '/actors.simple_dqn.SimpleDQNTraining' not found. Creating it.
[2022-08-29 12:55:41,056][cogment_verse.processes.run][ERROR] - Error while executing run [pedantic_hofstadter_0] from [actors.simple_dqn.SimpleDQNTraining]
Traceback (most recent call last):
  File "/home/vahid/Desktop/cogment-verse/dev_dagger/cogment_verse/processes/run.py", line 64, in run_main_async
    await run.impl(run_session)
  File "/home/vahid/Desktop/cogment-verse/dev_dagger/actors/simple_dqn.py", line 367, in impl
    run_session.log_metrics(trial_idx, total_reward=total_reward)
  File "/home/vahid/Desktop/cogment-verse/dev_dagger/cogment_verse/run/run_session.py", line 84, in log_metrics
    self._xp_tracker.log_metrics(step_timestamp=int(time.time() * 1000), step_idx=self._step_idx, *args, **kwargs)
TypeError: log_metrics() got multiple values for argument 'step_timestamp'
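One possible reading of the traceback, sketched with hypothetical stand-in classes: simple_dqn.py passes trial_idx positionally, run_session.py forwards it through *args, and it then lands on the tracker's first parameter, which is already set by the step_timestamp keyword. The FakeTracker signature below is an assumption, not the actual cogment_verse code; only the forwarding call mirrors line 84 of run_session.py.

```python
import time


class FakeTracker:
    """Hypothetical stand-in for the experiment tracker (signature is assumed)."""

    def log_metrics(self, step_timestamp, step_idx, **kwargs):
        return {"step_timestamp": step_timestamp, "step_idx": step_idx, **kwargs}


class FakeRunSession:
    """Hypothetical stand-in for the run session."""

    def __init__(self):
        self._xp_tracker = FakeTracker()
        self._step_idx = 0

    def log_metrics(self, *args, **kwargs):
        # Same call shape as run_session.py line 84: any positional argument
        # in *args is bound to step_timestamp, which is also given by keyword.
        return self._xp_tracker.log_metrics(
            step_timestamp=int(time.time() * 1000), step_idx=self._step_idx, *args, **kwargs
        )


session = FakeRunSession()

# A positional argument, as in simple_dqn.py line 367, triggers the TypeError.
error_message = None
try:
    session.log_metrics(3, total_reward=1.0)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Keyword-only metrics go through without a collision.
ok = session.log_metrics(total_reward=1.0)
print(ok["total_reward"])
```

If this reading is right, the crash would go away either by passing only keyword metrics from simple_dqn.py or by having the session stop forwarding positional arguments; which of the two matches the intended API is not something the traceback alone settles.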
I have included the changes from the pytorch_multiproc_fix branch.
I also see inconsistent behaviour with +experiment=simple_a2c/cartpole: it sometimes runs fine, and sometimes stalls after a number of epochs without reporting any error.