Closed wwxFromTju closed 4 years ago
Hi,
Can you provide more error information?
Below is the corresponding error message:
[ERROR 10:21:48] pymarl Failed after 0:00:42!
Traceback (most recent calls WITHOUT Sacred internals):
  File "src/main.py", line 35, in my_main
    run(_run, _config, _log)
  File "/home/xxxxxxx/NDQ/src/run.py", line 60, in run
    run_sequential(args=args, logger=logger)
  File "/home/xxxxxxx/NDQ/src/run.py", line 227, in run_sequential
    learner.train(episode_sample, runner.t_env, episode)
  File "/home/xxxxxxx/NDQ/src/learners/categorical_q_learner.py", line 184, in train
    loss.backward()
  File "/home/xxxxxxx/anaconda3/envs/sc_roma/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/xxxxxxx/anaconda3/envs/sc_roma/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 67, 5, 11]], which is output 0 of SliceBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
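For anyone puzzled by the "is at version 1; expected version 0" part of this message: autograd keeps a version counter on each tensor and checks it when the backward pass reads a tensor that was saved during the forward pass. The following is only a toy model of that check in plain Python, not PyTorch's implementation; `ToyTensor`, `SavedForBackward`, and `run` are hypothetical names made up for this sketch.

```python
class ToyTensor:
    """Toy stand-in for a tensor: a list of floats plus a version counter."""

    def __init__(self, values):
        self.values = list(values)
        self.version = 0  # bumped on every in-place write

    def add_(self, k):
        # In-place update: mutates the existing storage and bumps the version.
        for i in range(len(self.values)):
            self.values[i] += k
        self.version += 1
        return self

    def add(self, k):
        # Out-of-place update: returns a new tensor and leaves self untouched.
        return ToyTensor(v + k for v in self.values)


class SavedForBackward:
    """Remembers a tensor and the version it had during the forward pass."""

    def __init__(self, tensor):
        self.tensor = tensor
        self.expected_version = tensor.version

    def unpack(self):
        # The backward pass refuses to use a tensor that changed after saving.
        if self.tensor.version != self.expected_version:
            raise RuntimeError(
                "modified by an inplace operation: is at version "
                f"{self.tensor.version}; expected version {self.expected_version}"
            )
        return self.tensor


def run(in_place):
    out = ToyTensor([1.0, 2.0])
    saved = SavedForBackward(out)  # forward pass keeps `out` for its gradient
    if in_place:
        out.add_(1.0)              # mutates the saved tensor
    else:
        out = out.add(1.0)         # new tensor; the saved one is unchanged
    try:
        saved.unpack()             # what the backward pass would do
        return "ok"
    except RuntimeError as err:
        return str(err)
```

In this model, `run(False)` succeeds and `run(True)` fails with a version mismatch, which is why replacing an in-place write with its out-of-place form (or cloning before writing) is the usual way around this kind of error.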
It seems that this is not a config file error.
Which version of PyTorch are you using?
After changing to PyTorch 0.4, it looks like it runs, at least for a while. Thanks for the quick response.
I still suggest that you fix the current config files; the problems are the ones I mentioned above.
We have updated the README to include the detailed commands for reproducing the results in our paper.
Thanks for your comments. Feel free to contact us if you have any other questions.
Thank you for your very quick reply and great work.
Hey, I tried to run the following commands:
python3 src/main.py --config=categorical_qmix --env-config=sc2 with env_args.map_name=2s3z
and
python3 src/main.py --config=tar_qmix --env-config=sc2 with env_args.map_name=2s3z
I found that, due to config errors, these commands cannot be run directly. I changed runner: "parallel" to runner: "parallel_x" and rewrote numeric parameters in plain decimal form, e.g. 1e-2 -> 0.01.
But in the end it still doesn't run. Could you provide a complete, runnable config file to help others reproduce the results?
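On the 1e-2 -> 0.01 change: one possible cause, assuming the configs are YAML files read by a YAML 1.1 loader, is that exponent notation without a decimal point can be loaded as a string instead of a float. Rather than editing every value by hand, the loaded config could be normalized after loading. This is only a sketch; `coerce_sci_notation` is a hypothetical helper, not something from the NDQ code base, and the keys in the usage example are made up.

```python
import re

# Matches strings such as "1e-2", "5E-4" or "2.5e+3", and nothing else.
_SCI_NOTATION = re.compile(r"[+-]?\d+(\.\d*)?[eE][+-]?\d+")


def coerce_sci_notation(value):
    """Recursively turn exponent-notation strings in a config into floats."""
    if isinstance(value, dict):
        return {key: coerce_sci_notation(item) for key, item in value.items()}
    if isinstance(value, list):
        return [coerce_sci_notation(item) for item in value]
    if isinstance(value, str) and _SCI_NOTATION.fullmatch(value):
        return float(value)
    return value  # everything else (ints, floats, other strings) is left alone


# Usage with an illustrative config dict (keys are assumptions):
config = {"lr": "1e-2", "runner": "parallel_x", "extra": {"eps": "5E-4", "batch": 32}}
fixed = coerce_sci_notation(config)
```

The regex is deliberately narrow, so ordinary strings like "parallel_x" are never touched.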