TonghanWang / NDQ

Codes accompanying the paper "Learning Nearly Decomposable Value Functions with Communication Minimization" (ICLR 2020)
https://sites.google.com/view/ndq
Apache License 2.0

config file error #5

Closed wwxFromTju closed 4 years ago

wwxFromTju commented 4 years ago

Hey, I tried to run the following commands:

python3 src/main.py --config=categorical_qmix --env-config=sc2 with env_args.map_name=2s3z
python3 src/main.py --config=tar_qmix --env-config=sc2 with env_args.map_name=2s3z

I found that, due to errors in the config files, they cannot be run directly. I modified the runner (runner: "parallel" -> runner: "parallel_x") and changed the representation of some numeric parameters, e.g. 1e-2 -> 0.01.

But in the end it still doesn't run. Could you provide complete and runnable config files to help others reproduce the results?
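The two config edits described above would look roughly like this in YAML (a sketch only; the `lr` key name and file path are illustrative, the exact keys depend on the repo's src/config directory):

```yaml
# e.g. src/config/algs/categorical_qmix.yaml (hypothetical path)
runner: "parallel_x"   # was "parallel", which does not match a registered runner
lr: 0.01               # was 1e-2; PyYAML parses "1e-2" (no decimal point) as a
                       # string, not a float, so write the decimal explicitly
```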

TonghanWang commented 4 years ago

Hi,

Can you provide more error information?

wwxFromTju commented 4 years ago

Below is the corresponding error message:

[ERROR 10:21:48] pymarl Failed after 0:00:42!
Traceback (most recent calls WITHOUT Sacred internals):
  File "src/main.py", line 35, in my_main
    run(_run, _config, _log)
  File "/home/xxxxxxx/NDQ/src/run.py", line 60, in run
    run_sequential(args=args, logger=logger)
  File "/home/xxxxxxx/NDQ/src/run.py", line 227, in run_sequential
    learner.train(episode_sample, runner.t_env, episode)
  File "/home/xxxxxxx/NDQ/src/learners/categorical_q_learner.py", line 184, in train
    loss.backward()
  File "/home/xxxxxxx/anaconda3/envs/sc_roma/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/xxxxxxx/anaconda3/envs/sc_roma/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 67, 5, 11]], which is output 0 of SliceBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
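For context, this error comes from autograd's version counter: each tensor records a version that is bumped on every in-place write, and backward() raises if a tensor saved for the graph was mutated after being saved. Below is a conceptual sketch of that mechanism in plain Python (not PyTorch internals; all names here are illustrative):

```python
# Conceptual sketch of autograd's version-counter check (illustrative only,
# not the actual PyTorch implementation).

class TrackedTensor:
    """A toy tensor that counts in-place modifications."""
    def __init__(self, data):
        self.data = list(data)
        self._version = 0  # bumped on every in-place write

    def inplace_set(self, i, value):
        self.data[i] = value
        self._version += 1

def save_for_backward(t):
    # Record the version the op saw when the graph captured this tensor.
    return (t, t._version)

def backward_check(saved):
    # Mimics the check that produces the RuntimeError above.
    t, expected = saved
    if t._version != expected:
        raise RuntimeError(
            f"one of the variables needed for gradient computation has been "
            f"modified by an inplace operation: is at version {t._version}; "
            f"expected version {expected} instead")

t = TrackedTensor([1.0, 2.0])
saved = save_for_backward(t)
t.inplace_set(0, 0.0)  # in-place write after the tensor was saved
try:
    backward_check(saved)
except RuntimeError as e:
    print("caught:", e)
```

The usual fixes are to replace in-place slice assignments with out-of-place ops (e.g. torch.where or building a new tensor) or to .clone() before mutating; the error surfacing only on newer PyTorch versions is consistent with newer releases enforcing this check more strictly.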
TonghanWang commented 4 years ago

It seems that this is not a config file error.

TonghanWang commented 4 years ago

What is the version of your pytorch?

wwxFromTju commented 4 years ago

I changed PyTorch to 0.4, and it looks like it runs now. Thanks for the quick response.

I still suggest that you fix the current config files; the problems are as I described above.

TonghanWang commented 4 years ago

We have updated the README, including the detailed command to reproduce the results in our paper.

Thanks for your comments. Feel free to contact us if you have any other questions.

wwxFromTju commented 4 years ago

Thank you for your very quick reply and great work.