bradyz / 2020_CARLA_challenge

"Learning by Cheating" (CoRL 2019) submission for the 2020 CARLA Challenge

"TypeError: zip argument #1 must support iteration" when training map_model from scratch #18

Closed AIasd closed 4 years ago

AIasd commented 4 years ago

Hi, I downloaded the data and tried to train the map_model from scratch by running `python3 -m carla_project/src/map_model --dataset_dir /path/to/data`, but I encountered the following error:

191 | controller.layers.2                        | ReLU                    | 0     
192 | controller.layers.3                        | BatchNorm1d             | 64    
193 | controller.layers.4                        | Linear                  | 1 K   
194 | controller.layers.5                        | ReLU                    | 0     
195 | controller.layers.6                        | BatchNorm1d             | 64    
196 | controller.layers.7                        | Linear                  | 66    
../LBC_data/CARLA_challenge_autopilot/route_09_04_07_23_07_09
../LBC_data/CARLA_challenge_autopilot/route_19_04_08_16_31_51
../LBC_data/CARLA_challenge_autopilot/route_29_04_09_11_47_17
../LBC_data/CARLA_challenge_autopilot/route_39_04_06_09_50_43
../LBC_data/CARLA_challenge_autopilot/route_49_04_06_11_43_48
../LBC_data/CARLA_challenge_autopilot/route_59_04_06_13_26_15
../LBC_data/CARLA_challenge_autopilot/route_69_04_09_00_28_07
6593 frames.
[ 537  484 2226  752  527 1156  911]
Validation sanity check: 0it [00:00, ?it/s]
Traceback (most recent call last):

  File "carla_project/src/map_model.py", line 236, in <module>
    main(parsed)
  File "carla_project/src/map_model.py", line 207, in main
    trainer.fit(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 759, in fit
    self.dp_train(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 563, in dp_train
    self.run_pretrain_routine(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 899, in run_pretrain_routine
    False)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 278, in _evaluate
    output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 421, in evaluation_forward
    output = model(*args)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/overrides/data_parallel.py", line 66, in forward
    return self.gather(outputs, self.output_device)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
    for k in out))
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.pwandb: Waiting for W&B process to finish, PID 15597
y", line 62, in <genexpr>
    for k in out))
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
TypeError: zip argument #1 must support iteration
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb: Process crashed early, not syncing files

Any help would be appreciated!

bradyz commented 4 years ago

Are you using the requirements.txt provided in the carla_project directory?

I haven't seen this one before, so it might be some difference in PyTorch Lightning versions.
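
For reference, one way this particular TypeError can surface is from DataParallel's gather: if a replica returns a plain Python scalar instead of a tensor, `gather_map` falls through to the `zip(*outputs)` branch. A minimal, hedged reproduction (not necessarily the exact Lightning code path being hit here):

```python
from torch.nn.parallel.scatter_gather import gather

# Sketch: one non-tensor scalar per GPU replica, as if a validation step
# returned a plain float instead of a tensor.
outputs = [0.5, 0.7]

try:
    gather(outputs, target_device=0)
except TypeError as err:
    print(err)  # zip argument #1 must support iteration
```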

AIasd commented 4 years ago

Hi @bradyz, thanks for the quick response! I searched around and found that this is related to a PyTorch Lightning issue here. After pulling their master branch, that error was resolved. However, a new error appears:

  | Name       | Type              | Params
-------------------------------------------------
0 | to_heatmap | ToHeatmap         | 0     
1 | net        | SegmentationModel | 39 M  
2 | controller | RawController     | 1 K   
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_09_04_07_23_07_09
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_09_04_07_23_07_09
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_19_04_08_16_31_51
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_19_04_08_16_31_51
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_29_04_09_11_47_17
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_29_04_09_11_47_17
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_39_04_06_09_50_43
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_39_04_06_09_50_43
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_49_04_06_11_43_48
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_49_04_06_11_43_48
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_59_04_06_13_26_15
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_59_04_06_13_26_15
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_69_04_09_00_28_07
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_69_04_09_00_28_07
6593 frames.
[ 537  484 2226  752  527 1156  911]
6593 frames.
[ 537  484 2226  752  527 1156  911]
Validation sanity check: 0it [00:00, ?it/s]
wandb: Waiting for W&B process to finish, PID 929
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
Traceback (most recent call last):
  File "carla_project/src/map_model.py", line 236, in <module>
    main(parsed)
  File "carla_project/src/map_model.py", line 207, in main
    trainer.fit(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1021, in fit
    self.accelerator_backend.train(model, nprocs=self.num_processes)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/accelerator_backends/ddp_spawn_backend.py", line 43, in train
    mp.spawn(self.ddp_train, nprocs=nprocs, args=(self.mp_queue, model,))
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/accelerator_backends/ddp_spawn_backend.py", line 157, in ddp_train
    results = self.trainer.run_pretrain_routine(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1194, in run_pretrain_routine
    self._run_sanity_check(ref_model, model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1227, in _run_sanity_check
    eval_results = self._evaluate(model, self.val_dataloaders, max_batches, False)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 305, in _evaluate
    for batch_idx, batch in enumerate(dataloader):
  File "/home/zhongzzy9/Documents/self-driving-car/2020_CARLA_challenge/carla_project/src/dataset_wrapper.py", line 29, in __iter__
    yield next(self.data)
  File "/home/zhongzzy9/Documents/self-driving-car/2020_CARLA_challenge/carla_project/src/dataset_wrapper.py", line 8, in _repeater
    for data in loader:
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_dataset.<locals>.<lambda>' 

These problems all seem related to multi-GPU usage. I guess the code is not meant to support PyTorch multi-GPU training, right? When I use only one GPU, the code runs successfully.
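
For anyone who hits the same `Can't pickle local object` error: spawn-based worker processes have to pickle the dataset, and a `lambda` defined inside a function (like the one the traceback points at in `get_dataset`) cannot be pickled. A minimal sketch of the failure mode and a picklable alternative (the names below are illustrative, not the repo's actual code):

```python
import pickle
from functools import partial


def scale(x, factor):
    # Module-level function: picklable, so spawn-based workers can receive it.
    return x * factor


def get_dataset_with_lambda():
    # Lives at 'get_dataset_with_lambda.<locals>.<lambda>' and cannot be pickled.
    return lambda x: x * 2


def get_dataset_with_partial():
    # functools.partial over a module-level function pickles cleanly.
    return partial(scale, factor=2)


pickle.dumps(get_dataset_with_partial())  # works
try:
    pickle.dumps(get_dataset_with_lambda())
except AttributeError as err:
    print(err)  # Can't pickle local object 'get_dataset_with_lambda.<locals>.<lambda>'
```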

bradyz commented 4 years ago

Correct - I only used this code for single-GPU training.

AIasd commented 4 years ago

Thank you for clarifying!

aleallievi commented 3 years ago

@AIasd did you ever find a solution for this? My solution was to set Trainer(accelerator='horovod') or Trainer(accelerator='ddp_spawn'); however, it would be great if we could use ddp. Thanks!
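
For anyone copying this workaround, a hedged sketch of what the Trainer construction might look like (PyTorch Lightning 1.x argument names assumed; the gpus/max_epochs values are placeholders, not the repo's actual settings in map_model.py):

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=2,                    # placeholder GPU count
    accelerator='ddp_spawn',   # or accelerator='horovod'; plain 'ddp' is the
                               # backend that did not work in this thread
    max_epochs=50,             # placeholder
)
```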

AIasd commented 3 years ago

@aleallievi, in my case, not using ddp is fine, so I did not explore this issue further.

aleallievi commented 3 years ago

> @aleallievi, in my case, not using ddp is fine, so I did not explore this issue further.

Ok - thanks for your feedback

raozhongyu commented 3 years ago

Thanks for your work. I met the same error, "TypeError: zip argument #1 must support iteration". Could you tell me how to solve it? Thanks a lot.

pratikchhapolika commented 1 year ago

@AIasd how did you solve this?