Are you using the requirements.txt provided in the carla_project directory?
I haven't seen this one before, so it might be a difference in PyTorch Lightning versions.
Hi @bradyz, thanks for the quick response! I searched around and found that this is related to a PyTorch Lightning issue here. After pulling their master branch, the error was resolved. However, a new error appears:
| Name | Type | Params
-------------------------------------------------
0 | to_heatmap | ToHeatmap | 0
1 | net | SegmentationModel | 39 M
2 | controller | RawController | 1 K
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_09_04_07_23_07_09
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_09_04_07_23_07_09
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_19_04_08_16_31_51
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_19_04_08_16_31_51
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_29_04_09_11_47_17
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_29_04_09_11_47_17
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_39_04_06_09_50_43
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_39_04_06_09_50_43
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_49_04_06_11_43_48
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_49_04_06_11_43_48
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_59_04_06_13_26_15
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_59_04_06_13_26_15
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_69_04_09_00_28_07
/home/zhongzzy9/Documents/self-driving-car/LBC_data/CARLA_challenge_autopilot/route_69_04_09_00_28_07
6593 frames.
[ 537 484 2226 752 527 1156 911]
6593 frames.
[ 537 484 2226 752 527 1156 911]
Validation sanity check: 0it [00:00, ?it/s]
wandb: Waiting for W&B process to finish, PID 929
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
Traceback (most recent call last):
  File "carla_project/src/map_model.py", line 236, in <module>
    main(parsed)
  File "carla_project/src/map_model.py", line 207, in main
    trainer.fit(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1021, in fit
    self.accelerator_backend.train(model, nprocs=self.num_processes)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/accelerator_backends/ddp_spawn_backend.py", line 43, in train
    mp.spawn(self.ddp_train, nprocs=nprocs, args=(self.mp_queue, model,))
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/accelerator_backends/ddp_spawn_backend.py", line 157, in ddp_train
    results = self.trainer.run_pretrain_routine(model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1194, in run_pretrain_routine
    self._run_sanity_check(ref_model, model)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1227, in _run_sanity_check
    eval_results = self._evaluate(model, self.val_dataloaders, max_batches, False)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 305, in _evaluate
    for batch_idx, batch in enumerate(dataloader):
  File "/home/zhongzzy9/Documents/self-driving-car/2020_CARLA_challenge/carla_project/src/dataset_wrapper.py", line 29, in __iter__
    yield next(self.data)
  File "/home/zhongzzy9/Documents/self-driving-car/2020_CARLA_challenge/carla_project/src/dataset_wrapper.py", line 8, in _repeater
    for data in loader:
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/zhongzzy9/anaconda3/envs/carla99/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_dataset.<locals>.<lambda>'
These problems are all related to multi-GPU usage. I guess the code is not meant to support PyTorch multi-GPU training, right? When I use only one GPU, the code runs successfully.
Correct - I only used this code for single-GPU training.
Thank you for clarifying!
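For anyone hitting this later: the AttributeError above comes from spawn-based multiprocessing (ddp_spawn, and DataLoader workers started with the spawn method) needing to pickle everything handed to the child processes, and a lambda defined inside get_dataset cannot be pickled. Below is a minimal sketch of the usual workaround; FrameDataset, _scale, and the lambda transform are hypothetical stand-ins for illustration, and the actual code in dataset_wrapper.py may differ:

from functools import partial
from torch.utils.data import DataLoader, Dataset

class FrameDataset(Dataset):
    # Hypothetical stand-in for the dataset that get_dataset() builds.
    def __init__(self, samples, transform):
        self.samples = samples
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.transform(self.samples[idx])

def _scale(x, denom=255.0):
    # A module-level function is picklable; a local lambda is not.
    return x / denom

def get_dataset(samples):
    # Instead of `transform = lambda x: x / 255.0`, pass a picklable
    # callable so spawned worker processes can serialize the dataset.
    return FrameDataset(samples, transform=partial(_scale, denom=255.0))

loader = DataLoader(get_dataset(list(range(8))), batch_size=4, num_workers=2)

With a picklable transform, mp.spawn and the DataLoader workers can serialize what they need, and the validation sanity check can proceed.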
@AIasd did you ever find a solution for this? My workaround was to set Trainer(accelerator='horovod') or Trainer(accelerator='ddp_spawn'); however, it would be great if we could use ddp. Thanks!
@aleallievi, in my case not using ddp is fine, so I did not explore this issue further.
Ok - thanks for your feedback
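For later readers, here is a hedged sketch of the Trainer settings discussed in this thread. Argument names vary across PyTorch Lightning versions (the 0.x releases this code targeted used distributed_backend where later releases use accelerator), and max_epochs here is purely illustrative:

import pytorch_lightning as pl

# Single GPU: the setup the author used; no spawn-time pickling involved.
trainer = pl.Trainer(gpus=1, max_epochs=50)

# Spawn-based multi-GPU, as in the traceback above: everything passed to
# the child processes must be picklable, which the local lambda violates.
# trainer = pl.Trainer(gpus=2, accelerator='ddp_spawn', max_epochs=50)

# Script-launched DDP re-imports the training script in each process
# instead of pickling objects, though as noted above it did not work
# out of the box with this code.
# trainer = pl.Trainer(gpus=2, accelerator='ddp', max_epochs=50)

The horovod backend mentioned above sidesteps the same pickling path, but requires the horovod package to be installed.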
Thanks for your work! I met the same issue, "TypeError: zip argument #1 must support iteration". Could you tell me how to solve it? Thanks a lot!
@AIasd how did you solve this?
Hi, I downloaded the data and tried to train the map_model from scratch by running:
python3 -m carla_project.src.map_model --dataset_dir /path/to/data
But I encountered the error described above. Any help would be appreciated!