OpenDriveLab / Openpilot-Deepdive

Our insights of Openpilot, a deepdive project on it
MIT License
231 stars, 66 forks

training error #21

Closed (bobd988 closed 1 year ago)

bobd988 commented 1 year ago

Hi,

I was training on comma2k19 with two A6000 GPUs on a PC running Ubuntu 20.04 and CUDA 11.5, using two terminals, each running one of these commands:

PORT=23345 SLURM_PROCID=0 SLURM_NTASKS=2 python main.py
PORT=23346 SLURM_PROCID=1 SLURM_NTASKS=2 python main.py

I got the error below in the first terminal shortly after it started. I also tried with a single GPU, but it gave the same error. How can I solve this? Thanks.

[1676912307.07] starting job... 0 of 2
[1676912608.11] DDP Initialized at localhost:23345 0 of 2
2023-02-20 09:03:28.404838: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Comma2k19SequenceDataset: DEMO mode is on.
Traceback (most recent call last):
  File "main.py", line 246, in <module>
    main(rank=int(os.environ['SLURM_PROCID']), world_size=int(os.environ['SLURM_NTASKS']), args=args)
  File "main.py", line 119, in main
    train_dataloader, val_dataloader = get_dataloader(rank, world_size, args.batch_size, False, args.n_workers)
  File "main.py", line 69, in get_dataloader
    train_sampler = DistributedSampler(train, **dist_sampler_params)
TypeError: __init__() got an unexpected keyword argument 'drop_last'

ElectronicElephant commented 1 year ago

I suspect you are using a very old version of PyTorch.

As the error message says, the DistributedSampler in your installed version does not accept drop_last, which is just an optional parameter. If you don't want to upgrade torch, you can simply remove this parameter.

https://pytorch.org/docs/stable/data.html
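
If editing the parameter dict by hand is inconvenient, the "remove this parameter" workaround can be sketched as a small filter that drops any keyword the constructor does not accept. This is only a rough sketch, not code from the repo: `filter_supported_kwargs` and `OldSampler` are hypothetical names, `OldSampler` merely stands in for an older `DistributedSampler` without `drop_last`, and the contents of `dist_sampler_params` are assumed, since the issue does not show them.

```python
import inspect


def filter_supported_kwargs(cls, kwargs):
    """Return only the kwargs that cls.__init__ accepts by name (hypothetical helper)."""
    params = inspect.signature(cls.__init__).parameters
    # If the constructor takes **kwargs, every keyword is accepted as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


# Hypothetical stand-in for an older sampler class that lacks drop_last.
class OldSampler:
    def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True):
        self.dataset = dataset
        self.num_replicas = num_replicas
        self.rank = rank
        self.shuffle = shuffle


# Assumed contents; the real dict in main.py is not shown in this thread.
dist_sampler_params = dict(num_replicas=2, rank=0, shuffle=True, drop_last=True)

safe_params = filter_supported_kwargs(OldSampler, dist_sampler_params)
sampler = OldSampler(list(range(10)), **safe_params)
print(sorted(safe_params))  # ['num_replicas', 'rank', 'shuffle']
```

In main.py the same idea would presumably be applied by passing `DistributedSampler` in place of `OldSampler`. Note that silently dropping `drop_last` changes how the last incomplete batch is handled, so upgrading torch is still the cleaner fix.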