Sense-GVT / Fast-BEV

Fast-BEV: A Fast and Strong Bird’s-Eye View Perception Baseline

weird training error #18

Closed: yukaizhou closed this issue 1 year ago

yukaizhou commented 1 year ago

I added a dist_train.sh file under the tools folder to run single-machine multi-GPU training, but the following error is reported:

    File "/dfs/data/code_python/detection_3d/FastBEV/mmdet3d/models/opt/adamw.py", line 111, in step
      F.adamw(params_with_grad,
    TypeError: adamw() takes 6 positional arguments but 12 were given

ymlab commented 1 year ago

It is most likely a PyTorch version problem. I have two suggested solutions:
1) adapt the function call here (https://github.com/Sense-GVT/Fast-BEV/blob/dev/mmdet3d/models/opt/adamw.py#L111) to the version of PyTorch you are using, or
2) switch to pytorch==1.8.1+cuda90.cudnn7.6.5, which works fine.
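To adapt the call, it helps to first see which parameters your installed PyTorch accepts positionally and which it requires by name. The sketch below uses Python's standard `inspect.signature`; `describe_params` is a hypothetical helper (not part of Fast-BEV or PyTorch), and it is demonstrated on a stand-in function rather than PyTorch's real `adamw`.

```python
import inspect
from typing import Callable, Dict, List


def describe_params(fn: Callable) -> Dict[str, List[str]]:
    """Split a callable's parameters into positional and keyword-only names."""
    groups: Dict[str, List[str]] = {"positional": [], "keyword_only": []}
    for name, param in inspect.signature(fn).parameters.items():
        if param.kind in (inspect.Parameter.POSITIONAL_ONLY,
                          inspect.Parameter.POSITIONAL_OR_KEYWORD):
            groups["positional"].append(name)
        elif param.kind == inspect.Parameter.KEYWORD_ONLY:
            groups["keyword_only"].append(name)
        # *args and **kwargs are ignored in this sketch.
    return groups


# Hypothetical stand-in with a keyword-only tail, used only for illustration.
def keyword_only_example(params, grads, *, lr, eps):
    return None


print(describe_params(keyword_only_example))
# {'positional': ['params', 'grads'], 'keyword_only': ['lr', 'eps']}
```

In your own environment you would pass the function that adamw.py calls as `F.adamw`, presumably `torch.optim._functional.adamw` (an assumption; check the import at the top of that file). Any name reported under `keyword_only` must be passed as `name=value` at line 111.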

ymlab commented 1 year ago


In addition, we would be very grateful if you could share your dist_train-based training script in the future.

guoqi-code commented 1 year ago

#!/bin/bash

CONFIG=configs/fastbev/exp/paper/fastbev_m0_r18_s256x704_v200x200x4_c192_d2_f4.py
WORK_DIR=work_dirs/m0_r18_s256x704_v200x200x4_c192_d2_f4_second_resume_epoch7
PORT=${PORT:-29500}
# One training process per GPU listed in CUDA_VISIBLE_DEVICES
GPUS=4

PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
    $(dirname "$0")/train.py $CONFIG --work-dir $WORK_DIR --launcher pytorch ${@:3}
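With `torch.distributed.launch` as used in this script (PyTorch 1.x, without `--use_env`), each of the `$GPUS` worker processes runs `train.py` and receives its index as an extra `--local_rank=N` argument, so the training script must accept that flag. The sketch below shows one way an mmdetection-style entry point can parse the flags used here; `parse_launch_args` is a hypothetical name, the flag set is an assumed subset, and this is not Fast-BEV's actual `train.py`.

```python
import argparse
import os
from typing import List, Optional


def parse_launch_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the flags passed by the launch command (hypothetical subset)."""
    parser = argparse.ArgumentParser(description="distributed training entry point (sketch)")
    parser.add_argument("config", help="path to the config file")
    parser.add_argument("--work-dir", help="directory for logs and checkpoints")
    parser.add_argument("--launcher", choices=["none", "pytorch"], default="none")
    # Appended to each worker's argv by torch.distributed.launch (without --use_env).
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args(argv)
    # Expose the rank through the environment if the launcher did not set it already.
    if "LOCAL_RANK" not in os.environ:
        os.environ["LOCAL_RANK"] = str(args.local_rank)
    return args


if __name__ == "__main__":
    args = parse_launch_args(["cfg.py", "--work-dir", "work_dirs/x",
                              "--launcher", "pytorch", "--local_rank=2"])
    print(args.config, args.work_dir, args.launcher, args.local_rank)
```

Accepting `--local_rank` explicitly matters because argparse exits with an "unrecognized arguments" error when a worker receives a flag the parser does not declare.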

I use the above script to train in dist_train mode.

guoqi-code commented 1 year ago


In "mmdet3d/models/opt/adamw.py" at line 111, you can pass the hyperparameters as keyword arguments (the error shows that this PyTorch version accepts only the first six arguments positionally):

    F.adamw(params_with_grad, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps,
            amsgrad=amsgrad, beta1=beta1, beta2=beta2, lr=group['lr'],
            weight_decay=group['weight_decay'], eps=group['eps'])
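Passing the six hyperparameters by name works whether a function declares them as ordinary positional-or-keyword parameters or as keyword-only ones, while a 12-argument positional call only works with the former. The sketch below checks this against two hypothetical stand-ins (not PyTorch's real `adamw`) whose parameter names are copied from the call above; that these names match your PyTorch build is an assumption worth verifying.

```python
from typing import Any, Callable, Dict, List, Tuple


# Hypothetical stand-ins: same parameter names, two different layouts.
def adamw_all_positional(params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs,
                         state_steps, amsgrad, beta1, beta2, lr, weight_decay, eps):
    return "called"


def adamw_keyword_only(params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs,
                       state_steps, *, amsgrad, beta1, beta2, lr, weight_decay, eps):
    return "called"


def try_call(fn: Callable, args: Tuple[Any, ...], kwargs: Dict[str, Any]) -> str:
    """Call fn and return its result, or the TypeError message if the call fails."""
    try:
        return fn(*args, **kwargs)
    except TypeError as exc:
        return str(exc)


state_lists: Tuple[List[Any], ...] = ([], [], [], [], [], [])  # six list arguments
group = {"lr": 1e-3, "weight_decay": 1e-2, "eps": 1e-8}
hyper = dict(amsgrad=False, beta1=0.9, beta2=0.999, lr=group["lr"],
             weight_decay=group["weight_decay"], eps=group["eps"])
positional_call = (state_lists + tuple(hyper.values()), {})  # 12 positional arguments
keyword_call = (state_lists, hyper)                          # 6 positional + 6 by name

for fn in (adamw_all_positional, adamw_keyword_only):
    print(fn.__name__, "| positional:", try_call(fn, *positional_call),
          "| keyword:", try_call(fn, *keyword_call))
```

The positional call against the keyword-only stand-in reproduces the same "takes 6 positional arguments but 12 were given" message reported in this issue, while the keyword form succeeds with both layouts.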

ymlab commented 1 year ago

Thanks for sharing.