XuyangBai / TransFusion

[PyTorch] Official implementation of CVPR2022 paper "TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers". https://arxiv.org/abs/2203.11496
Apache License 2.0
642 stars 77 forks

About the 2D backbone #7

Closed SxJyJay closed 2 years ago

SxJyJay commented 2 years ago

Hi, I have some questions about training the TransFusion-LC.

XuyangBai commented 2 years ago

Hi, sorry, it seems I didn't make it clear in the README.

  1. For DLA34 pretrained on 3D detection, I follow PointAugmenting and reuse the model provided by CenterNet. You can download the checkpoint from https://github.com/xingyizhou/CenterTrack/blob/master/readme/MODEL_ZOO.md#monocular-3d-detection-tracking.
  2. For ResNet50+FPN pretrained on instance segmentation, I use the model provided by mmdet3d; you can download the checkpoints from https://github.com/open-mmlab/mmdetection3d/blob/v0.12.0/configs/nuimages/README.md (note that you should also use the checkpoints provided by mmdet3d v0.12.0). I choose the backbone from the Mask R-CNN that is pretrained only on ImageNet (the first one).
  3. For ResNet50+FPN pretrained on 2D detection, I train the model using the same config file as (2), except with the mask head removed.

I use a similar procedure to (3) to train a 2D backbone for the Waymo dataset. I can send you the relevant processing code and config file if needed.

Best, Xuyang.

SxJyJay commented 2 years ago

Thanks a lot for your reply! It is really clear! Could you please send me the relevant code for training the 2D backbone on the Waymo dataset, if that doesn't bother you? My email is yanjay2future@gmail.com.

XuyangBai commented 2 years ago

Hi, I have sent them to your email.

SxJyJay commented 2 years ago

Thanks! I received your email. I still have some questions about re-implementation.

Sorry to bother you again. Sincere appreciation!

XuyangBai commented 2 years ago

Hi,

  1. R50+FPN gives a slightly better result than DLA34 (as shown in Table 12 of the supplementary material). And I only use DLA34 as the image feature extractor, so I do not load the task head.
  2. Did you adopt the fade strategy (disabling the copy-and-paste augmentation for the last 5 epochs)? That can have a remarkable effect on mAP by reducing false positives; see the sketch below.
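
A minimal sketch of one way to set that up in an mmdet3d-style config, assuming the copy-and-paste augmentation is the standard ObjectSample transform (the transform names are from stock mmdet3d; the paths and the abbreviated db_sampler fields are illustrative, not copied from this repo):

# Stage 1: train the first 15 epochs with GT copy-and-paste enabled.
db_sampler = dict(
    type='DataBaseSampler',
    data_root='data/nuscenes/',                            # assumed path
    info_path='data/nuscenes/nuscenes_dbinfos_train.pkl',  # assumed path
    rate=1.0,
    classes=['car'],              # abbreviated; the real config lists all classes
    sample_groups=dict(car=2))

train_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=5, use_dim=5),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(type='ObjectSample', db_sampler=db_sampler),      # the copy-and-paste aug
    # ... flip / rotation / formatting transforms ...
]

# Stage 2 ("fade"): rerun the last 5 epochs from the stage-1 checkpoint with
# the ObjectSample entry removed from the pipeline.
train_pipeline_fade = [t for t in train_pipeline if t['type'] != 'ObjectSample']
resume_from = 'work_dirs/transfusion_L/epoch_15.pth'       # assumed path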

Best, Xuyang

SxJyJay commented 2 years ago

Oh, I got it. I forgot to adopt the fade strategy for the last 5 epochs. Besides, I found that the NDS value is always lower than the mAP in my current validation runs, e.g.,

Sincerely, Jay

XuyangBai commented 2 years ago

That is not normal. Could you provide the full results, such as mATE, mAOE, and mASE?

XuyangBai commented 2 years ago

You can get a very bad mAOE and mASE if you use the newest version of mmdet3d to generate the .pkl files and then train TransFusion: mmdet3d had a large coordinate system refactoring in the newer versions. See https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/compatibility.md#coordinate-system-refactoring

SxJyJay commented 2 years ago

OK, I list the TP metric results below: at epoch 19 (without the fade strategy), mATE=0.2839, mASE=0.7090, mAOE=1.5609, mAVE=0.2707, mAAE=0.1913.
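
(For context: NDS is a weighted combination of mAP and the five TP errors, each clipped to 1, which is why an mASE of 0.709 and an mAOE of 1.56 drag NDS below mAP. A quick check of the nuScenes definition against the reproduced val numbers reported further down this thread:)

def nds(mAP, tp_errors):
    # nuScenes Detection Score: NDS = (5*mAP + sum of (1 - min(1, err)) over
    # the five TP errors) / 10. Errors >= 1 (e.g. mAOE=1.56) contribute zero.
    return (5 * mAP + sum(1 - min(1.0, e) for e in tp_errors)) / 10

# mAP and [mATE, mASE, mAOE, mAVE, mAAE] from the final val results below:
print(nds(0.6725, [0.2809, 0.2530, 0.2858, 0.2626, 0.1915]))  # -> 0.7089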

You can get a very bad mAOE and mASE if you use the newest version of mmdet3d to generate the .pkl files and then train TransFusion: mmdet3d had a large coordinate system refactoring in the newer versions. See https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/compatibility.md#coordinate-system-refactoring

I think this might be the key to my problem! I created the nuScenes metadata with the newest released mmdet3d, and only downgraded the version after I found it mismatched the mmdet3d version required by the TransFusion GitHub repo. Thanks for your valuable advice! I will re-create the metadata and see what happens.

YunzeMan commented 2 years ago

Nice discussion above! Hi @XuyangBai, I have a follow-up question regarding the training of the LC model.

To load the TransFusion-L model when training the -LC model, should we change the load_from key in the config file to the -L model checkpoint, or should we leave that empty and change the pretrained key in the TransFusionDetector field instead?

XuyangBai commented 2 years ago

Hi @YunzeMan, I usually use the following code to combine the pretrained TransFusion-L and the 2D backbone:

import torch

# Load the pretrained 2D image backbone and the pretrained TransFusion-L checkpoints.
img = torch.load('img_backbone.pth', map_location='cpu')
pts = torch.load('transfusionL.pth', map_location='cpu')

# Start from the LiDAR-only weights, then copy the image backbone/neck weights
# over under the 'img_' prefix used by the fusion model.
new_model = {"state_dict": pts["state_dict"]}
for k, v in img["state_dict"].items():
    if 'backbone' in k or 'neck' in k:
        new_model["state_dict"]['img_' + k] = v
        print('img_' + k)
torch.save(new_model, "fusion_model.pth")

And then set the load_from key so that both the pretrained 3D backbone and the 2D backbone are loaded.
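
In the -LC config that is a one-line change, e.g. (using the path saved by the snippet above):

load_from = 'fusion_model.pth'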

WWW2323 commented 2 years ago

Hi, @XuyangBai @SxJyJay, it takes 4 days for me to train TransFusion-L (8 V100 GPUs, epochs=20, samples_per_gpu=2), which seems too long. How long did you spend training TransFusion-L? Thanks!!

XuyangBai commented 2 years ago

@WWW2323 About 2 days for me using 8 V100 GPUs.

SxJyJay commented 2 years ago

Hi, @XuyangBai @SxJyJay, it takes 4 days for me to train TransFusion-L (8 V100 GPUs, epochs=20, samples_per_gpu=2), which seems too long. How long did you spend training TransFusion-L? Thanks!!

Also about 2 days for me using 8 RTX 3090 GPUs.

SxJyJay commented 2 years ago

@XuyangBai Hi, I have finished the whole training process of TransFusion. I made no modifications except replacing DLA-34 with ResNet50+FPN as you suggested. The final results on the nuScenes validation set are: mAP=67.25, NDS=70.89, mATE=28.09, mASE=25.30, mAOE=28.58, mAVE=26.26, mAAE=19.15. The mAP and NDS are a little lower than the results on the nuScenes test set reported in the paper, yet conventionally I would expect results on the test set to be lower than those on the validation set.

Besides, I find that the mAP drop may be caused by much lower AP on some classes such as trailer, traffic cone, and barrier. I list the AP of my results (on the val set) vs. the reported results (on the test set) below: car (87.9 vs 87.1), truck (64.0 vs 60.0), bus (74.1 vs 68.3), trailer (43.5 vs 60.8), construction_vehicle (29.8 vs 33.1), pedestrian (88.3 vs 88.4), motorcycle (74.3 vs 73.6), bike (63.5 vs 52.9), traffic cone (77.1 vs 86.7), barrier (70.1 vs 78.1).

I don't know whether my results are within an acceptable error margin, or whether they are caused by the bias of different image backbones (i.e., DLA-34 vs. ResNet50+FPN).

XuyangBai commented 2 years ago

Hi @SxJyJay, you can see the detailed results on the val set below.

mAP: 0.6727
mATE: 0.2721
mASE: 0.2517
mAOE: 0.2740
mAVE: 0.2536
mAAE: 0.1902
NDS: 0.7122

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.876   0.169   0.148   0.085   0.259   0.185
truck   0.620   0.302   0.182   0.102   0.228   0.221
bus     0.757   0.302   0.186   0.048   0.386   0.256
trailer 0.428   0.520   0.209   0.463   0.185   0.163
construction_vehicle    0.274   0.666   0.417   0.833   0.124   0.318
pedestrian      0.878   0.128   0.282   0.360   0.215   0.097
motorcycle      0.754   0.184   0.244   0.215   0.421   0.267
bicycle 0.631   0.150   0.263   0.300   0.212   0.016
traffic_cone    0.770   0.119   0.304   nan     nan     nan
barrier 0.739   0.182   0.281   0.059   nan     nan

I think it is within an acceptable error margin; the slightly worse performance might come from training variance. As for the gap between the validation and test sets, that is normal because the two sets generally have different distributions. Also, you could try using more queries during inference to get a better result at the cost of longer inference time (see Table 13 in the supplementary material). Besides, if you are using a different version of mmdet3d, some data augmentation is actually disabled (see the difference between LoadMultiViewImage in this codebase and in mmdet3d): if no img_fields is set, the RandomFlip augmentation does not actually work; see the sketch below.
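
To illustrate the kind of difference meant here (a hedged sketch only, not the actual code of either codebase): image transforms in the mmdet family decide which entries to flip by consulting results['img_fields'], so a loading transform that never sets that key can leave the image augmentation as a silent no-op.

import mmcv

class LoadMultiViewImageFromFiles:
    # Sketch of a multi-view image loading transform; the keys follow mmdet3d
    # conventions, but the implementation here is illustrative.
    def __call__(self, results):
        results['img'] = [mmcv.imread(name) for name in results['img_filename']]
        # The crucial line: per the note above, flip-style transforms look up
        # results['img_fields']; if it is never set, they skip the images and
        # the augmentation is silently disabled.
        results['img_fields'] = ['img']
        return results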

304886938 commented 2 years ago

Hello @XuyangBai, I want to use your results on the nuScenes validation set for an object tracking experiment, but I don't have enough computing power for training. I wonder if you could provide the JSON files of the validation set results? Here is my email: 304886938@qq.com. Looking forward to your reply!

SxJyJay commented 2 years ago

Thank you. On the validation set, the performance I reproduced seems close to yours. I also list my reproduced results on the val set below:

mAP: 0.6725
mATE: 0.2809
mASE: 0.2530
mAOE: 0.2858
mAVE: 0.2626
mAAE: 0.1915
NDS: 0.7089
Eval time: 110.1s

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.879   0.168   0.148   0.087   0.259   0.196
truck   0.640   0.322   0.182   0.085   0.232   0.223
bus     0.741   0.326   0.181   0.041   0.407   0.244
trailer 0.435   0.509   0.203   0.495   0.213   0.159
construction_vehicle    0.298   0.723   0.445   0.817   0.123   0.324
pedestrian      0.883   0.128   0.285   0.376   0.217   0.093
motorcycle      0.743   0.183   0.232   0.216   0.451   0.281
bicycle 0.635   0.146   0.255   0.404   0.198   0.013
traffic_cone    0.771   0.118   0.311   nan     nan     nan
barrier 0.701   0.187   0.288   0.050   nan     nan

My problems are perfectly solved. Hence, I am closing this issue. Thanks again for your patience!

xxlbigbrother commented 2 years ago

Hi, I have sent them to your email.

Hi, I also plan to train a 2D backbone for Waymo and nuScenes. Could you please send me the relevant code for training the 2D backbone? It would be helpful! My email is xxlbigbrother@gmail.com

zzm-hl commented 2 years ago

Hi, @XuyangBai @SxJyJay, it takes 4 days for me to train TransFusion-L (8 V100 GPUs, epochs=20, samples_per_gpu=2), which seems too long. How long did you spend training TransFusion-L? Thanks!!

Also about 2 days for me using 8 RTX 3090 GPUs.

Hi, could you please share the environment (CUDA, PyTorch, MMCV, mmdet, mmdet3d) you used on the 3090 GPUs? I am training on 4 A100s and the displayed ETA is 20 days, which confuses me; I want to rule out the influence of the environment. My environment:

sys.platform: linux
Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA A100-SXM4-40GB
CUDA_HOME: /public/home/u212040344/usr/local/cuda-11.1
NVCC: Build cuda_11.1.TC455_06.29069683_0
GCC: gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
PyTorch: 1.8.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.9.0
OpenCV: 4.5.5
MMCV: 1.3.18
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.11.0
MMDetection3D: 0.12.0+5337046

SxJyJay commented 2 years ago

Hi, @XuyangBai @SxJyJay, it takes 4 days for me to train TransFusion-L (8 V100 GPUs, epochs=20, samples_per_gpu=2), which seems too long. How long did you spend training TransFusion-L? Thanks!!

Also about 2 days for me using 8 RTX 3090 GPUs.

Hi, could you please share the environment (CUDA, PyTorch, MMCV, mmdet, mmdet3d) you used on the 3090 GPUs? I am training on 4 A100s and the displayed ETA is 20 days, which confuses me; I want to rule out the influence of the environment.

Hi, my runtime environment is shown below:

  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON

TorchVision: 0.9.0
OpenCV: 4.5.5
MMCV: 1.3.0
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.1
MMDetection: 2.10.0
MMDetection3D: 0.11.0+

Besides, I think you can check the time spent on fetching data versus the time of one forward pass to identify where the bottleneck is. Maybe your problem is caused by slow I/O.
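
For instance, a minimal sketch of that check (assuming a standard PyTorch DataLoader named data_loader and an mmdet-style model call; both names are illustrative and should be adapted to your runner):

import time
import torch

end = time.time()
for i, data in enumerate(data_loader):
    data_time = time.time() - end                 # batch fetching: I/O + pipeline
    with torch.no_grad():
        model(return_loss=False, **data)          # one forward pass (adapt to your setup)
        torch.cuda.synchronize()                  # wait for the GPU so the timing is honest
    print(f'iter {i}: data {data_time:.3f}s / total {time.time() - end:.3f}s')
    end = time.time()
    if i == 50:
        break

The data_time entry that the mmcv training log prints each iteration reports the same split, so comparing it against the total iteration time in your existing logs should already reveal whether I/O is the bottleneck.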

zzm-hl commented 2 years ago

Besides, I think you can check the time spent on fetching data versus the time of one forward pass to identify where the bottleneck is. Maybe your problem is caused by slow I/O.

Thanks for your reply! The strange thing is that my GPU usage stays at 100% and basically does not jump back and forth. I don't know whether this means the speed of CPU data loading is normal?

wzmsltw commented 2 years ago

@SxJyJay Hi, can you provide the trained TransFusion and TransFusion-L models? My reproduced results are 63.9 mAP (LiDAR) and 64.4 mAP (LiDAR+camera), which is strange. Thanks so much!

SxJyJay commented 2 years ago

@wzmsltw Hi, you can leave me your email, and I will send checkpoints to you.

wzmsltw commented 2 years ago

@SxJyJay my email address is wzmsltw@gmail.com Thanks so much for your help!

wzmsltw commented 2 years ago

@SxJyJay Hi, when will you send the checkpoints? Really looking forward to them. Thanks again~

SxJyJay commented 2 years ago

@SxJyJay Hi, when will you send the checkpoints? Really looking forward to them. Thanks again~

Sorry for the delay; I had something urgent yesterday. I have sent them to you! Best, Yang Jiao

maokp commented 2 years ago

@SxJyJay Could you provide the trained TransFusion and TransFusion-L models? I only have one RTX 3080 and it is very difficult to train such a network. My email is maokaip@gmail.com. Thanks so much!

ChengFengW commented 2 years ago

@SxJyJay Hi, when will you send the checkpoints? Really looking forward to them. Thanks again~

Sorry for the delay; I had something urgent yesterday. I have sent them to you! Best, Yang Jiao

Hi, could you share the TransFusion-L and TransFusion-LC models trained on nuScenes? I only have a single 3080 GPU, so training may take a long time. Thank you very much! chengzhi0323@gmail.com

kuangpanda commented 2 years ago

@SxJyJay Hi, I also plan to train a 2D backbone for Waymo and nuScenes. Could you please send me the relevant data processing code for training the 2D backbone? It would be helpful! My email is kuangpanda@gmail.com

heminghuang7 commented 2 years ago

  3. For ResNet50+FPN pretrained on 2D detection, I train the model using the same config file as (2), except with the mask head removed.

Hello, thank you for the work. I want to reproduce TransFusion-LC. Can you explain more about (3): what does the "mask head" in the config file refer to? In other words, where can I find it and how can I remove it?

XuyangBai commented 2 years ago

@heminghuang7 You can comment out the following part: https://github.com/XuyangBai/TransFusion/blob/399bda09a3b6449313ccc302df40651f77ec78bf/configs/_base_/models/mask_rcnn_r50_fpn.py#L56-L66
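
For reference, that part is the mask branch nested inside model.roi_head; in the stock mmdet Mask R-CNN config it looks roughly like the following (a sketch of the standard config, which may not match this repo line for line):

# The mask branch of Mask R-CNN, as it appears nested in the config:
model = dict(
    roi_head=dict(
        # ... bbox_roi_extractor / bbox_head stay as they are ...
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=10,  # assuming the nuImages class count
            loss_mask=dict(type='CrossEntropyLoss', use_mask=True,
                           loss_weight=1.0))))

Commenting these two keys out (plus any mask-specific entries in train_cfg/test_cfg, such as mask_size and mask_thr_binary) turns the Mask R-CNN config into a plain Faster R-CNN for 2D detection.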

heminghuang7 commented 2 years ago

@heminghuang7 You can comment out the following part:

https://github.com/XuyangBai/TransFusion/blob/399bda09a3b6449313ccc302df40651f77ec78bf/configs/_base_/models/mask_rcnn_r50_fpn.py#L56-L66

Thank you so much!

SxJyJay commented 2 years ago

@maokp @kuangpanda @cxd520314wang I have sent you my reproduced checkpoints! Please check your email!

JamesHao-ml commented 2 years ago

@maokp @kuangpanda @cxd520314wang I have sent you my reproduced checkpoints! Please check your email!

@SxJyJay Could you please also send me a copy of the TransFusion-L and TransFusion-LC models trained on nuScenes? I am struggling to reproduce the performance. It would be even better if you could also share the training logs. Thank you very much! joshua01cv@gmail.com

yangsijing1995 commented 2 years ago

@SxJyJay Hello, can you send me the checkpoints and training logs for TransFusion-L and TransFusion? Thank you so much! My email: yangsijing1117@163.com

wangyd-0312 commented 2 years ago

@maokp @kuangpanda @cxd520314wang I have sent you my reproduced checkpoints! Please check your email!

Hello, could you also send me the checkpoints and training logs for TransFusion-L and TransFusion? Thank you very much!!!! My email is awyd1183@163.com

Young98CN commented 2 years ago

@maokp @kuangpanda @cxd520314wang I have sent you my reproduced checkpoints! Please check your email!

@SxJyJay Hello, I plan to reproduce the results on nuScenes. Could you please send me the checkpoint of the trained 2D backbone? It would be helpful! My email is 834273418@qq.com

zzj403 commented 2 years ago

@maokp @kuangpanda @cxd520314wang I have sent you my reproduced checkpoints! Please check your email!

@SxJyJay Hi, I am a PhD student aiming to study lidar-camera detection models. I've tried many times but I still cannot reproduce satisfying results. Could you please send me your checkpoints? Really looking forward to them. Thanks! My email is 945937825@qq.com

xpyqiubai commented 2 years ago

Hi, I have sent them to your email.

Hi, I also plan to train a 2D backbone for Waymo and nuScenes. Could you please send me the relevant code for training the 2D backbone for the Waymo and nuScenes datasets (specifically Waymo), if that doesn't bother you? My email is xpydgqb@gmail.com

HatakeKiki commented 2 years ago

Hi, I'm also trying to reproduce TransFusion-L, but my mAP and NDS (60.34 & 66.46) are much lower than the author's. Could you please send me your training log of TransFusion-L? I noticed an obvious drop in the loss at epoch 16 in others' training, when the fade strategy kicks in, but mine shows no difference with and without the fade strategy. Thank you! My email is: kiki_jiang@sjtu.edu.cn

SxJyJay commented 2 years ago

@JamesHao-ml @yangsijing1995 @wangyd-0312 @Young98CN @zzj403 @jqfromsjtu Hi, I have sent the checkpoints to you. Sorry for the late reply; I just finished a deadline.

SxJyJay commented 2 years ago

@xpyqiubai @xxlbigbrother @kuangpanda Hi, I have sent the data processing code for Waymo and KITTI to you. Sorry for the late reply.

xpyqiubai commented 2 years ago

@xpyqiubai @xxlbigbrother @kuangpanda Hi, I have sent the data processing code for Waymo and KITTI to you. Sorry for the late reply.

Thanks!

yichen928 commented 2 years ago

@SxJyJay Hi SxJyJay, can you send me the trained checkpoints on nuScenes? I need the trained TransFusion and TransFusion-L models as well as the relevant data processing code. It would be of great help since I may not have enough machines to train them myself. Thank you very much! My email is 1733834831@qq.com.

SxJyJay commented 2 years ago

@SxJyJay Hi SxJyJay, can you send me the trained checkpoints on nuScenes? I need the trained TransFusion and TransFusion-L models as well as the relevant data processing code. It would be of great help since I may not have enough machines to train them myself. Thank you very much! My email is 1733834831@qq.com.

I have sent the relevant checkpoints and data processing code to your email.

yichen928 commented 2 years ago

I have sent the relevant checkpoints and data processing code to your email.

Thank you very much!

minrui-hust commented 2 years ago

Hi, @SxJyJay, I have reproduced TransFusion-L with mAP 65.4; however, my reproduced TransFusion-LC model only achieves mAP 65.6, which leaves a large gap from yours (67.25). Can you send me your training logs and checkpoints for both TransFusion-L and TransFusion-LC so I can check what went wrong? My email is hustminrui@126.com. Thank you!

SxJyJay commented 2 years ago

Hi, @SxJyJay, I have reproduced TransFusion-L with mAP 65.4; however, my reproduced TransFusion-LC model only achieves mAP 65.6, which leaves a large gap from yours (67.25). Can you send me your training logs and checkpoints for both TransFusion-L and TransFusion-LC so I can check what went wrong? My email is hustminrui@126.com. Thank you!

Hi, I have sent you relevant pretrained weights.