duanyiqun / DiffusionDepth

PyTorch Implementation of introducing diffusion approach to 3D depth perception ECCV 2024
https://arxiv.org/abs/2303.05021
Apache License 2.0

Question about using a ResNet18/50/101 backbone #30

Closed unlugi closed 1 year ago

unlugi commented 1 year ago

Dear author, thank you for releasing the code and for the excellent work.

For Swin, you advise using the pretrained weights - I saw this in the code (pretrained='/mnt/cfs/algorithm/new/s3depth/pretrain/swin_large_patch4_window7_224_22k.pth').

I want to use a pretrained ResNet backbone for feature extraction. Could you please show how to do this easily in the code? Or did you train ResNet from scratch for the ResNet baseline?

Thank you in advance.

duanyiqun commented 1 year ago

Thank you very much for your question and interest. Yes, if you want to use ResNet as the feature extractor, please see the fast verification code here.

$ python main.py --dir_data datta_path --data_name KITTIDC --split_json ../data_json/kitti_dp.json \
     --patch_height 352 --patch_width 706 --gpus 4,5,6,7 --loss 1.0*L1+1.0*L2+1.0*DDIM --epochs 30 \
     --batch_size 4 --max_depth 88.0 --num_sample 0 --save NAME_TO_SAVE \
     --model_name Diffusion_DCbase_ --backbone_module mmbev_resnet --backbone_name mmbev_res50 --head_specify DDIMDepthEstimate_Res

You can also find the backbone file here: src/model/backbone/mmbev_resnet.py

For ResNet, I think we trained directly from scratch, as a conv-based model already has spatial priors and is easier to train. But just in case, here is the pretrained ResNet model: ResNet Checkpoint 30

unlugi commented 1 year ago

Dear author, thank you for your answer. I realized that mmbev_resnet contains a ResNet-like class - it is not exactly the same architecture as, e.g., ResNet18 or ResNet50. So it is not possible to load the original pretrained ResNet18 network and use it as the depth_backbone.

Is there a reason why you changed the ResNet architecture? Thank you, best regards.

duanyiqun commented 1 year ago

Sorry for any inconvenience this may have caused. Yeah, this is kind of a 'legacy' problem: this code is inherited from another project. I think the backbone just modifies the original model by adding a new mode that uses a CBAM block (convolutional block attention module). But you can switch between basic and CBAM in this class: https://github.com/duanyiqun/DiffusionDepth/blob/187c1f8d7fd03c2002a1e29de41379b72c695728/src/model/backbone/mmbev_resnet.py#L125C29-L125C34. In basic mode, the structure should be the same as the original ResNet except for some key names. Renaming the keys of the state dict might help to load the original weights.
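The key renaming mentioned above could be sketched roughly as follows. This is only an illustration and not code from the repository: the helper names `remap_keys` and `split_by_target` are hypothetical, and so are the target prefixes in the example rules (`stem.conv.`, `res_layers.0.`), because the real key names in mmbev_resnet.py have to be read from the model itself. The source keys follow the usual torchvision ResNet naming (`conv1.weight`, `layer1.0.conv1.weight`, `fc.weight`). The sketch works on plain dictionaries, so it does not depend on PyTorch.

```python
from collections import OrderedDict


def remap_keys(source_state, rename_rules):
    """Return a copy of source_state with key prefixes rewritten.

    rename_rules is a list of (old_prefix, new_prefix) pairs. The first
    matching rule wins; keys that match no rule are kept unchanged.
    """
    remapped = OrderedDict()
    for key, value in source_state.items():
        new_key = key
        for old_prefix, new_prefix in rename_rules:
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped


def split_by_target(remapped_state, target_keys):
    """Separate entries the target model can accept from those it cannot.

    Returns (usable, skipped, missing):
      usable  - entries whose key exists in the target model
      skipped - source keys with no counterpart in the target
      missing - target keys that received no weight (e.g. attention layers)
    """
    target_keys = set(target_keys)
    usable = OrderedDict(
        (k, v) for k, v in remapped_state.items() if k in target_keys
    )
    skipped = [k for k in remapped_state if k not in target_keys]
    missing = sorted(target_keys - set(usable))
    return usable, skipped, missing


# Hypothetical rules: the right-hand prefixes are placeholders, not the
# repository's actual key names.
EXAMPLE_RULES = [
    ("layer1.", "res_layers.0."),
    ("conv1.", "stem.conv."),
]
```

With PyTorch, the source dictionary would typically come from a pretrained model's `state_dict()`, the target keys from `model.state_dict().keys()` of the backbone, and the `usable` entries would be passed to `model.load_state_dict(usable, strict=False)`. Printing `skipped` and `missing` before loading shows which weights were not transferred; tensor shapes still need to match for every transferred key.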

The reason we use CBAM is that we found attention to be important for extracting features for spatial perception, although this observation is limited to our own experiments.

Best regards

unlugi commented 1 year ago

Thanks a lot!