Closed · zixulll closed 10 months ago

Dear author, thank you for your excellent work. When I used the EMSANet decoder for the semantic segmentation task on the NYUv2 and SUNRGB-D datasets, my experimental metrics were far from those reported in the original paper. Could you please provide the detailed multi-task training command? The command I used is as follows:

```
CUDA_VISIBLE_DEVICES=1 python main.py \
    --results-basepath ./results \
    --dataset nyuv2 \
    --dataset-path ./datasets/nyuv2 \
    --input-modalities rgbd \
    --tasks semantic scene instance orientation \
    --enable-panoptic \
    --tasks-weighting 1.0 0.25 2.0 0.5 \
    --instance-weighting 2 1 \
    --rgbd-encoder-backbone swin-multi-t-v2-128 \
    --encoder-normalization layernorm \
    --rgbd-encoder-backbone-pretrained-weights-filepath ./trained_models/imagenet/swin_multi_t_v2_128.pth \
    --validation-batch-size 8 \
    --validation-skip 1.0 \
    --checkpointing-skip 0.8 \
    --checkpointing-best-only \
    --checkpointing-metrics valid_semantic_miou bacc mae_gt_deg panoptic_deeplab_semantic_miou panoptic_all_with_gt_deeplab_pq \
    --batch-size 8 \
    --learning-rate 0.03 \
    --no-pretrained-backbone \
    --semantic-decoder emsanet \
    --semantic-encoder-decoder-fusion swin-ln-add \
    --semantic-decoder-n-channels 512 256 128 \
    --semantic-decoder-upsampling learned-3x3-zeropad \
    --wandb-mode disabled
```
The command looks right in general; however, I noticed the --no-pretrained-backbone argument. Even though you specified --rgbd-encoder-backbone-pretrained-weights-filepath, this argument prevents the pretrained ImageNet weights from being used. This leads to a performance gap and might be the root cause of your issue.
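To illustrate the interplay, here is a minimal, hypothetical argparse sketch of how a "no-*" switch can silently override a weights filepath; this is not the actual EMSANet code, just the general pattern:

```python
# Hypothetical sketch, NOT the actual EMSANet code: a store_false flag
# can silently override a weights filepath passed via another argument.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--no-pretrained-backbone', dest='pretrained_backbone',
                    action='store_false')  # default: True
parser.add_argument('--rgbd-encoder-backbone-pretrained-weights-filepath',
                    type=str, default=None)
args = parser.parse_args()

if args.pretrained_backbone and args.rgbd_encoder_backbone_pretrained_weights_filepath:
    print("Loading pretrained weights from: "
          f"'{args.rgbd_encoder_backbone_pretrained_weights_filepath}'")
else:
    # with --no-pretrained-backbone, the filepath is ignored and the
    # backbone starts from random initialization
    pass
```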
To confirm that the weights are being loaded correctly, you should see these lines printed just before training begins:

```
Get model and dataset
Loading pretrained weights from: './trained_models/imagenet/swin_multi_t_v2_128.pth'
```
Please try running the training without the --no-pretrained-backbone argument. If this still doesn't resolve your issue, please let me know.
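For reference, that is the command from your post with only the --no-pretrained-backbone line removed:

```
CUDA_VISIBLE_DEVICES=1 python main.py \
    --results-basepath ./results \
    --dataset nyuv2 \
    --dataset-path ./datasets/nyuv2 \
    --input-modalities rgbd \
    --tasks semantic scene instance orientation \
    --enable-panoptic \
    --tasks-weighting 1.0 0.25 2.0 0.5 \
    --instance-weighting 2 1 \
    --rgbd-encoder-backbone swin-multi-t-v2-128 \
    --encoder-normalization layernorm \
    --rgbd-encoder-backbone-pretrained-weights-filepath ./trained_models/imagenet/swin_multi_t_v2_128.pth \
    --validation-batch-size 8 \
    --validation-skip 1.0 \
    --checkpointing-skip 0.8 \
    --checkpointing-best-only \
    --checkpointing-metrics valid_semantic_miou bacc mae_gt_deg panoptic_deeplab_semantic_miou panoptic_all_with_gt_deeplab_pq \
    --batch-size 8 \
    --learning-rate 0.03 \
    --semantic-decoder emsanet \
    --semantic-encoder-decoder-fusion swin-ln-add \
    --semantic-decoder-n-channels 512 256 128 \
    --semantic-decoder-upsampling learned-3x3-zeropad \
    --wandb-mode disabled
```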
The above problems have been solved following your guidance, thank you. Sorry, I have another question: when the model is trained on the SUNRGB-D dataset, do I need to add --sunrgbd-depth-do-not-force-mm to the training script? I found this flag in the inference script. My detailed training command for the SUNRGB-D dataset is as follows:

```
CUDA_VISIBLE_DEVICES=1 python main.py \
    --results-basepath ./results \
    --dataset sunrgbd \
    --dataset-path ./datasets/sunrgbd \
    --input-modalities rgbd \
    --tasks semantic scene instance orientation \
    --sunrgbd-depth-do-not-force-mm \
    --enable-panoptic \
    --tasks-weighting 1.0 0.25 2.0 0.5 \
    --instance-weighting 2 1 \
    --rgbd-encoder-backbone swin-multi-t-v2-128 \
    --encoder-normalization layernorm \
    --rgbd-encoder-backbone-pretrained-weights-filepath ./trained_models/imagenet/swin_multi_t_v2_128.pth \
    --validation-batch-size 8 \
    --validation-skip 1.0 \
    --checkpointing-skip 0.8 \
    --checkpointing-best-only \
    --checkpointing-metrics valid_semantic_miou bacc mae_gt_deg panoptic_deeplab_semantic_miou panoptic_all_with_gt_deeplab_pq \
    --batch-size 8 \
    --learning-rate 0.03 \
    --wandb-mode disabled
```
Great to hear that the previous issue is resolved.
Regarding SUNRGB-D training: our original training used the depth images with their previous scaling, which can now only be replicated using the --sunrgbd-depth-do-not-force-mm flag. The flag was introduced to maintain backward compatibility after a change in the dataset repository (version 0.5.4) that adjusted the scaling of the SUNRGB-D depth images to millimeters. The updated millimeter scaling is particularly advantageous for deploying the model in real-world applications.

So setting the flag or not depends on your scenario (see the sketch below):
- To replicate our original training and the results reported in the paper, add --sunrgbd-depth-do-not-force-mm.
- To train a model for real-world deployment, omit the flag and use the new millimeter scaling.
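Conceptually, the difference is only in how the stored depth values are scaled before being fed to the network. A minimal sketch of that idea (with an assumed, dataset-specific scale factor; this is not the actual code of the dataset package):

```python
import numpy as np

def prepare_sunrgbd_depth(raw_depth: np.ndarray,
                          force_mm: bool = True,
                          scale_to_mm: float = 1.0) -> np.ndarray:
    """Sketch only, NOT the dataset package's actual code.

    Since dataset repo v0.5.4, SUNRGB-D depth is rescaled to millimeters
    by default; --sunrgbd-depth-do-not-force-mm restores the previous raw
    values so the depth statistics match the original training setup.
    scale_to_mm is a placeholder for the dataset-specific factor.
    """
    depth = raw_depth.astype(np.float32)
    return depth * scale_to_mm if force_mm else depth

# A network trained on one scaling sees shifted depth statistics under the
# other, so training and inference settings must use the same flag.
```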
Thank you for your patient guidance, I will rerun my experiments following your advice. Thank you very much.