TUI-NICR / EMSAFormer

Efficient Multi-task Scene Analysis with Transformers
Apache License 2.0

Detailed training parameters #3

Closed zixulll closed 10 months ago

zixulll commented 10 months ago

Dear authors, thank you for your excellent work. When I used the EMSANet decoder for semantic segmentation on the NYUv2 and SUNRGB-D datasets, my metrics were far from the ones reported in the original paper. Could you please provide the detailed multi-task training parameters? The command I used is as follows:

```
CUDA_VISIBLE_DEVICES=1 python main.py \
    --results-basepath ./results \
    --dataset nyuv2 \
    --dataset-path ./datasets/nyuv2 \
    --input-modalities rgbd \
    --tasks semantic scene instance orientation \
    --enable-panoptic \
    --tasks-weighting 1.0 0.25 2.0 0.5 \
    --instance-weighting 2 1 \
    --rgbd-encoder-backbone swin-multi-t-v2-128 \
    --encoder-normalization layernorm \
    --rgbd-encoder-backbone-pretrained-weights-filepath ./trained_models/imagenet/swin_multi_t_v2_128.pth \
    --validation-batch-size 8 \
    --validation-skip 1.0 \
    --checkpointing-skip 0.8 \
    --checkpointing-best-only \
    --checkpointing-metrics valid_semantic_miou bacc mae_gt_deg panoptic_deeplab_semantic_miou panoptic_all_with_gt_deeplab_pq \
    --batch-size 8 \
    --learning-rate 0.03 \
    --no-pretrained-backbone \
    --semantic-decoder emsanet \
    --semantic-encoder-decoder-fusion swin-ln-add \
    --semantic-decoder-n-channels 512 256 128 \
    --semantic-decoder-upsampling learned-3x3-zeropad \
    --wandb-mode disabled
```

Tripton commented 10 months ago

The command looks right in general; however, I noticed the --no-pretrained-backbone argument. Even if you have specified --rgbd-encoder-backbone-pretrained-weights-filepath, this argument prevents the pretrained ImageNet weights from being loaded. This leads to a performance gap and might be the root cause of your issue.

To confirm that the weights are being loaded correctly, you should see this print just before training begins:

```
Get model and dataset
Loading pretrained weights from: './trained_models/imagenet/swin_multi_t_v2_128.pth'
```

Please try running the training without the --no-pretrained-backbone argument. If this still doesn't resolve your issue, please let me know.
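As an additional sanity check independent of the training pipeline, you can open the checkpoint file directly with PyTorch and inspect its contents. This is just a minimal sketch, not part of the EMSAFormer code; it assumes the .pth file holds a plain state dict (possibly nested under a 'state_dict' key):

```python
# Quick, standalone check that the pretrained-weights file is readable
# and contains weight tensors (sketch only, not the repository's loader).
import torch

ckpt_path = './trained_models/imagenet/swin_multi_t_v2_128.pth'
checkpoint = torch.load(ckpt_path, map_location='cpu')

# some checkpoints nest the weights under 'state_dict'; unwrap if so
if isinstance(checkpoint, dict) and 'state_dict' in checkpoint:
    checkpoint = checkpoint['state_dict']

print(f"number of tensors in checkpoint: {len(checkpoint)}")
for name, tensor in list(checkpoint.items())[:5]:
    print(name, tuple(tensor.shape))
```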

zixulll commented 10 months ago

The problem above has been solved thanks to your guidance, thank you. Sorry, I have another question: when training the model on the SUNRGB-D dataset, do I need to add --sunrgbd-depth-do-not-force-mm to the training script? I found this flag in the inference script. My full training command for the SUNRGB-D dataset is as follows:

```
CUDA_VISIBLE_DEVICES=1 python main.py \
    --results-basepath ./results \
    --dataset sunrgbd \
    --dataset-path ./datasets/sunrgbd \
    --input-modalities rgbd \
    --tasks semantic scene instance orientation \
    --sunrgbd-depth-do-not-force-mm \
    --enable-panoptic \
    --tasks-weighting 1.0 0.25 2.0 0.5 \
    --instance-weighting 2 1 \
    --rgbd-encoder-backbone swin-multi-t-v2-128 \
    --encoder-normalization layernorm \
    --rgbd-encoder-backbone-pretrained-weights-filepath ./trained_models/imagenet/swin_multi_t_v2_128.pth \
    --validation-batch-size 8 \
    --validation-skip 1.0 \
    --checkpointing-skip 0.8 \
    --checkpointing-best-only \
    --checkpointing-metrics valid_semantic_miou bacc mae_gt_deg panoptic_deeplab_semantic_miou panoptic_all_with_gt_deeplab_pq \
    --batch-size 8 \
    --learning-rate 0.03 \
    --wandb-mode disabled
```

Tripton commented 10 months ago

Great to hear that the previous issue is resolved. Regarding SUNRGB-D training: our original training used the depth images in a way that can now only be replicated with the --sunrgbd-depth-do-not-force-mm flag. The flag was introduced to maintain backward compatibility after a change in the dataset repository (version 0.5.4) that adjusted the scaling of the SUNRGB-D depth images to millimeters. The updated millimeter scaling is particularly advantageous when deploying the model in real-world applications.

So whether to set the flag depends on your scenario:

- If you want to reproduce our original training and the results reported in the paper, add --sunrgbd-depth-do-not-force-mm.
- If you mainly want to deploy the model in a real-world application, omit the flag and train with the updated millimeter scaling.
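If you are unsure which scaling your local copy of the dataset currently uses, a rough check is to look at the value range of one stored depth image. This is purely illustrative and not part of the official tooling; the file path below is only a placeholder:

```python
# Illustrative sketch: inspect the value range of a stored SUNRGB-D depth
# image from your local dataset copy (path below is a placeholder).
import numpy as np
from PIL import Image

depth_path = './datasets/sunrgbd/train/depth/00001.png'  # hypothetical path
depth = np.asarray(Image.open(depth_path))

valid = depth[depth > 0]  # ignore invalid (zero) depth pixels
print("dtype:", depth.dtype)
print("min/max of valid pixels:", valid.min(), valid.max())
# Heuristic only: values that read directly as millimeters typically stay
# well below ~10000 for indoor scenes.
```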

zixulll commented 10 months ago

Thank you for your patient guidance; I will run my experiments following your advice. Thank you very much.