TUI-NICR / EMSANet

EMSANet: Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments
Apache License 2.0

What is the configuration of pretraining on Hypersim? #5

Open sysu19351158 opened 1 year ago

sysu19351158 commented 1 year ago

Which epoch count, task weighting, and other settings did you use? Can you show the training command for Hypersim?

danielS91 commented 1 year ago

Hypersim pretraining used for NYUv2:

python main.py \
    --tasks semantic normal scene instance orientation \
    --enable-panoptic \
    --results-basepath /some/path \
    --validation-skip 0.95 \
    --checkpointing-skip 0.95 \
    --checkpointing-metrics valid_semantic_miou bacc panoptic_deeplab_semantic_miou panoptic_all_deeplab_pq panoptic_all_with_gt_deeplab_pq \
    --rgb-encoder-backbone resnet34 \
    --rgb-encoder-backbone-block nonbottleneck1d \
    --depth-encoder-backbone resnet34 \
    --depth-encoder-backbone-block nonbottleneck1d \
    --encoder-backbone-pretrained-weights-filepath /path/to/our/imagenet/checkpoint.pth \
    --input-modalities rgb depth \
    --tasks-weighting 1.0 0.25 0.25 2.0 0.0 \
    --learning-rate 0.005 \
    --dataset hypersim \
    --subset-train 0.2 \
    --instance-center-heatmap-top-k 128 
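Note how the multi-value arguments line up: assuming --tasks-weighting maps positionally onto --tasks (which the command above suggests, but which you should verify against args.py), the weights can be read as a per-task dictionary. A minimal sketch of that reading:

```python
# Values copied from the Hypersim pretraining command above.
tasks = ["semantic", "normal", "scene", "instance", "orientation"]
weights = [1.0, 0.25, 0.25, 2.0, 0.0]

# Assumption: the i-th weight belongs to the i-th task.
task_weighting = dict(zip(tasks, weights))

# Under this reading, the orientation loss is weighted 0.0 during
# pretraining, i.e., it contributes nothing to the total loss.
print(task_weighting["orientation"])  # → 0.0
```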

Hypersim pretraining used for SUNRGB-D (only --subset-train differs):

python main.py \
    --tasks semantic normal scene instance orientation \
    --enable-panoptic \
    --results-basepath /some/path \
    --validation-skip 0.95 \
    --checkpointing-skip 0.95 \
    --checkpointing-metrics valid_semantic_miou bacc panoptic_deeplab_semantic_miou panoptic_all_deeplab_pq panoptic_all_with_gt_deeplab_pq \
    --rgb-encoder-backbone resnet34 \
    --rgb-encoder-backbone-block nonbottleneck1d \
    --depth-encoder-backbone resnet34 \
    --depth-encoder-backbone-block nonbottleneck1d \
    --encoder-backbone-pretrained-weights-filepath /path/to/our/imagenet/checkpoint.pth \
    --input-modalities rgb depth \
    --tasks-weighting 1.0 0.25 0.25 2.0 0.0 \
    --learning-rate 0.005 \
    --dataset hypersim \
    --subset-train 0.3 \
    --instance-center-heatmap-top-k 128
sysu19351158 commented 1 year ago

Thank you so much! The epoch number is not set in the command. Does this mean the number of epochs is 500, as set in args.py?

danielS91 commented 1 year ago

Yes. However, note that the actual number of iterations also depends on the specified subset parameter. Even with a random subset of 0.2 or 0.3 per epoch, training on an A100 will take around one week.
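To make the subset/iteration relationship concrete, here is a back-of-the-envelope sketch. The train-split size and batch size below are placeholder assumptions for illustration, not values from the repository; only the subset fractions (0.2/0.3) and the 500-epoch default come from this thread:

```python
import math

def total_iterations(n_train, subset, batch_size, epochs):
    """Estimate total training iterations when only a random fraction
    (subset) of the train split is drawn in each epoch."""
    samples_per_epoch = int(n_train * subset)            # random subset per epoch
    iters_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return epochs * iters_per_epoch

# Placeholder numbers: assumed train-split size of 60,000 and batch
# size of 8; subset 0.2 and 500 epochs as discussed above.
print(total_iterations(n_train=60_000, subset=0.2, batch_size=8, epochs=500))
# → 750000
```

Halving the subset fraction halves the iterations per epoch, so the subset parameter directly scales total training time even though the epoch count stays at 500.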

sysu19351158 commented 1 year ago

Thank you! 🙏 But there is another problem: when I train EMSANet on NYUv2 with the pretrained weights for the ResNet-34 NBt1D encoder backbone, using the command at the end of the README file, the test mIoU is 0.5041. It differs from the paper's 0.5097, even though I repeated the training three times. Did I do something wrong?

danielS91 commented 1 year ago

This should not happen. I will run a test training to double-check this.

danielS91 commented 1 year ago

Ok, I ran some test trainings and was able to almost reproduce the reported results in a more recent environment:

                                       task: ['semantic', 'scene', 'instance', 'orientation']
                             task_weighting: [1.0, 0.25, 3.0, 1.0]
                         instance_weighting: [2, 1]
                                         lr: 0.03
                                      wandb: EMSANet-nyuv2-r34nbt1d-testruns astral-firefly-6 (2tnzlo26)
                                  wandb_url: https://wandb.ai/nicr/EMSANet-nyuv2-r34nbt1d-testruns/runs/2tnzlo26
                                  epoch_max: 499

valid_panoptic_all_with_gt_deeplab_pq (447)
      valid_instance_all_with_gt_deeplab_pq: 0.6060
               valid_orientation_mae_gt_deg: 18.4523
              valid_panoptic_all_deeplab_pq: 0.4324
      valid_panoptic_all_with_gt_deeplab_pq: 0.4324
      valid_panoptic_all_with_gt_deeplab_rq: 0.5183
      valid_panoptic_all_with_gt_deeplab_sq: 0.8253
       valid_panoptic_deeplab_semantic_miou: 0.5123
             valid_panoptic_mae_deeplab_deg: 16.1432
                           valid_scene_bacc: 0.7684
                        valid_semantic_miou: 0.5083

Note that the learning rate is slightly lower than the value reported in the paper: 0.04 (paper) vs. 0.03 (here). However, as the environment differs, I enqueued runs with 0.02, 0.03, and 0.04. The best result is shown above; it was reached at epoch 447, selected based on valid_panoptic_all_with_gt_deeplab_pq.
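The selection rule above (pick the epoch with the highest validation PQ) can be sketched as follows. The history values are made-up illustration data; only the metric name matches the log above:

```python
def best_epoch(history, metric="valid_panoptic_all_with_gt_deeplab_pq"):
    """Return the (epoch, metrics) pair with the highest value of `metric`."""
    return max(history.items(), key=lambda item: item[1][metric])

# Toy history: epoch -> metrics dict (values are illustrative only;
# 0.4324 at epoch 447 mirrors the log above, the neighbors are invented).
history = {
    446: {"valid_panoptic_all_with_gt_deeplab_pq": 0.4301},
    447: {"valid_panoptic_all_with_gt_deeplab_pq": 0.4324},
    448: {"valid_panoptic_all_with_gt_deeplab_pq": 0.4310},
}

epoch, metrics = best_epoch(history)
print(epoch)  # → 447
```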

Training was done on an A100 40GB with driver 470.63.01. Please find below additional details on the environment.

conda list | grep -e torch -e cuda
cuda                      11.7.1                        0    nvidia
cuda-cccl                 11.7.91                       0    nvidia
cuda-command-line-tools   11.7.1                        0    nvidia
cuda-compiler             11.7.1                        0    nvidia
cuda-cudart               11.7.99                       0    nvidia
cuda-cudart-dev           11.7.99                       0    nvidia
cuda-cuobjdump            11.7.91                       0    nvidia
cuda-cupti                11.7.101                      0    nvidia
cuda-cuxxfilt             11.7.91                       0    nvidia
cuda-demo-suite           11.8.86                       0    nvidia
cuda-documentation        11.8.86                       0    nvidia
cuda-driver-dev           11.7.99                       0    nvidia
cuda-gdb                  11.8.86                       0    nvidia
cuda-libraries            11.7.1                        0    nvidia
cuda-libraries-dev        11.7.1                        0    nvidia
cuda-memcheck             11.8.86                       0    nvidia
cuda-nsight               11.8.86                       0    nvidia
cuda-nsight-compute       11.8.0                        0    nvidia
cuda-nvcc                 11.7.99                       0    nvidia
cuda-nvdisasm             11.8.86                       0    nvidia
cuda-nvml-dev             11.7.91                       0    nvidia
cuda-nvprof               11.8.87                       0    nvidia
cuda-nvprune              11.7.91                       0    nvidia
cuda-nvrtc                11.7.99                       0    nvidia
cuda-nvrtc-dev            11.7.99                       0    nvidia
cuda-nvtx                 11.7.91                       0    nvidia
cuda-nvvp                 11.8.87                       0    nvidia
cuda-runtime              11.7.1                        0    nvidia
cuda-sanitizer-api        11.8.86                       0    nvidia
cuda-toolkit              11.7.1                        0    nvidia
cuda-tools                11.7.1                        0    nvidia
cuda-visual-tools         11.7.1                        0    nvidia
cudatoolkit               11.3.1               h2bc3f7f_2
ffmpeg                    4.3                  hf484d3e_0    pytorch
pytorch                   1.13.0          py3.8_cuda11.7_cudnn8.5.0_0    pytorch
pytorch-cuda              11.7                 h67b0de4_0    pytorch
pytorch-lightning         1.5.8                    pypi_0    pypi
pytorch-mutex             1.0                        cuda    pytorch
torchaudio                0.13.0               py38_cu117    pytorch
torchmetrics              0.10.2                   pypi_0    pypi
torchvision               0.14.0               py38_cu117    pytorch

I hope this helps.