sysu19351158 opened this issue 1 year ago
For NYUv2:
python main.py \
--tasks semantic normal scene instance orientation \
--enable-panoptic \
--results-basepath /some/path \
--validation-skip 0.95 \
--checkpointing-skip 0.95 \
--checkpointing-metrics valid_semantic_miou bacc panoptic_deeplab_semantic_miou panoptic_all_deeplab_pq panoptic_all_with_gt_deeplab_pq \
--rgb-encoder-backbone resnet34 \
--rgb-encoder-backbone-block nonbottleneck1d \
--depth-encoder-backbone resnet34 \
--depth-encoder-backbone-block nonbottleneck1d \
--encoder-backbone-pretrained-weights-filepath /path/to/our/imagenet/checkpoint.pth \
--input-modalities rgb depth \
--tasks-weighting 1.0 0.25 0.25 2.0 0.0 \
--learning-rate 0.005 \
--dataset hypersim \
--subset-train 0.2 \
--instance-center-heatmap-top-k 128
For SUNRGB-D:
python main.py \
--tasks semantic normal scene instance orientation \
--enable-panoptic \
--results-basepath /some/path \
--validation-skip 0.95 \
--checkpointing-skip 0.95 \
--checkpointing-metrics valid_semantic_miou bacc panoptic_deeplab_semantic_miou panoptic_all_deeplab_pq panoptic_all_with_gt_deeplab_pq \
--rgb-encoder-backbone resnet34 \
--rgb-encoder-backbone-block nonbottleneck1d \
--depth-encoder-backbone resnet34 \
--depth-encoder-backbone-block nonbottleneck1d \
--encoder-backbone-pretrained-weights-filepath /path/to/our/imagenet/checkpoint.pth \
--input-modalities rgb depth \
--tasks-weighting 1.0 0.25 0.25 2.0 0.0 \
--learning-rate 0.005 \
--dataset hypersim \
--subset-train 0.3 \
--instance-center-heatmap-top-k 128
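In both commands above, the --tasks-weighting values presumably pair positionally with the --tasks list; below is a minimal sketch of that pairing (my own reading for illustration, not code from the repository):

tasks = ["semantic", "normal", "scene", "instance", "orientation"]   # --tasks
weights = [1.0, 0.25, 0.25, 2.0, 0.0]                                # --tasks-weighting

# Pair each task with its loss weight, assuming a one-to-one positional mapping.
for task, weight in zip(tasks, weights):
    print(f"{task:<12s} weight={weight}")
# Under this reading, orientation gets weight 0.0, i.e. it would not
# contribute to the pretraining loss on Hypersim.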
Thank you so much! The number of epochs is not set in the command. Does this mean that training runs for 500 epochs, as set in args.py?
Yes. However, note that the actual number of iterations also depends on the specified subset parameter. Even with a random subset of 0.2 or 0.3 per epoch, training on an A100 will take around one week.
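As a rough back-of-the-envelope check of how the subset fraction affects the iteration count (the dataset size and batch size below are placeholders, not the actual Hypersim/EMSANet values):

import math

n_train = 50_000    # assumed number of training samples (placeholder)
batch_size = 8      # assumed batch size (placeholder)
n_epochs = 500      # default from args.py, as discussed above
subset = 0.2        # --subset-train

# A random subset is drawn per epoch, so the iterations scale linearly with it.
iters_per_epoch = math.ceil(subset * n_train / batch_size)
total_iters = n_epochs * iters_per_epoch
print(iters_per_epoch, total_iters)   # 1250 iterations per epoch, 625000 in total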
Thank you! 🙏 But there is another problem: when I train EMSANet on NYUv2 with the pretrained weights for the ResNet-34 NBt1D encoder backbone, using the command at the end of the README, the test mIoU is 0.5041. This differs from the 0.5097 reported in the paper, even though I repeated the training three times. Did I do something wrong?
This should not happen. I will run a test training to double-check this.
Ok, I did some test trainings and was able to almost reproduce the reported results in a more recent environment:
task: ['semantic', 'scene', 'instance', 'orientation']
task_weighting: [1.0, 0.25, 3.0, 1.0]
instance_weighting: [2, 1]
lr: 0.03
wandb: EMSANet-nyuv2-r34nbt1d-testruns astral-firefly-6 (2tnzlo26)
wandb_url: https://wandb.ai/nicr/EMSANet-nyuv2-r34nbt1d-testruns/runs/2tnzlo26
epoch_max: 499
valid_panoptic_all_with_gt_deeplab_pq (447)
valid_instance_all_with_gt_deeplab_pq: 0.6060
valid_orientation_mae_gt_deg: 18.4523
valid_panoptic_all_deeplab_pq: 0.4324
valid_panoptic_all_with_gt_deeplab_pq: 0.4324
valid_panoptic_all_with_gt_deeplab_rq: 0.5183
valid_panoptic_all_with_gt_deeplab_sq: 0.8253
valid_panoptic_deeplab_semantic_miou: 0.5123
valid_panoptic_mae_deeplab_deg: 16.1432
valid_scene_bacc: 0.7684
valid_semantic_miou: 0.5083
Note that the learning rate is slightly lower than the reported value in the paper: 0.04 (paper) vs 0.03 (here). However, as the environment is different, I enqueued runs with 0.02, 0.03, and 0.04. The best result is shown above; it was reached at epoch 447, selected based on valid_panoptic_all_with_gt_deeplab_pq.
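For clarity, the reported epoch is simply the one with the highest validation PQ among the checkpointed epochs; a minimal sketch of that selection (only the 0.4324 at epoch 447 comes from the run above, the neighbouring values are made up for illustration):

# history: epoch -> valid_panoptic_all_with_gt_deeplab_pq
history = {445: 0.4301, 446: 0.4318, 447: 0.4324, 448: 0.4310}

best_epoch = max(history, key=history.get)
print(best_epoch, history[best_epoch])   # 447 0.4324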
Training was done on an A100 40GB with driver 470.63.01. Please find below additional details on the environment.
conda list | grep -e torch -e cuda
cuda 11.7.1 0 nvidia
cuda-cccl 11.7.91 0 nvidia
cuda-command-line-tools 11.7.1 0 nvidia
cuda-compiler 11.7.1 0 nvidia
cuda-cudart 11.7.99 0 nvidia
cuda-cudart-dev 11.7.99 0 nvidia
cuda-cuobjdump 11.7.91 0 nvidia
cuda-cupti 11.7.101 0 nvidia
cuda-cuxxfilt 11.7.91 0 nvidia
cuda-demo-suite 11.8.86 0 nvidia
cuda-documentation 11.8.86 0 nvidia
cuda-driver-dev 11.7.99 0 nvidia
cuda-gdb 11.8.86 0 nvidia
cuda-libraries 11.7.1 0 nvidia
cuda-libraries-dev 11.7.1 0 nvidia
cuda-memcheck 11.8.86 0 nvidia
cuda-nsight 11.8.86 0 nvidia
cuda-nsight-compute 11.8.0 0 nvidia
cuda-nvcc 11.7.99 0 nvidia
cuda-nvdisasm 11.8.86 0 nvidia
cuda-nvml-dev 11.7.91 0 nvidia
cuda-nvprof 11.8.87 0 nvidia
cuda-nvprune 11.7.91 0 nvidia
cuda-nvrtc 11.7.99 0 nvidia
cuda-nvrtc-dev 11.7.99 0 nvidia
cuda-nvtx 11.7.91 0 nvidia
cuda-nvvp 11.8.87 0 nvidia
cuda-runtime 11.7.1 0 nvidia
cuda-sanitizer-api 11.8.86 0 nvidia
cuda-toolkit 11.7.1 0 nvidia
cuda-tools 11.7.1 0 nvidia
cuda-visual-tools 11.7.1 0 nvidia
cudatoolkit 11.3.1 h2bc3f7f_2
ffmpeg 4.3 hf484d3e_0 pytorch
pytorch 1.13.0 py3.8_cuda11.7_cudnn8.5.0_0 pytorch
pytorch-cuda 11.7 h67b0de4_0 pytorch
pytorch-lightning 1.5.8 pypi_0 pypi
pytorch-mutex 1.0 cuda pytorch
torchaudio 0.13.0 py38_cu117 pytorch
torchmetrics 0.10.2 pypi_0 pypi
torchvision 0.14.0 py38_cu117 pytorch
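If it is easier to compare environments from within Python than via conda, a quick check with standard PyTorch calls (nothing project-specific) would be:

import torch
import torchvision
import pytorch_lightning as pl

# Print the versions that matter for reproducing the runs above.
print("torch:", torch.__version__)               # 1.13.0
print("cuda:", torch.version.cuda)               # 11.7
print("cudnn:", torch.backends.cudnn.version())  # 8500 for cuDNN 8.5.0
print("torchvision:", torchvision.__version__)   # 0.14.0
print("pytorch-lightning:", pl.__version__)      # 1.5.8
print("gpu:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")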
I hope this helps.
Regarding the epochs, task weighting, and the other settings, can you show the training command for Hypersim?