I couldn't reproduce the performance of ptv2m2 on S3DIS

linhaojia13 commented 1 year ago

I use the command as follow for 2 times to train the ptv2m2:

CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/train.sh -g 4 -d s3dis -c semseg-ptv2m2-0-base -n semseg-ptv2m2-0-base

The environments is installed following the instructions in the README.md.

The results are as follow:

I found that the performance is heavily affected by the randomness (Results of run1 and run2 has a signifincant gap). What's more, both of the two trials have a significant performance gap with the released model (72.6).

Is there any mistake in my attempt to reproduce the experimental results? In your experiments, is there such instability caused by randomness？ Is a result as high as the released mIoU 72.6 is difficult to achieve unless running the experiments for many times?

Gofinge commented 1 year ago

I only ran once after I finished the released version code and achieved the 72.6%. Empirically speaking, model performance on S3DIS is always unstable, but the validation mIoU during training is slightly lower than I expected. Usually, it should be higher than 70%.

I can compare the log with my local record if you can share your Tensorboard file and train log (email me or share a link). Let's figure out the reason.

linhaojia13 commented 1 year ago

Thank you very much! Here are may log files: https://drive.google.com/drive/folders/1ffoNlHwY1XXk8VGwX9dWTXmvDM7gr4ig

Gofinge commented 1 year ago

Hi, I checked and compared the training curve and config with my local record, the config is the same as my released config, but the validation mIoU is lower than my local record. I tried this released config only once but explored several different settings. About 40% of training evaluate mIoU higher than 70% (I did not test them, only test the released one). Here are some of my local records and your record (the pink curve is yours):

Training on S3DIS is especially unstable due to overfitting. And this is the reason we adopt a multi-step scheduler instead of one cycle. Here are some suggestions which might turn the training process more stable:

1. Change the unpooling mode from "interp" to "map." Mapping-based unpooling shows more robustness (evaluated loss lower, not much difference on mIoU).
2. Apply 4 stage encoder and decoder. The result we reported in the original paper (71.6%) was run with a 4-stage encoder-decoder setting. Since the initial grid size for S3DIS is twice of ScanNet, 3 stages are also enough for S3DIS. Consequently, the released version is a more efficient 3 stages model.
3. Turn the grid size for grid pooling. I did not turn this config on S3DIS, but it matters.

Finally, a practical tip is to train for more time, and only test models have better mIoU (e.g., larger than 70%). Transformer-based models are unstable on S3DIS, and we may need an effective pre-training framework.

linhaojia13 commented 1 year ago

Thank you very much for your patient and detailed answer! Considering that the results on s3dis is unstable, next I will switch to scannet to evaluate ptv2.

Pointcept / PointTransformerV2

I couldn't reproduce the performance of ptv2m2 on S3DIS #11