RozDavid / LanguageGroundedSemseg

Implementation for ECCV 2022 paper Language-Grounded Indoor 3D Semantic Segmentation in the Wild

Is your code missing a processing step? #22

Open HuitongJin opened 1 year ago

HuitongJin commented 1 year ago

Dear RozDavid, hello. I am reproducing your work, but when running text_representation_train and train_model with your code as-is, my results are 10-20 mIoU points worse than those reported in your paper. I'm wondering whether there is some data preprocessing that is not included on GitHub, or whether the training scripts are missing some parameters mentioned in the ablation experiments. Could you please confirm this for me? Thank you very much!

RozDavid commented 1 year ago

Hey @HuitongJin,

Every step necessary to reproduce our final results is detailed in this repo. Others have already been able to reproduce our numbers, but it is still possible that a local setup has an odd configuration that produces issues like this.

Could you detail all the steps you have taken so far, together with your results? Without that extra information it is not really possible to find the source of the error.

Regards, David

HuitongJin commented 1 year ago

Hello, over the past while I have run experiments on the problem above, and I found that the batch size has a large impact on the results. Previously, due to hardware limitations, I had set the batch size to 2, which is why my mIoU was so far from the paper's results.

To follow up, I ran some more experiments but still could not match the paper. The hardware configurations and the parameters I modified are as follows:

(1) A100 (80G); 2 GPUs * 4 per GPU = batch size 8; all other configurations kept consistent with your code. In the pretraining phase I trained for 500 epochs and the best mIoU was 18.18; in the finetuning phase I trained until convergence and the best mIoU was 25.85, which is about 3 points below the paper's result.

(2) V100 (32G); 4 GPUs * 4 per GPU = batch size 8, limit_points=500000; all other configurations kept consistent with your code. In the pretraining stage I trained for 500 epochs and the best mIoU was 19.86; in the finetuning stage I trained until convergence and the best mIoU was 24.27.

(3) A100 (80G); 1 GPU * 16 per GPU = batch size 16; other configurations kept consistent. In the pretraining stage I trained for 500 epochs and the best mIoU was 16.45. Since it seemed unlikely to end up better than the first two settings after finetuning, I did not run the finetuning stage.

According to the "implementation details" in your paper the batch size should be set to 8, but as mentioned above my best mIoU is still about 3 points below the paper. Could you provide some tips or help? (A small sketch of how I count the batch size follows below.)
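For clarity, this is how I am counting the effective batch size in the runs above (a minimal sketch; I left gradient accumulation at its default of 1, which is an assumption on my side):

```python
# Small sketch of the effective batch size I assume per optimizer step under DDP.
# accumulate_grad_batches=1 is my assumption, since I did not change it.
def effective_batch_size(num_gpus: int, per_gpu_batch: int, accumulate_grad_batches: int = 1) -> int:
    return num_gpus * per_gpu_batch * accumulate_grad_batches

print(effective_batch_size(2, 4))   # run (1): 8
print(effective_batch_size(1, 16))  # run (3): 16
```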

RozDavid commented 1 year ago

Hey,

A couple of things come to mind that could be going wrong.

My gut feeling is that you are either training with the 34C model, or you are somehow failing to load the pretrained weights into the model at the beginning of the finetuning stage.
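To rule out the second case, you could add a quick sanity check right after loading, along these lines (a rough sketch, not the exact loading code from this repo; `model` stands for your instantiated backbone, and the `model.` key prefix is an assumption about how the Lightning checkpoint stores the weights):

```python
import torch

# Rough sanity check: do the pretrained weights actually end up in the finetuning model?
ckpt = torch.load("path/to/pretrained.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # Lightning checkpoints nest weights under 'state_dict'

# Strip a possible "model." prefix so the keys line up with the bare backbone.
state_dict = {k.replace("model.", "", 1): v for k, v in state_dict.items()}

# `model` is your instantiated backbone (placeholder name here).
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)        # ideally empty, or only the newly initialized head
print("unexpected keys:", unexpected)  # ideally empty
```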

HuitongJin commented 1 year ago

Hello, I shared my training log on Google Drive. Note that the 34D_2G_4B_500E_18.ckpt used in the finetuning stage was copied directly from the pretraining stage (checkpoint-val_miou=18.18-step=84258). https://drive.google.com/drive/folders/11014zBLcLgtRgKq79XdiFJXIM9KYbl_m?usp=sharing
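To double-check that I copied the right file, I inspected the checkpoint contents like this (assuming it is a standard PyTorch Lightning checkpoint; the exact keys may differ):

```python
import torch

# Quick inspection of the checkpoint copied over from the pretraining stage.
ckpt = torch.load("34D_2G_4B_500E_18.ckpt", map_location="cpu")
print(list(ckpt.keys()))                        # typically 'state_dict', 'epoch', 'global_step', ...
print(ckpt.get("epoch"), ckpt.get("global_step"))
print(len(ckpt["state_dict"]))                  # number of parameter tensors saved for the model
```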

RozDavid commented 1 year ago

Hi,

Interesting. After a first check your parameters look alright, and it seems you are also loading the pretrained weights. Could you try finetuning from the weights that we provide here with this repo? Your pretraining mIoU score is reasonable though; we got those numbers too, with a very small margin of difference.

HuitongJin commented 1 year ago

Hi, I finetuned from your pretrained model parameters, and after 500 epochs the best mIoU is 27.24. This is a significant improvement over my previous training from scratch, but it is still below the results in the paper. The log file is here: https://drive.google.com/file/d/1DaRpsHTPu9UmJNG4lC5mqwI6aLzk4XCs/view?usp=sharing

RozDavid commented 1 year ago

Right, but that's expected at this point. I see from your logs that you are using the standard CE loss instead of our class-balanced focal loss. Our result was 27.73 versus the 27.24 you obtained, which is honestly still a bit odd, but not that far off at this point. I'm not sure what the problem is there, but since you had issues with the pretraining performance as well, I assume there is some system configuration error that affects both pretraining and finetuning, or you are accidentally not using both GPUs / not sharing the BN parameters across them. Alternatively, you could also increase the accumulated_gradient_batch number just to check whether your effective batch size is simply off.
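For those last two points, here is a rough illustration with a plain PyTorch Lightning Trainer (the argument names depend on your Lightning version and on how our configs expose them, so take this as a sketch rather than the exact setup):

```python
import pytorch_lightning as pl

# Sketch only: 2 GPUs x 4 samples x 2 accumulation steps = effective batch size 16.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp",
    sync_batchnorm=True,         # share BatchNorm statistics across the two GPUs
    accumulate_grad_batches=2,   # cheap way to test whether the effective batch size is the issue
)
```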

Could you try with the focal loss as well? That should also significantly improve the performance, especially for the tail split.
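For reference, the idea is along these lines (a simplified sketch of a class-balanced focal loss, not the exact implementation from this repo; the per-class weights, gamma value, and ignore index are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassBalancedFocalLoss(nn.Module):
    """Simplified sketch: focal loss with per-class (class-balanced) weights."""

    def __init__(self, class_weights: torch.Tensor, gamma: float = 2.0, ignore_index: int = 255):
        super().__init__()
        self.register_buffer("class_weights", class_weights)  # 1D tensor, one weight per class
        self.gamma = gamma
        self.ignore_index = ignore_index

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Unweighted per-voxel CE gives -log(p_t), from which we recover p_t.
        ce = F.cross_entropy(logits, targets, ignore_index=self.ignore_index, reduction="none")
        pt = torch.exp(-ce)

        # Class-balanced weight for the true class of each valid voxel.
        valid = targets != self.ignore_index
        alpha = torch.ones_like(ce)
        alpha[valid] = self.class_weights[targets[valid]]

        # Down-weight easy, well-classified voxels; up-weight rare classes.
        focal = alpha * (1.0 - pt) ** self.gamma * ce
        return focal[valid].mean()
```

In practice the per-class weights would come from the (inverse) class frequencies of the training set rather than being uniform.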

Let me know if that helps!

Cheers, David

P.S. Could you also send the log curves (wandb or tensorboard)? They would be a bit easier to interpret than the raw training logs.