Dear author,
Thank you for sharing your excellent work.
However, when I tried to train ZegCLIP under the setting vpt_seg_zero_vit-b_512x512_10k_12_10_st, the results I got were far from those reported in the original paper. I made two changes to the setting: I set batch_size to 8 rather than 16 because of GPU limitations, and I changed the training iterations to 20k. The evaluation results are shown below. Could you tell me what might have caused this? I would appreciate your help!