bytedance / fc-clip

[NeurIPS 2023] This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Apache License 2.0

Training logs #6

Open yuanpengtu opened 1 year ago

yuanpengtu commented 1 year ago

Hi, thanks for your great work. Could you provide the training log on the COCO dataset please? I'd like to compare my reproduction results to find out what went wrong. Thanks.

zifuwan commented 1 year ago

Hi, I'm also reproducing the results. Can you share your training logs here? Here is mine: [training log screenshot]. I use two GPUs with batch size 4 for training, and I'm not sure whether the small batch size causes the loss oscillation. Is yours the same?

Thanks.
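For context, FC-CLIP trains on a detectron2-style pipeline, where the global batch size and base learning rate are set together in the config. Below is a minimal sketch of overriding both from the command line for a 2-GPU run, assuming the repo follows Mask2Former's `train_net.py` entry point and detectron2's standard launcher flags; the config path and the linearly scaled learning rate are illustrative assumptions, not settings confirmed in this thread.

```bash
# Sketch: 2-GPU run with global batch size 4. Mask2Former-style configs
# typically use IMS_PER_BATCH 16 and BASE_LR 1e-4; if so, a linearly
# scaled LR for batch size 4 would be 2.5e-5. Both the config path and
# these numbers are assumptions -- check the repo's provided config.
python train_net.py \
  --config-file configs/coco/panoptic-segmentation/fcclip/fcclip_convnext_large_eval_ade20k.yaml \
  --num-gpus 2 \
  SOLVER.IMS_PER_BATCH 4 \
  SOLVER.BASE_LR 0.000025
```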

wusize commented 1 year ago

I cannot reproduce the ConvNeXt results on ADE20K. Has anybody successfully reproduced them?

cornettoyu commented 1 year ago

> Hi, thanks for your great work. Could you provide the training log on the COCO dataset please? I'd like to compare my reproduction results to find out what went wrong. Thanks.

Hi,

Thanks for your interest. Could you please specify which results are off in your reproduction (e.g., share your experiment settings and results) so we can look into the problem? We will also upload the training log soon.

cornettoyu commented 1 year ago

> Hi, I'm also reproducing the results. Can you share your training logs here? Here is mine: [training log screenshot]. I use two GPUs with batch size 4 for training, and I'm not sure whether the small batch size causes the loss oscillation. Is yours the same?
>
> Thanks.

Hi,

We have not tried a different setting. Please make sure you follow the provided config if you aim to reproduce our results. Thanks.

cornettoyu commented 1 year ago

> I cannot reproduce the ConvNeXt results on ADE20K. Has anybody successfully reproduced them?

Can you share your results here so we can look into what went wrong? We will also upload our training log soon. It is hard to tell what is wrong with no details beyond "cannot reproduce" :)

wusize commented 1 year ago

> I cannot reproduce the ConvNeXt results on ADE20K. Has anybody successfully reproduced them?
>
> Can you share your results here so we can look into what went wrong? We will also upload our training log soon. It is hard to tell what is wrong with no details beyond "cannot reproduce" :)

Hi! Thank you for your reply. I will try another training run with ConvNeXt and share the log with you. By the way, did you save the checkpoint with the best result or simply the last one? I found that the PQ on ADE20K saturated very quickly, and further training only improves the performance on COCO.

cornettoyu commented 1 year ago

> I cannot reproduce the ConvNeXt results on ADE20K. Has anybody successfully reproduced them?
>
> Can you share your results here so we can look into what went wrong? We will also upload our training log soon. It is hard to tell what is wrong with no details beyond "cannot reproduce" :)
>
> Hi! Thank you for your reply. I will try another training run with ConvNeXt and share the log with you. By the way, did you save the checkpoint with the best result or simply the last one? I found that the PQ on ADE20K saturated very quickly, and further training only improves the performance on COCO.

Thanks for checking. Yes, the PQ can reach 25~26 within the first 50k steps, while the improvement over the remaining 300k steps is relatively small. We keep the best checkpoint for our final results.

cornettoyu commented 1 year ago

Please see the attached log file for a reference training/validation log of FC-CLIP with ConvNeXt-L. The provided checkpoint is at step 309999, with "panoptic_seg/PQ": 26.788164947280208.

Furthermore, we ran the experiments with CUDA 11.3, PyTorch 1.12.1, and NVIDIA Tesla V100-SXM2-32GB GPUs; we hope this information helps you figure out the problem. Feel free to reach out with details if you encounter any issues :)

yxchng commented 12 months ago

@cornettoyu how do you select the best checkpoint? Do you evaluate all of them?
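One way to check this empirically is to evaluate every periodic checkpoint and pick the best PQ from the logs. A sketch, assuming the repo's Mask2Former-style `train_net.py` supports detectron2's standard `--eval-only` flow; the output directory and config path are illustrative, not taken from this thread.

```bash
# Evaluate each saved checkpoint and print its metrics; the best PQ can
# then be picked from the logs. Paths below are illustrative.
for ckpt in output/model_*.pth; do
  echo "=== $ckpt ==="
  python train_net.py \
    --config-file configs/coco/panoptic-segmentation/fcclip/fcclip_convnext_large_eval_ade20k.yaml \
    --num-gpus 1 --eval-only \
    MODEL.WEIGHTS "$ckpt"
done
```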

yxchng commented 12 months ago

@wusize what results do you get?

SuleBai commented 7 months ago

> Please see the attached log file for a reference training/validation log of FC-CLIP with ConvNeXt-L. The provided checkpoint is at step 309999, with "panoptic_seg/PQ": 26.788164947280208.
>
> Furthermore, we ran the experiments with CUDA 11.3, PyTorch 1.12.1, and NVIDIA Tesla V100-SXM2-32GB GPUs; we hope this information helps you figure out the problem. Feel free to reach out with details if you encounter any issues :)

Hi! It seems that PyTorch 1.12.1 is inconsistent with the version specified in INSTALL.md. So if I want to reproduce your results, should I run the command below when preparing the environment? `conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -c nvidia`
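A minimal sanity check after installing, assuming a standard PyTorch setup: print the installed torch and CUDA versions and compare them against the CUDA 11.3 / PyTorch 1.12.1 environment reported above.

```bash
# Print the installed PyTorch version, the CUDA version it was built
# against, and whether a GPU is visible, for comparison with the
# authors' reported environment (PyTorch 1.12.1, CUDA 11.3).
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```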