VinAIResearch / iFS-RCNN

iFS-RCNN: An Incremental Few-shot Instance Segmenter (CVPR 2022)
Apache License 2.0

No such file #2

Open JAYATEJAK opened 1 year ago

JAYATEJAK commented 1 year ago

Hi @ducminhkhoi, this is a great contribution to the few-shot incremental object detection problem. I am trying to replicate the results given in the paper and am running into the following issue.

  1. While running train.sh, I get the following error: FileNotFoundError: [Errno 2] No such file or directory: 'datasets/coco/trainval2014/COCO_train2014_000000089914.jpg'. Could you please let me know why this is happening?

Thanks

ducminhkhoi commented 1 year ago

Have you checked that the .jpg file datasets/coco/trainval2014/COCO_train2014_000000089914.jpg exists in the corresponding folder?

JAYATEJAK commented 1 year ago

Hi @ducminhkhoi, thanks for the reply. Do we have to combine all the train2014 and val2014 images into a new folder, trainval2014? In my COCO dataset folder, I have separate train2014 and val2014 folders with their respective images.

ducminhkhoi commented 1 year ago

Yes, the training set of the few-shot setting comprises the images from both the training and validation sets of COCO 2014.
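
One way to build the merged folder without duplicating the images is to symlink them. The following is a minimal sketch, not a script from the repository: the function name `link_trainval` is hypothetical, and it assumes train2014 and val2014 sit side by side under one COCO root (for example datasets/coco) on a filesystem that supports symlinks. COCO 2014 file names carry a COCO_train2014_ or COCO_val2014_ prefix, so the two sets do not collide.

```python
from pathlib import Path


def link_trainval(coco_root, sources=("train2014", "val2014"), target="trainval2014"):
    """Symlink every .jpg from the source folders into one target folder.

    Returns the number of links created. Entries that already exist in the
    target folder are left untouched, so the function can be re-run safely.
    """
    root = Path(coco_root)
    out_dir = root / target
    out_dir.mkdir(parents=True, exist_ok=True)

    created = 0
    for name in sources:
        src_dir = root / name
        if not src_dir.is_dir():
            raise FileNotFoundError(f"expected image folder not found: {src_dir}")
        for image in sorted(src_dir.glob("*.jpg")):
            link = out_dir / image.name
            if link.exists() or link.is_symlink():
                continue  # already linked or copied on a previous run
            link.symlink_to(image.resolve())
            created += 1
    return created
```

Calling `link_trainval("datasets/coco")` from the repository root would then populate datasets/coco/trainval2014, assuming that is where the dataset lives. Copying the files instead of linking them works equally well if disk space is not a concern.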

JAYATEJAK commented 1 year ago

Thank you so much, that issue is solved. But when I started base task training (without changing any hyperparameters), it raised FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged. Could you please let me know what the issue might be?

ducminhkhoi commented 1 year ago

You can debug the line that raises the error with Python's built-in breakpoint() function. Check the length of the GT boxes/masks as well as the proposals.
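
Besides inspecting the live tensors at the breakpoint, the ground-truth side of this check can be done offline on the annotation file. The sketch below is an illustration under stated assumptions, not code from the repository: `find_suspect_images` is a hypothetical helper, and it assumes a dict in COCO annotation format (an "images" list, an "annotations" list, boxes stored as [x, y, width, height]). It flags images with no ground-truth instances and boxes that are empty or non-finite; whether any of these is the cause of the divergence here is not established in this thread.

```python
import math


def find_suspect_images(coco):
    """Return {image_id: [reasons]} for images worth inspecting by hand.

    `coco` is a dict in COCO annotation format (assumed layout):
    coco["images"] is a list of {"id": ...} entries and
    coco["annotations"] is a list of {"id", "image_id", "bbox"} entries.
    """
    reasons = {img["id"]: [] for img in coco["images"]}
    counts = {img["id"]: 0 for img in coco["images"]}

    for ann in coco["annotations"]:
        image_id = ann["image_id"]
        if image_id not in counts:
            continue  # annotation refers to an image that is not listed
        counts[image_id] += 1
        x, y, w, h = ann["bbox"]
        if not all(math.isfinite(v) for v in (x, y, w, h)):
            reasons[image_id].append(f"annotation {ann['id']}: non-finite box")
        elif w <= 0 or h <= 0:
            reasons[image_id].append(f"annotation {ann['id']}: empty box")

    for image_id, n in counts.items():
        if n == 0:
            reasons[image_id].append("no ground-truth instances")

    # Keep only the images that have at least one reason attached.
    return {image_id: why for image_id, why in reasons.items() if why}
```

The dict can be loaded with `json.load` from whichever annotation file the training config points to; the returned image ids are then good candidates for a conditional breakpoint() in the data loading code.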

JAYATEJAK commented 1 year ago

Thanks for the reply, that issue is solved. But in finetuning.sh [line 33, src1], why are we passing model_final_early.pth to the code, and how and where is this model created?

Thanks in advance.

ducminhkhoi commented 1 year ago

This is the path to the .pth file of the base model, as in line 18 (of the file). The purpose is to concatenate the weights of the last layer of the recently fine-tuned model to those of the base model, as described in the paper.

JAYATEJAK commented 1 year ago

Then why are we loading the final model again as src2 [line 34 of finetuning.sh]?

Actually, I didn't understand the difference between the models src1 and src2 [line 33, line 34] in finetuning.sh.

ducminhkhoi commented 1 year ago

python3 -m tools.ckpt_surgery --coco \
    --src1 checkpoints/coco/${network}/${network}_R_${arch}_FPN_base${suffix}/model_final_early.pth \
    --src2 checkpoints/coco/${network}/${network}_R_${arch}_FPN_ft_novel_${shot}shot${suffix}${suffix2}/model_final.pth \
    --method combine \
    --save-dir checkpoints/coco/${network}/${network}_R_${arch}_FPN_all_final_${shot}shot${suffix}${suffix2}

src1 is the base model, src2 is the fine-tuned model. This command combines them together. That's it.
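
As a rough illustration of what "combine" means here, the sketch below concatenates last-layer parameters along the class dimension. It is not the repository's tools.ckpt_surgery code: `combine_last_layer` and the key names in the usage note are hypothetical, weights are plain Python lists so the example runs without PyTorch, and the real tool works on tensors and may order or select classes differently.

```python
def combine_last_layer(base_state, novel_state, keys):
    """Return a copy of base_state in which each entry named in `keys`
    holds the base rows followed by the novel rows.

    Every other entry (backbone, earlier layers) is taken from the base
    model unchanged. Inputs are not modified.
    """
    combined = dict(base_state)
    for key in keys:
        # One row (or one bias value) per class: base classes first,
        # then the novel classes learned during fine-tuning.
        combined[key] = list(base_state[key]) + list(novel_state[key])
    return combined
```

For example, with a hypothetical base entry "cls.weight" of two rows (two base classes) and a fine-tuned entry of one row (one novel class), `combine_last_layer(base, novel, ["cls.weight"])` yields a three-row classifier, which is why both src1 and src2 have to be loaded.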

JAYATEJAK commented 1 year ago

Hi @ducminhkhoi, I am facing an issue while replicating the paper's results. If I train the model with the exact same hyperparameter (lr = 0.02), training diverges and I get the floating point error I mentioned earlier. So I lowered the learning rate (lr = 0.0002) and retrained. This eliminates the divergence, but the average precision (AP) is very low (~2) even on the base task.

Could you please suggest why this is happening?