Open JAYATEJAK opened 1 year ago
Have you checked the existence of that .jpg file: datasets/coco/trainval2014/COCO_train2014_000000089914.jpg in the corresponding folder?
Hi @ducminhkhoi, thanks for the reply. Do we have to combine all the train2014 and val2014 training images into a new trainval2014 folder? In my COCO dataset folder I have separate train2014 and val2014 folders, each with its respective images.
Yes, the training set in the few-shot setting comprises the images from both the training and validation sets of COCO 2014.
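One way to build that merged folder is to symlink every image from both splits into a `trainval2014` directory. A minimal sketch (the paths and folder names are assumptions based on the error message above; adjust them to your own `datasets/` layout):

```python
from pathlib import Path

def merge_splits(coco_root, splits=("train2014", "val2014"), out="trainval2014"):
    """Symlink all .jpg images from the given splits into one merged folder."""
    out_dir = Path(coco_root) / out
    out_dir.mkdir(parents=True, exist_ok=True)
    for split in splits:
        for img in (Path(coco_root) / split).glob("*.jpg"):
            link = out_dir / img.name
            if not link.exists():
                # Symlinks avoid duplicating tens of GB of images on disk.
                link.symlink_to(img.resolve())
    return out_dir
```

Copying instead of symlinking also works if your filesystem does not support links.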
Thank you so much, that issue is solved. But when I started base-task training (without changing any hyperparameters), it failed with FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged. Could you please let me know what the issue might be?
You can debug the failing line by using the breakpoint() command in Python. Also check the lengths of the GT boxes/masks as well as the proposals.
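A quick way to locate the offending values before dropping into the debugger is to scan the predictions for Inf/NaN. A minimal sketch using plain Python lists (in the actual training loop you would apply the same check to the box/score tensors, and also compare the lengths of GT boxes/masks against the proposals):

```python
import math

def find_nonfinite(values):
    """Return indices of Inf/NaN entries; an empty list means all values are finite."""
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        # In a real run you would call breakpoint() here and inspect the
        # batch (images, GT boxes, proposals) that produced the bad values.
        pass
    return bad
```

Guarding the loss computation with a check like this pinpoints which iteration and which tensor first diverges.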
Thanks for the reply, that issue is solved. But in the finetuning.sh file [line 33, src1], why are we passing model_final_early.pth, and how and where is this model created?
Thanks in advance.
This is the link to the .pth of the base model, as in line 18 (of the file). The purpose is to concatenate the weights of the last layer of the recently fine-tuned model to those of the base model, as described in the paper.
Then why are we loading the final model again as src2 [line 34 of finetuning.sh]?
Actually, I don't understand the difference between the models src1 and src2 [lines 33-34] in finetuning.sh:
```shell
python3 -m tools.ckpt_surgery --coco \
    --src1 checkpoints/coco/${network}/${network}_R_${arch}_FPN_base${suffix}/model_final_early.pth \
    --src2 checkpoints/coco/${network}/${network}_R_${arch}_FPN_ft_novel_${shot}shot${suffix}${suffix2}/model_final.pth \
    --method combine \
    --save-dir checkpoints/coco/${network}/${network}_R_${arch}_FPN_all_final_${shot}shot${suffix}${suffix2}
```
src1 is the base model, src2 is the fine-tuned model. This command combines them together. That's it.
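Conceptually, the `combine` surgery can be sketched as follows. This is a minimal illustration using plain dicts of lists, not the real implementation: `tools.ckpt_surgery` operates on saved PyTorch state dicts, and `head_keys` here is a hypothetical stand-in for the classifier/box-head parameter names whose rows are concatenated:

```python
def combine_checkpoints(base_ckpt, novel_ckpt, head_keys):
    """Start from the base weights; for the head keys, append the novel-class
    rows after the base-class rows so one head covers all classes."""
    combined = dict(base_ckpt)
    for k in head_keys:
        # Each row corresponds to one class; concatenation preserves the
        # base classes and adds the fine-tuned novel classes after them.
        combined[k] = base_ckpt[k] + novel_ckpt[k]
    return combined
```

All non-head parameters (backbone, FPN, RPN) are simply taken from the base model, which is why src1 must be the base checkpoint and src2 the novel fine-tuned one.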
Hi @ducminhkhoi, I am facing an issue while replicating the paper's results. If I use the exact same hyperparameters (lr = 0.02) to train the model, it diverges and I get the floating-point error I mentioned earlier. So I lowered the learning rate (lr = 0.0002) and retrained; that eliminates the diverging-loss issue, but the Average Precision (AP) comes out very low (~2) on the base task itself.
Could you please suggest why this is happening?
Hi @ducminhkhoi, this is a great contribution to the few-shot incremental object detection problem. I am trying to replicate the results given in the paper, but I am getting the following issue.
Thanks