Issue training base model

bingykang / Fewshot_Detection

Few-shot Object Detection via Feature Reweighting

https://arxiv.org/abs/1812.01866


Issue training base model #19

Open christegho opened 4 years ago

christegho commented 4 years ago

I have been trying to train a base model for some time now.

I have had issues with the version of PyTorch the code was built on. 0.3.1 would not work with CUDA versions past 8.0, but my GeForce RTX 2080 would not work with CUDA versions below 9.0.

I managed to have the code base work with PyTorch 0.4.0 and 0.4.1, with CUDA 10.1.

I have two GPUs, each with 10986MB. I managed to have the base training run for many epochs, but then my whole machine would shut down all of a sudden, partway through training. I suspect this is because of my RAM.

I did have to reduce the batch size and subdivisions to get the training to start.

But this is all to say that I am not able to get a base model, and I am wondering if there is anyone who has a model to share?

I will commit my code for PyTorch >= 0.4.0 soon, on my fork, but it would be so nice to have weights I could use.
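For anyone hitting the same version constraints described above (PyTorch 0.3.1 tied to CUDA 8.0, an RTX 2080 needing CUDA 9.0 or later), a minimal sketch like the following may help confirm which PyTorch and CUDA build an environment actually uses before starting a long training run. `describe_env` is a hypothetical helper name, not something from this repository; it only reads standard PyTorch attributes and returns a placeholder if PyTorch is not installed.

```python
def describe_env():
    """Report the installed PyTorch version and the CUDA build it was compiled for.

    Hypothetical helper for illustration; not part of Fewshot_Detection.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None}
    return {
        "torch": str(torch.__version__),               # e.g. "0.4.1"
        "cuda_build": torch.version.cuda,              # CUDA version PyTorch was built with
        "cuda_available": torch.cuda.is_available(),   # whether a usable GPU is visible
        "gpu_count": torch.cuda.device_count(),        # number of visible GPUs
    }


env = describe_env()
for key, value in env.items():
    print(key, "=", value)
```

If `cuda_available` comes back false on a machine with a GPU, the installed PyTorch build and the driver or CUDA toolkit are likely mismatched, which is one possible cause of the setup problems described here.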

XinyiYS commented 4 years ago

You could try my trained base model: https://drive.google.com/open?id=1CSVFhfOHmRlbUsMu_eyBCvBWn_06a9zH

christegho commented 4 years ago

Hi Michael,

Thanks for sharing your trained base model. This is very helpful!

On Mon, Dec 2, 2019 at 9:05 AM Michael notifications@github.com wrote:

You could try my trained base model: https://drive.google.com/open?id=1CSVFhfOHmRlbUsMu_eyBCvBWn_06a9zH


XinyiYS commented 4 years ago

Thanks for sharing your trained base model. This is very helpful!

No problem, Chris. Give it a go. I didn't change any settings, so it should give decent results on the base classes.

HuangLian126 commented 4 years ago

@christegho Hi, I tried to train the base model with torch 1.2.0, torchvision 0.4.0, and CUDA 10.1. However, I get this error:

File "/home/hl/hl/Fewshot_Detection-master/region_loss.py", line 330, in forward
    pred_boxes[0] = x.data + grid_x
RuntimeError: The size of tensor a (13) must match the size of tensor b (38870) at non-singleton dimension 3

The shape of x is torch[46,5,13,13], and the shape of grid_x is torch[38870]. How do you fix this error?
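The shapes reported above can be checked without a GPU. The sketch below is plain Python, not code from the repository, and `broadcastable` is a hypothetical helper that applies the usual trailing-dimension broadcasting rule. It shows that the two tensors hold the same number of elements (46 × 5 × 13 × 13 = 38870) but cannot be broadcast together, while a flattened `x` would line up with the flat tensor.

```python
from math import prod


def broadcastable(a, b):
    """Return True if shapes a and b are compatible under the usual
    trailing-dimension broadcasting rule (sizes equal, or one of them is 1)."""
    for da, db in zip(reversed(a), reversed(b)):
        if da != db and da != 1 and db != 1:
            return False
    return True


x_shape = (46, 5, 13, 13)   # 4-D shape reported in the comment
grid_x_shape = (38870,)     # flat shape reported in the comment

# Both tensors hold the same number of elements.
same_numel = prod(x_shape) == prod(grid_x_shape)

# As written, the last dimensions are 13 and 38870, so the add fails.
ok_as_is = broadcastable(x_shape, grid_x_shape)

# After flattening x to one dimension, the shapes match exactly.
flat_x_shape = (prod(x_shape),)
ok_flattened = broadcastable(flat_x_shape, grid_x_shape)

print(same_numel, ok_as_is, ok_flattened)
```

Under that reading, one possible change (an assumption, not confirmed by anyone in this thread) would be to flatten `x` before the addition, for example `pred_boxes[0] = x.data.reshape(-1) + grid_x`, and to do the same for the other terms built the same way. Older PyTorch releases tolerated adding tensors with equal element counts but different shapes, which may explain why the original code ran on 0.3.1.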

Fly-dream12 commented 3 years ago

Have you solved it? @HuangLian126

li-yanling commented 3 years ago

@XinyiYS @christegho Could you please share your base model? The google drive link has expired. Many thanks!

XinyiYS commented 3 years ago

@XinyiYS @christegho Could you please share your base model? The google drive link has expired. Many thanks!

Hi yanling, sorry that I have removed the model from my google drive due to storage limit. Somehow I don't have a local backup of it. Apologies. Perhaps see if Chris would be able to provide a copy.

li-yanling commented 3 years ago

@XinyiYS @christegho Could you please share your base model? The google drive link has expired. Many thanks!

Hi yanling, sorry that I have removed the model from my google drive due to storage limit. Somehow I don't have a local backup of it. Apologies. Perhaps see if Chris would be able to provide a copy.

Hi Xinyi, thanks for your reply :)