HelloRicky123 / Siamese-RPN

Full reimplementation of Siamese-RPN; achieves 0.24 EAO on VOT2017.
MIT License
223 stars 44 forks

A noble act to open-source such work; some suggestions for improvement #1

Closed gjpicker closed 5 years ago

gjpicker commented 5 years ago

1. The architecture and loss function are based on RPN, but your module is named siamfc. I think it would be more appropriate to rename "SiamFC-PyTorch/siamfc/" to "SiamFC-PyTorch/siamRPN/".

2. It would be nice to add benchmark or experiment results at the end of README.md, compared against your chosen baseline, like this guy did.

P.S.: From your dataloader's implementation, it looks like your data source is ILSVRC-VID, which is smaller than the full ILSVRC (on which SiamRPN's original authors ran their experiments). Given that, does your result still outperform the baseline?

3. Can you explain config.warm_epoch vs. config.epoch?

4. Can you explain the use of np.clip (e.g., why not the torch equivalent)?

best regards

HelloRicky123 commented 5 years ago

2. In the paper they say, "Compared to ILSVRC [29] which consists of about 4,000 videos annotated frame-by-frame, Youtube-BB [25] consists of more than 100,000 videos annotated once in every 30 frames." The ILSVRC-VID dataset I used has about 4,417 videos, so this is probably not a problem.

3. My initial idea was to use the first three layers' weights from the model provided with the paper's code, and leave layers 4 and 5 initialized with nn.init.kaiming_normal_. This needs some warm-up epochs of training with a small learning rate, followed by a larger learning rate.

4. This is to keep the new boxes' centers inside the image.
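For illustration, point 4 (keeping a predicted box center inside the image with np.clip) might look roughly like this minimal sketch; the function and variable names are hypothetical, not taken from the repo:

```python
import numpy as np

def clip_box_center(cx, cy, img_w, img_h):
    """Clamp a box center (cx, cy) so it stays inside a img_w x img_h image."""
    # np.clip(value, lower, upper) bounds the value to [lower, upper]
    cx = np.clip(cx, 0, img_w - 1)
    cy = np.clip(cy, 0, img_h - 1)
    return cx, cy

print(clip_box_center(-5.0, 300.0, 255, 255))  # -> (0.0, 254.0)
```

A center that drifts outside the search region after applying the regression offsets is pulled back to the nearest valid pixel, which is what "make the new boxes' center in the image" describes.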

gjpicker commented 5 years ago

Thanks a lot for your patient answer! Things are much clearer after your response.

However, can you explain the third point in more detail, or share some reference links or papers to back it up? I'm very interested in your idea (a pretrained network whose last layers are re-initialized, calling for multiple learning rates). It's the first time I've seen multiple learning rates used during training; maybe it's similar to TTUR.
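For concreteness, the two-learning-rate idea being discussed can be sketched with PyTorch optimizer parameter groups; the layer shapes and rates below are made up for illustration, not the repo's actual values:

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained early layers (conv1-3 in the discussion above)
backbone = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU())
# Stand-in for the freshly initialized later layers (conv4-5)
head = nn.Sequential(nn.Conv2d(16, 32, 3), nn.ReLU())
for m in head.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight)

# Parameter groups let the pretrained layers train with a small learning
# rate while the new layers use a larger one during the warm-up epochs.
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},
    {"params": head.parameters(), "lr": 1e-2},
], momentum=0.9)

print([g["lr"] for g in optimizer.param_groups])  # [0.0001, 0.01]
```

After the warm-up phase, the group learning rates can simply be raised by reassigning `g["lr"]` in `optimizer.param_groups`.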

gjpicker commented 5 years ago

Your code's quality is so high that I couldn't wait to comment here again after reading your source code carefully.

perfect work!

HelloRicky123 commented 5 years ago

Sorry, I've had trouble increasing the accuracy recently. Using different learning rates is just a trick I have found useful sometimes. My problem now is the Youtube-BB dataset: due to my VPN's data limit, I can't download it. I'm trying to reach 0.3 EAO on VOT2015 without Youtube-BB, but have only gotten 0.22 so far.

zzpustc commented 5 years ago

@HelloRicky123 How about using TrackingNet, released at ECCV 2018, instead? It contains more videos than Youtube-BB.

takecareofbigboss commented 5 years ago

Hi, maybe you can borrow some ideas from object detection to improve your performance.

takecareofbigboss commented 5 years ago

@HelloRicky123

HelloRicky123 commented 5 years ago

> @HelloRicky123 How about using TrackingNet, released at ECCV 2018, instead? It contains more videos than Youtube-BB.

But that would make the comparison with the paper's code unfair.

zzpustc commented 5 years ago

@HelloRicky123 How long did it take you to train the model with ILSVRC-VID?

HelloRicky123 commented 5 years ago

The throughput was about 150 images/s with two 1080Ti GPUs.