asdfqwer2015 closed 5 years ago
OK, to generalize to other classes such as the VOC classes, would it work to construct an additional dataset from a tracking dataset, e.g. TrackingNet? For training, choose the instance in the first frame as the 'target' (the instance is likely to be clearly visible in the first frame) and sample one frame from every 20~30 frames as the 'scene' to train the model. The constructed dataset would have more classes and more instances per class. Is this feasible? Thanks again.
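The sampling scheme described above could be sketched roughly as follows. This is only an illustration of the idea, not code from the project: `build_pairs`, its parameters, and the choice to draw one random frame from each window of 20 to 30 frames are all assumptions, and frames are represented by their indices rather than loaded from TrackingNet.

```python
import random
from typing import List, Tuple


def build_pairs(num_frames: int, min_gap: int = 20, max_gap: int = 30,
                seed: int = 0) -> List[Tuple[int, int]]:
    """Return (target_frame, scene_frame) index pairs for one tracking sequence.

    Hypothetical helper: frame 0 is always the target, and one scene frame
    is drawn at random from each consecutive window of min_gap..max_gap frames.
    """
    rng = random.Random(seed)
    pairs = []
    start = 1  # frame 0 is reserved for the target crop
    while start < num_frames:
        window = rng.randint(min_gap, max_gap)   # window length, inclusive bounds
        end = min(start + window, num_frames)    # clip the last window
        scene = rng.randrange(start, end)        # one scene frame per window
        pairs.append((0, scene))
        start = end
    return pairs


if __name__ == "__main__":
    # Example: a 100-frame sequence yields a handful of (target, scene) pairs.
    for target_idx, scene_idx in build_pairs(100):
        print(target_idx, scene_idx)
```

In a real pipeline each pair would also need the ground-truth box of the instance in the scene frame, which a tracking dataset's annotations would be expected to provide.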
Hi, thanks for your comments. We may add the few-shot data and code in the future.
For the tracking scenario, this may be possible. I'd encourage you to look at some similar work to ours in tracking, DaSiamRPN. They do not release training code, but reading the paper may give some insights. It seems similar to what you are describing.
OK, thanks for your reply and for recommending the related paper. Yes, I mean a generated dataset similar to DaSiamRPN's. Do you think that, if trained on a dataset like this, the model could detect an instance of a class not present in AVD, e.g. a bicycle or an animal?
Hope to see the few-shot data and code. :)
The DaSiamRPN method is very similar to ours, so I would expect them to work similarly. The main difference is in the size of the target image. The tracking SiamRPNs have a strong prior that the size of the object does not change much from frame to frame. For detection, TDID pools the target to 1x1, so there is no knowledge of what scale the target object is in the scene image. I would expect the tracking methods to work better on tracking and TDID to work better on detection.
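As a rough illustration of the 1x1 pooling point, here is a toy sketch in plain Python. It is not the TDID code: average pooling and a per-location dot product are assumptions made for the example, and `pool_to_1x1` and `correlate` are hypothetical names. It shows that once the target features are pooled to a single vector, targets of different spatial sizes can give the same embedding, so the size of the target carries no scale information into the match against the scene.

```python
from typing import List

Feature = List[List[List[float]]]  # channels x height x width


def pool_to_1x1(target: Feature) -> List[float]:
    """Average each channel over all spatial positions (assumed pooling type)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in target]


def correlate(scene: Feature, embedding: List[float]) -> List[List[float]]:
    """Dot product between the pooled target vector and every scene location."""
    height, width = len(scene[0]), len(scene[0][0])
    channels = len(embedding)
    return [[sum(embedding[c] * scene[c][y][x] for c in range(channels))
             for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    small_target = [[[1.0, 1.0], [1.0, 1.0]]]   # 1 channel, 2x2
    large_target = [[[1.0] * 4 for _ in range(4)]]  # 1 channel, 4x4
    # Both targets pool to the same vector, so spatial size is lost.
    print(pool_to_1x1(small_target), pool_to_1x1(large_target))

    scene = [[[0.0, 3.0, 0.0], [0.0, 0.0, 1.0]]]  # 1 channel, 2x3
    print(correlate(scene, pool_to_1x1(small_target)))
```

A tracker that keeps a fixed-size target template, by contrast, carries the object's size from the previous frame into the match.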
OK, the same datasets with different tasks. I hope to see the expected results.
If I want to train the model with TrackingNet, should I (a) train on TrackingNet and then fine-tune on AVD, or (b) merge TrackingNet and AVD and then train on the merged dataset? Thanks. @ammirato
Hi, your model is very interesting. Thanks for sharing the code. Why not add the few-shot instance detection implementation from the paper to the project? That might be more useful in practice. Can the model be used at test time for instances of classes from ImageNet? Thanks.