Closed · dota2mhxy closed this issue 2 years ago
@Han-Jia
Hi,
The pre-training stage works the same as a standard classification task over all the meta-training classes. You can replace the dataloader in the pretrain.py file to use your own dataset. To select the best model, I follow the few-shot learning setting and evaluate with episodes of few-shot tasks (on the meta-val classes).
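For concreteness, here is a minimal sketch of what that pre-training loop amounts to. It is not the repo's exact pretrain.py: the dataset path is a placeholder for your own data, and torchvision's ResNet-18 stands in for the backbones the repo actually uses (ConvNet / ResNet-12).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Plain supervised classification over ALL meta-training classes --
# no K-way N-shot episodes at this stage.
transform = transforms.Compose([
    transforms.Resize((84, 84)),
    transforms.ToTensor(),
])
# Placeholder path: point ImageFolder at your own training split.
train_set = datasets.ImageFolder("path/to/your/train_split", transform=transform)
loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)

# Stand-in backbone; the repo uses ConvNet / ResNet-12 instead.
model = models.resnet18(num_classes=len(train_set.classes))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(100):
    for images, labels in loader:
        loss = criterion(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Model selection then happens separately, by evaluating the learned encoder on episodes sampled from the meta-val classes, as described above.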
Optimizing the meta-model directly (without pre-training) does not reach performance as good as with pre-training. Please see this paper for more discussion.
The self-attention works as a transformation over the set of prototypes. It indeed strengthens the correlation between prototypes due to the query-key-value architecture. The influence of the regularization can be found in the ablation study in our paper.
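As a rough sketch (not the exact FEAT code), the adaptation amounts to passing the prototypes through self-attention with the prototypes serving as query, key, and value simultaneously. Dimensions here are illustrative: a 5-way task with 640-d embeddings.

```python
import torch
import torch.nn as nn

n_way, dim = 5, 640
proto = torch.randn(1, n_way, dim)  # (batch, n_way, dim): one few-shot task

# Query = key = value = prototypes, i.e. self-attention over the prototype set.
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=1, batch_first=True)
adapted_proto, attn_weights = attn(proto, proto, proto)

# Each adapted prototype is a weighted combination of all prototypes in the
# task, which is why the transform strengthens the correlation between them.
print(adapted_proto.shape)  # torch.Size([1, 5, 640])
print(attn_weights.shape)   # torch.Size([1, 5, 5])
```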
Hello, thank you for your thoughtful paper and open code. I have gained a lot from reading it, but I have some questions for you.

First, about pre-training: if I don't load the pre-trained model, the accuracy of FEAT with the ResNet-12 and ConvNet backbones is 47% and 59% respectively, far lower than the roughly 80% achieved by the ResNet loaded with the pre-trained model. I understand that pre-training can speed up training, but why does it improve accuracy so much? I also see there is a pretrain.py in the project, so can I use my own data to train the model? In that code, the training set does not follow the few-shot K-way N-shot format; instead, all the data is fed in batches, just like ordinary CNN training. Why does it not follow the FEAT training procedure? What should I pay attention to when training with my own dataset?

Second, about the FEAT model: reading the code carefully, I found that instead of using a full Transformer, it simply uses MultiheadAttention, feeding in the prototypes to generate new prototypes. Can I simply understand this as self-attention strengthening the correlation between prototypes? And what about the loss function? You use the prototype loss plus a regularization term. Looking at the regularization term, it averages all the data of the support set and the query set into per-class prototypes, and then computes the Euclidean distance between those data points and the prototypes. Does this loss term clearly help improve the model's accuracy, or is its effect not obvious?

Thank you for your reply.
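A rough sketch of the regularization term as described in this question (per-class centers averaged over both support and query embeddings, then a Euclidean-distance softmax over all instances). Names and shapes are illustrative, not the repository's exact code:

```python
import torch
import torch.nn.functional as F

def contrastive_reg(embeddings, labels, n_way):
    """Auxiliary loss: average support + query embeddings per class into
    centers, then classify every instance by its negative squared
    Euclidean distance to those centers.

    embeddings: (N, dim) transformed features of all support + query instances
    labels:     (N,) class indices in [0, n_way)
    """
    centers = torch.stack(
        [embeddings[labels == c].mean(dim=0) for c in range(n_way)])
    # Negative squared Euclidean distance to each center used as logits.
    logits = -torch.cdist(embeddings, centers).pow(2)
    return F.cross_entropy(logits, labels)

# Illustrative usage: 5-way task, 16 instances per class, 640-d features.
emb = torch.randn(5 * 16, 640)
lbl = torch.arange(5).repeat_interleave(16)
print(contrastive_reg(emb, lbl, n_way=5))
```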