euwern / proxynca_pp

The implementation of ProxyNCA++.
MIT License

Custom Dataset #1

Closed Vincent9797 closed 4 years ago

Vincent9797 commented 4 years ago

How do I modify your code to train it on a custom dataset?

euwern commented 4 years ago

All you need are images and their corresponding class labels. You can then specify your training, validation, and evaluation sets in the dataset/config.json file. I highly recommend downloading the CUB200_2011 birds dataset and playing around with the data loader in dataset/cub.py.

In short, you will need to create a file "dataset/your_dataset.py" and add it to "dataset/__init__.py" and "dataset/config.json". "dataset/your_dataset.py" is where you define how to read your own dataset.
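As a rough illustration of what "dataset/your_dataset.py" might contain, here is a minimal sketch. It is not the repo's actual loader or base class: the class name `YourDataset`, the one-folder-per-class layout, and the `classes` argument are all assumptions made for the example. It only collects image paths and integer labels for the requested classes and leaves image decoding to an optional `transform` callable.

```python
import os

# Assumption: images are stored as root/<class_name>/<image file>.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


class YourDataset:
    """Hypothetical loader that keeps only the requested class indices."""

    def __init__(self, root, classes, transform=None):
        self.root = root
        self.classes = set(classes)
        self.transform = transform
        self.im_paths = []  # path of every kept image
        self.ys = []        # integer label of every kept image

        # Sorted folder names are mapped to labels 0..N-1.
        class_dirs = sorted(
            d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
        )
        for label, class_dir in enumerate(class_dirs):
            if label not in self.classes:
                continue  # class not part of this split
            folder = os.path.join(root, class_dir)
            for fn in sorted(os.listdir(folder)):
                if fn.startswith("._"):
                    continue  # skip hidden resource-fork files
                if fn.lower().endswith(IMAGE_EXTENSIONS):
                    self.im_paths.append(os.path.join(folder, fn))
                    self.ys.append(label)

    def __len__(self):
        return len(self.ys)

    def __getitem__(self, index):
        path, label = self.im_paths[index], self.ys[index]
        # Without a transform the raw path is returned, so the sketch
        # runs without any image library installed.
        sample = self.transform(path) if self.transform else path
        return sample, label
```

Used as `YourDataset("/path/to/images", classes=range(0, 50))`, this would expose only the images whose folder index is between 0 and 49. How the class is wired into the repo's own files depends on the existing code there, so dataset/cub.py remains the reference to follow.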

Vincent9797 commented 4 years ago

Can you explain this part in dataset/config.json?

[image: screenshot of the relevant part of dataset/config.json]

Does it mean that your train dataloader only accepts classes 0 to 49?

In addition, could you explain the difference between train and trainval?

euwern commented 4 years ago

Yes, you are correct. You can also hardcode a list of classes, e.g. "train": "[1,3,4,100]". The data is structured this way because we train on some known classes and test on unseen classes.
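To make the two ways of specifying classes concrete, here is a small sketch of how such a string could be turned into a list of class indices. It assumes the config value is either a range expression such as "range(0, 50)" (a guess at what the screenshot showed) or a list literal such as "[1,3,4,100]". The helper name `parse_classes` is hypothetical and is not taken from the repo.

```python
import ast
import re

# Matches strings like "range(0, 50)"; this form is an assumption.
_RANGE = re.compile(r"^\s*range\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*$")


def parse_classes(spec):
    """Turn a class specification string into a list of class indices."""
    match = _RANGE.match(spec)
    if match:
        start, stop = int(match.group(1)), int(match.group(2))
        return list(range(start, stop))  # stop is exclusive, as in Python
    # Otherwise expect a literal list of integers, e.g. "[1,3,4,100]".
    values = ast.literal_eval(spec)
    if not isinstance(values, list) or not all(isinstance(v, int) for v in values):
        raise ValueError("expected a list of integer class indices: %r" % spec)
    return values


train_classes = parse_classes("range(0, 50)")   # classes 0 to 49
custom_classes = parse_classes("[1,3,4,100]")   # hardcoded list
eval_classes = parse_classes("range(50, 100)")  # hypothetical unseen classes

# Training and evaluation classes should not overlap when testing on unseen classes.
assert set(train_classes).isdisjoint(eval_classes)
```

Parsing with a regular expression and `ast.literal_eval` avoids calling `eval` on the config contents, at the cost of supporting only these two forms.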

"train" is just a subset of "trainval". We pick the best epoch at which to reduce the learning rate by evaluating on "val". Then, during the "trainval" phase, we reduce the learning rate according to what we learned from the "train" and "val" sets.
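The two-phase idea can be sketched roughly as follows. This is an illustration only, not the repo's training script: the function names, the decay factor, and the rule "drop the learning rate once at the best validation epoch" are assumptions chosen for the example.

```python
def best_epoch(val_scores):
    """Return the index of the epoch with the highest validation score."""
    return max(range(len(val_scores)), key=lambda e: val_scores[e])


def learning_rate(epoch, base_lr, decay_epoch, factor=0.1):
    """Step schedule: multiply base_lr by factor from decay_epoch onward."""
    return base_lr * factor if epoch >= decay_epoch else base_lr


# Phase 1: train on "train", record a score on "val" after every epoch.
val_scores = [0.41, 0.52, 0.60, 0.58, 0.57]  # made-up numbers
decay_epoch = best_epoch(val_scores)

# Phase 2: retrain on "trainval" and reuse the epoch found in phase 1.
schedule = [
    learning_rate(e, base_lr=1.0, decay_epoch=decay_epoch, factor=0.5)
    for e in range(len(val_scores))
]
```

The point of the split is that "val" is only used to choose the schedule. Once the decay epoch is fixed, the validation images can be folded back into training as part of "trainval".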

abhoi commented 4 years ago

Building on this question, what changes do you think are needed to extend this to one-shot learning? I realize the paper is focused only on zero-shot learning.

euwern commented 4 years ago

Hey abhoi, as you have realized, your question is outside the scope of this repo. I highly recommend reading other papers and source code for one-shot (or few-shot) learning and using the loss function in this repo as you see fit. The evaluation of one-shot learning is different from that of zero-shot learning.