google-deepmind / leo

Implementation of Meta-Learning with Latent Embedding Optimization
Apache License 2.0

Train/val/test split mismatch #8

Closed silverbottlep closed 5 years ago

silverbottlep commented 5 years ago

Hi, thanks for the fantastic work! I looked at the embeddings, and it seems that the train/val/test split is different from the original one (https://github.com/twitter/meta-learning-lstm). Is this a mistake, or is it intended?

sygi commented 5 years ago

Hello, thank you for your question. I quickly skimmed the repo and looked at the test.csv file, and it appears to contain the same classes as the ones we used in our split. I also briefly checked the exact ids of the files from one of the classes, and they look similar as well (at first glance).

Could you tell me why you think the split is different? Can you give an example of a file or class that is included in one of the splits (ours or the original) but not the other?

Thank you

silverbottlep commented 5 years ago

Thanks for your response. I just checked again: the test split is the same, but the train/val splits are different (I might be doing something wrong, so please double-check!).

LEO validation split classes: n01558993 n01910747 n02074367 n02091831 n02101006 n02114548 n02165456 n02606052 n02687172 n02971356 n03908618 n04243546 n04389033 n04604644 n07697537 n13133613

Original validation split classes: n01855672 n02091244 n02114548 n02138441 n02174001 n02950826 n02971356 n02981792 n03075370 n03417042 n03535780 n03584254 n03770439 n03773504 n03980874 n09256479
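
For anyone who wants to reproduce this comparison, here is a minimal sketch using only the class ids listed above (copied verbatim from the two splits); the check is plain set arithmetic and needs no dataset access:

```python
# Compare the two validation splits listed above.
leo_val = {
    "n01558993", "n01910747", "n02074367", "n02091831", "n02101006",
    "n02114548", "n02165456", "n02606052", "n02687172", "n02971356",
    "n03908618", "n04243546", "n04389033", "n04604644", "n07697537",
    "n13133613",
}
orig_val = {
    "n01855672", "n02091244", "n02114548", "n02138441", "n02174001",
    "n02950826", "n02971356", "n02981792", "n03075370", "n03417042",
    "n03535780", "n03584254", "n03770439", "n03773504", "n03980874",
    "n09256479",
}

print("shared:       ", sorted(leo_val & orig_val))   # only 2 classes overlap
print("LEO only:     ", sorted(leo_val - orig_val))
print("original only:", sorted(orig_val - leo_val))
```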

At the end of the day we evaluate on the test set, so it should be okay. But in my experience validation errors can be quite different, and since hyperparameter search is done on the validation set, a different split might give different results, especially on a small dataset like mini-ImageNet.

andreirusu commented 5 years ago

Any train/valid split will do; there is no "correct" one. Since the previous SOTA trained their model on train+valid, as they should, we also used both splits for the final results. Which split is used for hyperparameter choices is less important. In fact, we should have used a leave-16-out cross-validation regime and evaluated 5 grids to find the best setting of hyperparameters across all 5 blocks of 16 classes used as validation. Naturally, that is more expensive, but it would have found better hyperparameters, at least in theory. Short of that, any split will do.
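
To make the leave-16-out regime described above concrete, here is a rough sketch. It assumes the 80 train+valid class ids are available as a list, and that `train_and_eval` is a hypothetical callback that trains a model on the given training classes with the given hyperparameters and returns accuracy on the held-out block; none of these names come from the repo:

```python
import random

def five_class_blocks(class_ids, seed=0):
    """Partition the 80 train+valid class ids into 5 disjoint blocks of 16."""
    assert len(class_ids) == 80
    ids = sorted(class_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i * 16:(i + 1) * 16] for i in range(5)]

def leave_16_out_search(class_ids, hyper_grid, train_and_eval):
    """For each hyperparameter setting, hold out each block of 16 classes in
    turn as validation, train on the remaining 64, and average the accuracy."""
    blocks = five_class_blocks(class_ids)
    best_hypers, best_acc = None, float("-inf")
    for hypers in hyper_grid:
        accs = []
        for val_block in blocks:
            train_classes = [c for b in blocks if b is not val_block for c in b]
            accs.append(train_and_eval(train_classes, val_block, hypers))
        mean_acc = sum(accs) / len(accs)
        if mean_acc > best_acc:
            best_hypers, best_acc = hypers, mean_acc
    return best_hypers, best_acc
```

The hyperparameters chosen this way are averaged over all five validation blocks, so no single arbitrary 16-class split dominates the choice; the final model can then be trained on all 80 classes, as was done for the reported results.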