NVlabs / DG-Net

:couple: Joint Discriminative and Generative Learning for Person Re-identification. CVPR'19 (Oral) :couple:
https://www.zdzheng.xyz/publication/Joint-di2019

Why is the "val" data-set a subset of the "train" data-set? #36

Open bhooshan-supe-gmail opened 4 years ago

bhooshan-supe-gmail commented 4 years ago

Hi Xiaodong Yang, Zhedong Zheng,

I am planning to use your model in one of our experimental projects as a base model for transfer learning. While studying the code, I noticed that your "val" (validation) data-set is a subset of the "train" (training) data-set. (Refer to https://github.com/NVlabs/DG-Net/blob/master/prepare-market.py#L111)

This goes against my understanding, so could you kindly explain why you decided to make the validation data-set a subset of the training data-set?

bhooshan-supe-gmail commented 4 years ago

BTW, I am a software engineer at LG Electronics US.

layumi commented 4 years ago

Hi @bhooshan-supe-gmail Yes. Since the original dataset does not provide a validation set, we split the validation set from the training set.

bhooshan-supe-gmail commented 4 years ago

@layumi I am sorry to be nit-picky, but you have not split the data-set; you have part of the training data-set duplicated as the validation data-set. In my own data-set, on the other hand, I have made sure that the training and validation sets are completely disjoint, and a side effect of that is that my training and validation curves are not converging. Please refer to the following image. [training/validation curve image]

So I am wondering: is this OK? Is this training reliable?

layumi commented 4 years ago

Hi @bhooshan-supe-gmail

  1. Please check this line: https://github.com/NVlabs/DG-Net/blob/master/prepare-market.py#L111 There are no overlapping images between the training and validation sets. Only if you use train-all will there be overlapping images.

  2. I do not know how you split the dataset. Actually, there are two ways to split it.

    • One easy way is as shown above: we select the first image of every class in the training set as the validation set and evaluate the performance in a classification style (see the sketch below).
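
For illustration, a minimal sketch of that first-image-per-identity split, assuming the Market-1501 "pytorch" folder layout (one sub-folder of images per identity). The paths are hypothetical and this is not the repository's actual prepare-market.py code:

```python
import os
import shutil

# Hypothetical paths mirroring the Market-1501 "pytorch" layout,
# where each identity has its own sub-folder of images.
src_dir = 'Market/pytorch/train_all'
train_dir = 'Market/pytorch/train'
val_dir = 'Market/pytorch/val'

for pid in sorted(os.listdir(src_dir)):
    images = sorted(os.listdir(os.path.join(src_dir, pid)))
    if not images:
        continue
    os.makedirs(os.path.join(val_dir, pid), exist_ok=True)
    os.makedirs(os.path.join(train_dir, pid), exist_ok=True)
    # First image of each identity -> val; the rest -> train.
    # train and val stay disjoint; only train_all overlaps with val.
    shutil.copy(os.path.join(src_dir, pid, images[0]),
                os.path.join(val_dir, pid, images[0]))
    for name in images[1:]:
        shutil.copy(os.path.join(src_dir, pid, name),
                    os.path.join(train_dir, pid, name))
```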
bhooshan-supe-gmail commented 4 years ago

Hi @layumi

To be honest I am quite new to computer-vision and machine learning. Thanks a lot for your guidance!

bhooshan-supe-gmail commented 4 years ago

Hi @layumi ,

We have our own but very small data-set (about 21 person IDs but about 1,500 images), and I am fine-tuning your model on it. Basically, we are looking into how we can re-identify a person from an almost top-down view (a very steep angle) instead of a side and/or front view.
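
(For readers in a similar situation, a generic transfer-learning sketch, not DG-Net's actual training code: reuse a pretrained backbone, swap the identity classifier for one sized to the new 21-ID data-set, and fine-tune the backbone with a smaller learning rate. All names and hyper-parameters here are illustrative.)

```python
import torch
import torch.nn as nn
from torchvision import models

num_ids = 21  # identities in the custom top-view data-set

# Generic ImageNet-pretrained stand-in backbone (torchvision >= 0.13);
# DG-Net itself uses its own appearance encoder.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_ids)  # new ID classifier

# Smaller learning rate for pretrained layers, larger for the new head.
optimizer = torch.optim.SGD(
    [
        {'params': [p for n, p in model.named_parameters()
                    if not n.startswith('fc')], 'lr': 0.001},
        {'params': model.fc.parameters(), 'lr': 0.01},
    ],
    momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()
```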

layumi commented 4 years ago

@bhooshan-supe-gmail You may start from my tutorial, which is more straightforward: https://github.com/layumi/Person_reID_baseline_pytorch/tree/master/tutorial

Also, I recently released a dataset and code for satellite-view, drone-view, and ground-view geo-localization.
You are welcome to check it out: https://github.com/layumi/University1652-Baseline

nikky4D commented 2 years ago

Another way is in a retrieval style. Given the 751 classes in the Market-1501 dataset, we split the first 651 classes as the training set and leave out the remaining 100 classes as the validation set. We could use the images of the 100 classes as query and gallery to evaluate the retrieval performance. However, since the 100 classes have not been seen by the model, the model could not classify the images of those 100 classes.

How would you go about adding this retrieval-style evaluation? Does it make sense here to add retrieval-style evaluation in addition to the classification evaluation, which has the model classify images into person/object IDs?
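
(For context, retrieval-style evaluation ranks gallery features by similarity to each query feature. Below is a minimal rank-1 sketch with hypothetical names; it is not the repository's test.py:)

```python
import torch
import torch.nn.functional as F

def rank1(query_feats, query_ids, gallery_feats, gallery_ids):
    # Cosine similarity between every query and every gallery feature.
    q = F.normalize(query_feats, dim=1)
    g = F.normalize(gallery_feats, dim=1)
    sim = q @ g.t()
    # Rank-1 accuracy: does the closest gallery image share the query's ID?
    best = sim.argmax(dim=1)
    return (gallery_ids[best] == query_ids).float().mean().item()

# Random features only to show the call shapes; in practice the
# features come from the trained encoder.
qf, gf = torch.randn(10, 512), torch.randn(100, 512)
qid = torch.randint(0, 5, (10,))
gid = torch.randint(0, 5, (100,))
print(rank1(qf, qid, gf, gid))
```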

layumi commented 2 years ago

Hi @nikky4D Sorry, what are the "00 classes"? Could you provide more description?

nikky4D commented 2 years ago

Sorry, I quoted it incorrectly; please see the edited comment above.

layumi commented 2 years ago

Hi @nikky4D

  1. Validation (classification setting): I wrote it into the training code, so you do not need to modify the split.

  2. Validation (retrieval setting): if you want to evaluate on a 651 / 100 split (751 IDs in total), you need to modify the data preparation to split it. Since the IDs are random, I simply use the first 651 IDs as train and the last 100 IDs as val (see the sketch below). For validation in the retrieval setting, you need to use test.py to test on the validation set just like the test setting. (The validation result during training is not correct.)
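
(A minimal sketch of that ID-level split, again assuming the Market-1501 one-folder-per-identity layout; the destination folder names are hypothetical:)

```python
import os
import shutil

src_dir = 'Market/pytorch/train_all'
ids = sorted(os.listdir(src_dir))  # 751 identity folders

# First 651 identities for training, last 100 held out for retrieval
# validation (they serve as query/gallery and are never classified).
for split, split_ids in (('train_651', ids[:651]), ('val_100', ids[651:])):
    for pid in split_ids:
        shutil.copytree(os.path.join(src_dir, pid),
                        os.path.join('Market/pytorch', split, pid))
```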

nikky4D commented 2 years ago

Thank you. Then, for the teacher training, is it better to use the retrieval split or the classification setting for a more robust DG-Net setup, or does the data-set setup not matter for the final model?