bhooshan-supe-gmail opened this issue 4 years ago
BTW, I am a software engineer at LG Electronics US.
Hi @bhooshan-supe-gmail Yes. Since the original dataset does not provide a validation set, we split the validation set from the training set.
@layumi I am sorry to be nit-picky, but you have not actually split the dataset; rather, part of the training set is duplicated as the validation set. In contrast, I have made sure that in my dataset the training and validation sets are completely disjoint. A side effect of that is that my training and validation curves are not converging. Please refer to the following image.
So I am wondering: is this OK? Is this training reliable?
Hi @bhooshan-supe-gmail
Please check this line https://github.com/NVlabs/DG-Net/blob/master/prepare-market.py#L111
There are no overlapping images between the training and validation sets.
If you use train-all, there will be overlapping images.
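To make the difference concrete, here is a minimal sketch of the kind of split being described. The per-identity rule (hold out one image per ID for validation, keep the rest for training, while `train_all` keeps everything and therefore overlaps with val) is an assumption for illustration; see `prepare-market.py` in the repo for the exact logic.

```python
# Sketch of a classification-style split: one held-out image per identity
# goes to val, the rest go to train; train_all keeps every image and
# therefore overlaps with val. This per-identity rule is an assumption,
# not necessarily the exact rule used by prepare-market.py.

def split_per_identity(images_by_id):
    train, val, train_all = {}, {}, {}
    for pid, imgs in images_by_id.items():
        val[pid] = imgs[:1]        # one held-out image per identity
        train[pid] = imgs[1:]      # remaining images for training
        train_all[pid] = imgs      # everything -- overlaps with val
    return train, val, train_all

# Toy example with two identities:
data = {"0001": ["a.jpg", "b.jpg", "c.jpg"], "0002": ["d.jpg", "e.jpg"]}
train, val, train_all = split_per_identity(data)
assert val["0001"] == ["a.jpg"]
assert train["0001"] == ["b.jpg", "c.jpg"]
assert set(train_all["0001"]) >= set(val["0001"])  # overlap by construction
```

Note that with this kind of split, every identity still appears in both train and val, which is what the thread means by "val is a subset of train" when `train_all` is used.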
I do not know how you split the dataset. Actually, there are two ways to split it:

1. Classification style.
2. Retrieval style. Given the 751 classes in the Market-1501 dataset, we use the first 651 classes as the training set and leave out the remaining 100 classes as the validation set. We can use the images of these 100 classes as query and gallery to evaluate retrieval performance. However, since the 100 classes have not been seen by the model, the model cannot classify their images.

Hi @layumi
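The retrieval-style split above can be sketched in a few lines. The assumption here is simply that identities can be sorted and the first 651 taken for training, which matches the "first 651 classes" rule described:

```python
# ID-disjoint ("retrieval style") split for Market-1501-like data:
# the first 651 identities become the training set, the remaining
# 100 identities the validation set, so no identity appears in both.

def split_by_identity(ids, n_train=651):
    """Split person IDs into disjoint train/val ID sets."""
    unique_ids = sorted(set(ids))
    train_ids = set(unique_ids[:n_train])
    val_ids = set(unique_ids[n_train:])
    return train_ids, val_ids

# Toy example with 751 fake integer IDs:
train_ids, val_ids = split_by_identity(range(751))
assert len(train_ids) == 651 and len(val_ids) == 100
assert train_ids.isdisjoint(val_ids)  # no identity overlap
```

Because the two ID sets are disjoint, a classifier trained on the 651 IDs has no output unit for the held-out 100 IDs, which is exactly why they can only be evaluated in retrieval style.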
To be honest, I am quite new to computer vision and machine learning. Thanks a lot for your guidance!
Hi @layumi ,
We have our own but very small dataset (about 21 person IDs but about 1500 images), and I am fine-tuning your model on it. Basically, we are looking into how we can re-identify a person from almost top view (from a very steep angle) instead of a side and/or front view.
@bhooshan-supe-gmail You may start from my tutorial, which is more straightforward: https://github.com/layumi/Person_reID_baseline_pytorch/tree/master/tutorial
And recently I released a dataset and code for satellite-view, drone-view, and ground-view geo-localization.
You are welcome to check it out: https://github.com/layumi/University1652-Baseline
Another way is the retrieval style. Given the 751 classes in the Market-1501 dataset, we use the first 651 classes as the training set and leave out the remaining 100 classes as the validation set. We can use the images of these 100 classes as query and gallery to evaluate retrieval performance. However, since the 100 classes have not been seen by the model, the model cannot classify their images.
How would you go about adding this retrieval-style evaluation? Does it make sense here to add retrieval-style evaluation in addition to the classification evaluation, which makes the model classify images into person/object IDs?
Hi @nikky4D Sorry. What is 00 classes? Could you provide more descriptions?
Sorry, I quoted it incorrectly, please see edited comment above
Hi @nikky4D
Validation (Classification Setting): I write it with the training code. You do not need to modify the split.
Validation (Retrieval Setting): If you want to evaluate on the 651 / 100 split (751 IDs in total), you need to modify the data preparation to split it. Since the IDs are random, I simply use the first 651 IDs as train and the last 100 IDs as val. For validation in retrieval style, you need to use test.py to evaluate the validation set, like the test setting. (The validation result during training is not correct.)
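Once the 100 held-out IDs are separated, they still have to be organized into query and gallery images for test.py-style evaluation. A minimal sketch, assuming one query image per identity with the remaining images as gallery (an illustrative convention, not necessarily the exact one used by test.py):

```python
# Hedged sketch of preparing a retrieval-style validation set: for each
# held-out identity, take one image as a query and the rest as gallery.
# The one-query-per-ID rule is an illustrative assumption.

def make_query_gallery(val_images_by_id):
    query, gallery = [], []
    for pid, imgs in val_images_by_id.items():
        query.append((pid, imgs[0]))                       # one query per ID
        gallery.extend((pid, img) for img in imgs[1:])     # rest as gallery
    return query, gallery

# Toy example with two held-out identities:
val = {"0700": ["q1.jpg", "g1.jpg", "g2.jpg"], "0701": ["q2.jpg", "g3.jpg"]}
query, gallery = make_query_gallery(val)
assert len(query) == 2 and len(gallery) == 3
```

Retrieval metrics such as Rank-1 or mAP can then be computed by ranking gallery images against each query by feature distance, as the repo's test.py does for the standard test set.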
Thank you. Then for the teacher training, is it better to use the retrieval split or the classification setting for a more robust DG-Net setup, or does the dataset setup not matter for the final model?
Hi Xiaodong Yang, Zhedong Zheng,
I am planning to use your model in one of our experimental projects as a base model for transfer learning. While studying the code, I noticed that your "val" (validation) dataset is a subset of the "train" (training) dataset. (Refer to https://github.com/NVlabs/DG-Net/blob/master/prepare-market.py#L111)
I believe this goes against my understanding, so kindly explain why you decided to have the validation dataset as a subset of the training dataset.