Data format

jplu commented 6 years ago

Hello! It is not clear to me what data should be processed.

The original file used in the paper is here. Once I have downloaded it, what should I do with it?

Thanks in advance for clarifying the data preprocessing :)

akarshzingade commented 6 years ago

Hey, Julien! Once you have downloaded the images, you need triplets to run the model. So you have to create the triplet file and then feed the images and the triplet text file to the model. Hope this helps. If you have further questions, please let me know. :)
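A triplet text file of the kind mentioned here could be produced with something like the sketch below. It is hypothetical: the one-triplet-per-line `query,positive,negative` CSV layout and the helper names `write_triplet_file` and `read_triplet_file` are assumptions for illustration, not the repository's actual file format or code, so check the repository's own scripts before relying on it.

```python
import csv
from typing import Iterable, List, Tuple

# (query, positive, negative) image paths
Triplet = Tuple[str, str, str]


def write_triplet_file(triplets: Iterable[Triplet], out_path: str) -> int:
    """Write one triplet per line as 'query,positive,negative' (assumed layout).

    Returns the number of triplets written.
    """
    count = 0
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for query, positive, negative in triplets:
            writer.writerow([query, positive, negative])
            count += 1
    return count


def read_triplet_file(path: str) -> List[Triplet]:
    """Read the same assumed layout back into (query, positive, negative) tuples."""
    with open(path, newline="") as f:
        return [(row[0], row[1], row[2]) for row in csv.reader(f) if row]
```

Using the `csv` module rather than joining strings by hand keeps paths that contain commas intact when the file is read back.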

jplu commented 6 years ago

Thanks! Unfortunately, after trying multiple approaches, I still don't see what I have to do with the file that can be downloaded here. Could you please give more step-by-step details?

I downloaded all the images listed in the original dataset and put them all into the same directory using the following logic: each triplet has an id, the query image is labelled "1", the positive image is labelled "2" and the negative image is labelled "3". For example, for the first 4 lines of the dataset, this gives:

1st line, label: nothing
2nd line, 1st image (query image): 1_1.jpg
3rd line, 2nd image (positive image): 1_2.jpg
4th line, 3rd image (negative image): 1_3.jpg
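Under the naming scheme above (`<id>_1.jpg` for the query, `<id>_2.jpg` for the positive, `<id>_3.jpg` for the negative, all in one directory), the downloaded files could be regrouped into triplets roughly as follows. This is only a sketch: `group_triplets` and `triplets_in_directory` are hypothetical helpers, and dropping incomplete triplets (for example, when one download failed) is an assumed policy.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Naming scheme described in the comment: <triplet id>_<role>.jpg
NAME_PATTERN = re.compile(r"^(\d+)_([123])\.jpg$")
ROLES = {"1": "query", "2": "positive", "3": "negative"}


def group_triplets(filenames: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Group '<id>_1/2/3.jpg' names into (query, positive, negative) tuples."""
    groups: Dict[int, Dict[str, str]] = {}
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        triplet_id = int(match.group(1))
        groups.setdefault(triplet_id, {})[ROLES[match.group(2)]] = name

    triplets = []
    for triplet_id in sorted(groups):  # numeric order: 1, 2, ..., 10
        roles = groups[triplet_id]
        if len(roles) == 3:  # assumed policy: drop incomplete triplets
            triplets.append((roles["query"], roles["positive"], roles["negative"]))
    return triplets


def triplets_in_directory(directory: str) -> List[Tuple[str, str, str]]:
    """Apply group_triplets to the file names found in one directory."""
    return group_triplets(p.name for p in Path(directory).iterdir() if p.is_file())
```

Sorting by the integer id keeps `10_*.jpg` after `2_*.jpg`, which a plain string sort would not.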

Is this the right preprocessing? If not, what preprocessing should I apply to the original dataset?

Thanks a lot for the help :)

jplu commented 6 years ago

Ok, after long hours of tweaking, I finally figured out what you are doing :)

It might be good to specify that the expected input is a directory containing subdirectories, where each subdirectory represents a class and contains all the images of that class. It would also help to state that the original dataset from the paper is not used as-is.
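A minimal sketch of that input layout, assuming one subdirectory per class under a root directory: each triplet takes its query and positive from the same class and its negative from a different class. The function names `list_classes` and `sample_triplets`, the accepted image extensions and the uniform random sampling are assumptions for illustration, not the repository's own sampler.

```python
import random
from pathlib import Path
from typing import Dict, List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def list_classes(root: str) -> Dict[str, List[str]]:
    """Map each class subdirectory name to the sorted image paths it contains."""
    classes: Dict[str, List[str]] = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            images = sorted(str(p) for p in class_dir.iterdir()
                            if p.suffix.lower() in IMAGE_SUFFIXES)
            if images:  # skip empty class directories
                classes[class_dir.name] = images
    return classes


def sample_triplets(classes: Dict[str, List[str]], per_query: int = 1,
                    seed: int = 0) -> List[Tuple[str, str, str]]:
    """Sample (query, positive, negative): positive from the same class, negative from another."""
    rng = random.Random(seed)
    names = list(classes)
    triplets = []
    for cls, images in classes.items():
        others = [n for n in names if n != cls]
        if len(images) < 2 or not others:
            continue  # need a second image in this class and at least one other class
        for query in images:
            for _ in range(per_query):
                positive = rng.choice([p for p in images if p != query])
                negative = rng.choice(classes[rng.choice(others)])
                triplets.append((query, positive, negative))
    return triplets
```

Seeding `random.Random` makes the same directory produce the same triplets across runs; the resulting tuples could then be written out one per line as a triplet text file.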

Thanks a lot for your help :)

akarshzingade / image-similarity-deep-ranking

Data format #14