ardiya / siamesenetwork-tensorflow

Using siamese network to do dimensionality reduction and similar image retrieval
MIT License

Need suggestions #4

Open ArkaJU opened 6 years ago

ArkaJU commented 6 years ago

I have written the code for a similar problem involving RGB images on a broader dataset but the results are not that good. The repo link is given below:

https://github.com/ArkaJU/Sketch-Retrieval---Siamese

Any help would be greatly appreciated.

ardiya commented 6 years ago

Could you elaborate on what "not that good" means?

Anyway, I found some weird stuff in your code:

  1. Layer fc1 doesn't actually contain a fully connected layer.
  2. You show the result after 100 iterations... you have a lot of data, so print it after each epoch (and it seems you have a different definition of epoch).
  3. Weird kernel sizes... Usually it's 7x7 and then it keeps decreasing. Well, I'm not really up to date, so I might be wrong on this one.

I also don't quite understand the way you create a batch; what's the goal behind it?

ArkaJU commented 6 years ago

By 'not that good' I mean that for a query sketch of, say, pizza, the top 5 images returned seem to be from random classes (knife, giraffe, lion etc). At times a pizza pic pops up, but I guess that is more by chance than a result of training.

  1. I just flattened the output of the conv6 layer and put it in the fc1 scope, then added the fc2 layer on top of it. Do I need to actually add a fully connected layer in the fc1 scope after flattening (I wasn't sure, so I didn't)?

  2. Yeah, here the dataset can't be stored in memory, since the number of possible sketch-image pair combinations is huge, so storing it in pair form just blows up. So basically I had to randomly select paths from the dataset, and if the sketch and image belong to the same class, the label is set to 1, otherwise 0 (if you are not clear about the dataset structure, I can elaborate). So by epoch I mean creating batches of 128 sketch-image-label triplets and training with them. The results I print are at a frequency of 100 epochs (100, 200, 300...). Printing after each epoch is just a bit more time-consuming; I had done that earlier.

  3. The architecture is borrowed from this paper: https://ieeexplore.ieee.org/document/7532801
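On point 1 above: flattening is just a reshape with no weights, while an actual fully connected layer adds a learned affine map. A minimal numpy sketch of the difference (all shapes and sizes here are hypothetical, not the ones in the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conv6 output for a batch of 4: a 4x4 spatial map with 64 channels.
conv6_out = rng.standard_normal((4, 4, 4, 64))

# Flattening alone (what the fc1 scope currently does) is just a reshape -- no weights.
flat = conv6_out.reshape(conv6_out.shape[0], -1)   # shape (4, 1024)

# A real fully connected layer adds learned weights, a bias, and a nonlinearity.
W_fc1 = rng.standard_normal((flat.shape[1], 256)) * 0.01
b_fc1 = np.zeros(256)
fc1 = np.maximum(flat @ W_fc1 + b_fc1, 0.0)        # ReLU output, shape (4, 256)
```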

Batches are created for training the siamese network, one CNN for the sketch and another for the image. The loss function is the contrastive loss. Is something wrong with it?
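The on-the-fly pairing described in point 2 can be sketched roughly like this (the class names and file paths are made up for illustration; same-class pairs are drawn with probability 0.5 so the labels stay balanced):

```python
import random

random.seed(0)

# Hypothetical on-disk layout: per-class lists of sketch paths and image paths.
sketches = {"pizza": ["s/pizza_0.png", "s/pizza_1.png"],
            "knife": ["s/knife_0.png"],
            "lion":  ["s/lion_0.png", "s/lion_1.png"]}
images   = {"pizza": ["i/pizza_0.jpg"],
            "knife": ["i/knife_0.jpg", "i/knife_1.jpg"],
            "lion":  ["i/lion_0.jpg"]}

def sample_pair():
    """Draw one (sketch_path, image_path, label) triplet on the fly,
    choosing a same-class pair with probability 0.5 to keep labels balanced."""
    classes = list(sketches)
    c_sketch = random.choice(classes)
    if random.random() < 0.5:                       # positive pair
        c_image = c_sketch
    else:                                           # negative pair
        c_image = random.choice([c for c in classes if c != c_sketch])
    label = 1 if c_image == c_sketch else 0
    return random.choice(sketches[c_sketch]), random.choice(images[c_image]), label

batch = [sample_pair() for _ in range(128)]         # one training batch of 128 triplets
```

Sampling paths instead of pixel data keeps memory flat no matter how many pair combinations exist; the actual image loading can then happen per batch.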

ardiya commented 6 years ago

If that's the case, are you sure that the model has already converged?

  1. Well, it depends on the dataset, so you need to try. Usually it's either 1 or 2 fc layers.
  2. OK.
  3. The sub-sampling looks like either max-pooling or avg-pooling. It's not a conv layer with a 2x2 kernel and stride 2. Another conv layer makes it harder to converge, unless you use residual layers.
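To illustrate the sub-sampling being suggested: 2x2 max-pooling with stride 2 has no learned parameters, unlike a strided conv layer. A tiny numpy sketch on a single-channel map:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 on an (H, W) map (H and W even).
    Each non-overlapping 2x2 block is reduced to its maximum --
    no weights to learn, unlike a strided convolution."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7]], dtype=float)
pooled = max_pool_2x2(x)    # [[4, 8], [9, 7]]
```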

In my code, I tried to balance the proportion of similar and dissimilar classes. Then try the good old GradientDescentOptimizer; I didn't have much luck with AdamOptimizer in my experiments. A lot of loss functions have properties similar to contrastive loss, such as triplet loss and center loss. You can also try those for comparison.
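For comparison, minimal numpy versions of contrastive loss and triplet loss, following the standard formulations (the margin values are assumptions, not values from the repo):

```python
import numpy as np

def contrastive_loss(a, b, labels, margin=1.0):
    """labels: 1 = similar pair, 0 = dissimilar. Similar pairs are pulled
    together; dissimilar pairs are pushed apart until their distance
    exceeds the margin."""
    d = np.linalg.norm(a - b, axis=1)               # Euclidean distance per pair
    return np.mean(labels * d**2 + (1 - labels) * np.maximum(margin - d, 0.0)**2)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Push the anchor-negative distance at least `margin` beyond the
    anchor-positive distance, per triplet."""
    d_pos = np.sum((anchor - positive)**2, axis=1)
    d_neg = np.sum((anchor - negative)**2, axis=1)
    return np.mean(np.maximum(d_pos - d_neg + margin, 0.0))

# Toy embeddings: first pair similar and close, second pair dissimilar and far.
a = np.array([[0.0, 0.0], [0.0, 0.0]])
b = np.array([[0.0, 0.0], [2.0, 0.0]])
y = np.array([1, 0])
c_loss = contrastive_loss(a, b, y)                  # both pairs satisfied -> 0.0

# An "easy" triplet: positive close to the anchor, negative far away.
t_loss = triplet_loss(np.array([[0.0, 0.0]]),
                      np.array([[0.1, 0.0]]),
                      np.array([[1.0, 0.0]]))       # margin already met -> 0.0
```

The key practical difference: contrastive loss needs labeled pairs, while triplet loss needs (anchor, positive, negative) triplets, which changes the batch-construction code.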

ArkaJU commented 6 years ago

Will try more epochs, let's see.

  1. Fine, gonna try both.
  2. This was a bad mistake on my part; it might be the reason for non-convergence.

Okay, gonna try different optimizers for the same query sketch. I have also tried to balance the classes by choosing a same-class/different-class pair with probability 0.5; it seems to work fine on a dry run. Anything else fishy you found in my code? I have updated the architecture, please have a look. Is the pairing mechanism fine?