So if we accept, for the sake of argument, that fine-grained visual categorisation (FGVC) is a different task from generic object recognition and needs different algorithms, as argued in the Dietterich paper.
CNNs seem to be state of the art for object recognition in natural images, but FGVC methods might be better for this sort of task (Chris believes so, and Dietterich's work tackles a very similar problem and focuses on this kind of algorithm).
What if we try to combine these approaches:
train a classical CNN on the dataset
cluster the CNN output matrices for the images
apply FGVC methods to each of these subsets of images, e.g. descriptor + dictionary + LLC encoding + max pooling + linear SVM (a poster on a similar insect-identification task found this performed better than dictionary-free SET methods)
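A minimal sketch of the three steps above, assuming we already have CNN feature vectors for each image (all names and parameters here are hypothetical; scikit-learn's k-means and linear SVM stand in for the clustering and the final FGVC classifier, and a random matrix stands in for real CNN outputs):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-in for step 1's output: CNN activations (e.g. penultimate
# layer) flattened to one feature vector per image, plus class labels.
features = rng.normal(size=(200, 64))
labels = rng.integers(0, 10, size=200)

# Step 2: cluster the CNN outputs into candidate subsets of images.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(features)

# Step 3: train a separate classifier per cluster. A linear SVM on the
# raw features stands in for the full descriptor + dictionary + LLC
# encoding + max pooling pipeline, which would replace `features` here.
per_cluster_svm = {}
for c in np.unique(cluster_ids):
    mask = cluster_ids == c
    if len(np.unique(labels[mask])) < 2:
        continue  # cannot train a classifier on a single-class cluster
    per_cluster_svm[c] = LinearSVC().fit(features[mask], labels[mask])

# At test time: route each image to its nearest cluster, then classify
# with that cluster's specialist model.
test_x = rng.normal(size=(5, 64))
test_clusters = kmeans.predict(test_x)
preds = [per_cluster_svm[c].predict(x[None])[0]
         for c, x in zip(test_clusters, test_x) if c in per_cluster_svm]
```

One open design choice this makes explicit: test images are routed by nearest cluster centroid, so a routing mistake is unrecoverable by the downstream classifier.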
This sort of divide-and-conquer approach might be awful, but intuitively it seems a reasonable thing to at least try, and optimistically it might combine the best of both worlds. We could also consider modifying the class labels if obvious superclasses emerge at the clustering step (maybe even using those provided by Kaggle).
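If the clusters do line up with semantic superclasses, one crude way to derive relabelled superclass targets is to assign each original class to the cluster it most often falls into (a hypothetical sketch; random arrays stand in for real cluster assignments and labels):

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(1)
# Stand-ins for the clustering step's output and the original labels.
cluster_ids = rng.integers(0, 4, size=200)
labels = rng.integers(0, 10, size=200)

# Map each fine-grained class to its modal cluster; classes sharing a
# cluster become one candidate superclass.
class_to_super = {}
for cls in np.unique(labels):
    clusters_for_cls = cluster_ids[labels == cls]
    class_to_super[cls] = Counter(clusters_for_cls).most_common(1)[0][0]

superclass_labels = np.array([class_to_super[c] for c in labels])
```

These derived superclasses could then be compared against the Kaggle-provided ones as a sanity check before committing to relabelling.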