Closed scottclowe closed 9 years ago
We've been talking about this, and there are several ways to deal with it. I'm not certain, but I don't think the idea of a prior probability makes sense for a ConvNet, since it isn't really a probabilistic model (that framing only applies to RBMs). We could maybe use Platt scaling to impose a prior on it.
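For reference, Platt scaling just fits a logistic regression to the raw classifier scores on held-out data, so the output behaves like a calibrated probability. A minimal sketch, assuming hypothetical held-out scores and binary labels (the real scores would come from our network on a validation split):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical held-out data: raw network scores and true binary labels.
scores = np.array([-2.0, -1.5, -0.3, 0.2, 1.1, 2.5]).reshape(-1, 1)
labels = np.array([0, 0, 0, 1, 1, 1])

# Platt scaling: fit a one-feature logistic regression on the scores.
calibrator = LogisticRegression()
calibrator.fit(scores, labels)

# Calibrated probability that a new score of 0.5 belongs to class 1.
p = calibrator.predict_proba(np.array([[0.5]]))[:, 1]
```

For the multi-class case we'd do the equivalent with a softmax over the per-class scores.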
Another way to deal with it would be to give the ConvNet the information somewhere in the architecture. Ideally we'd have a set of features like this (which we'd get for free by developing a visual bag-of-words classifier) that work well in a simple linear classifier, and feed them into the last hidden layer. Although who knows, they might work better in the second-to-last layer, or earlier.
Or, we could take the output of the ConvNet and average it with a classifier working with these kinds of features.
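The averaging option is cheap to try. A minimal sketch with made-up per-class probabilities (3 images, 4 classes); the real arrays would be the softmax outputs of the ConvNet and of the size-feature classifier:

```python
import numpy as np

# Hypothetical per-class probabilities from the ConvNet and from a
# separate classifier trained on size-type features.
p_convnet = np.array([[0.70, 0.10, 0.10, 0.10],
                      [0.20, 0.50, 0.20, 0.10],
                      [0.25, 0.25, 0.25, 0.25]])
p_size = np.array([[0.40, 0.30, 0.20, 0.10],
                   [0.10, 0.70, 0.10, 0.10],
                   [0.60, 0.20, 0.10, 0.10]])

# Unweighted average of the two sets of predictions. A weighted
# average, with the weight tuned on a validation set, is an easy
# extension.
p_combined = 0.5 * p_convnet + 0.5 * p_size
```

Each row still sums to one, so the result is a valid distribution we can submit directly.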
Or, we could just never resize. Stick with a relatively large canvas (150x150, maybe), augment the smaller animals' data by moving them around in this big blank space, and crop larger images to fit. Then the ConvNet has to learn about the sizes itself.
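A sketch of that pad-or-crop idea, assuming greyscale arrays and a zero-valued (black) background; the plankton images actually have light backgrounds, so the fill value would need changing for real use. The random offset doubles as translation augmentation:

```python
import numpy as np

def place_in_canvas(img, size=150, rng=None):
    """Place (or crop) a greyscale image into a size x size canvas.

    Images smaller than the canvas are pasted at a random offset,
    which also serves as translation augmentation; larger images are
    cropped to fit. Illustrative sketch only.
    """
    rng = rng or np.random.RandomState(0)
    h, w = img.shape
    # Crop anything that exceeds the canvas.
    img = img[:min(h, size), :min(w, size)]
    h, w = img.shape
    canvas = np.zeros((size, size), dtype=img.dtype)
    top = rng.randint(0, size - h + 1)
    left = rng.randint(0, size - w + 1)
    canvas[top:top + h, left:left + w] = img
    return canvas

# e.g. a small 40x60 image lands somewhere inside a 150x150 canvas
out = place_in_canvas(np.ones((40, 60)))
```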
Anyway, it would be interesting to see how well a simple classifier using just image size would work. I'll make an issue for this and describe how it could be done.
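To make the idea concrete, a size-only baseline could be as small as this. The numbers below are made up for illustration; in practice the (height, width) pairs would be read from the training images, and a log transform seems sensible since plankton sizes span a wide range:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: (height, width) of each image and its class
# label. Real values would come from the training set on disk.
sizes = np.array([[40, 45], [50, 42], [120, 130],
                  [110, 140], [38, 50], [125, 118]], dtype=float)
classes = np.array([0, 0, 1, 1, 0, 1])

# Multinomial logistic regression on log image dimensions.
clf = LogisticRegression()
clf.fit(np.log(sizes), classes)

# Per-class probabilities for a new 60x70 image.
probs = clf.predict_proba(np.log([[60.0, 70.0]]))
```

The predicted probabilities for the test images could then be written straight into a submission file.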
OK, made a couple of issues, one in the work repo and one in the tools repo. The work one is about making the submission and the tools one is about making the code. What a great example of why I thought we should separate the two things, right?
I'm going to evolve this issue into something more precise:
We need a way of adding arbitrary data about images into the first or second fully connected layer of the network (i.e. the first MLP layer rather than a CNN layer), without it passing through the preceding convolutional layers.
Could @matt-graham do this?
Coded up an example YAML file, opencv_integration.yaml, that uses a single MLP layer to forward the OpenCV features straight into the fully connected layers, while feeding resized images through the same convolutional architecture we have already been using. Couldn't get past an NLL of 3.0 in testing; it probably needs more careful checking of the settings and a look at the monitoring traces to figure out what's going wrong.
The distribution of image sizes is different for each class, because some of the plankton are much larger than others. The size of the image could be used to make a submission by itself.
Since the CNN's input images all have to be resized to a common size, it loses the information about the intrinsic size of the image and its contents. We could therefore possibly do better by combining the CNN predictions with a prior based on the image sizes.
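One simple way to combine the two, sketched with made-up numbers for a single image over 4 classes: treat the size model's output as a prior, multiply element-wise with the CNN's per-class probabilities, and renormalise. Strictly speaking we should also divide out whatever class prior the CNN has already absorbed from the training distribution, which this sketch ignores:

```python
import numpy as np

# Hypothetical per-class probabilities for one image: from the CNN,
# and from a size-based model for the same image.
p_cnn = np.array([0.50, 0.30, 0.15, 0.05])
p_size = np.array([0.10, 0.60, 0.20, 0.10])

# Treat the size model as a prior: multiply and renormalise.
# (Equivalent to adding log-probabilities and taking a softmax.)
combined = p_cnn * p_size
combined /= combined.sum()
```

Here the size prior flips the prediction from class 0 to class 1, which is exactly the kind of correction we'd hope for when the resized image is ambiguous.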