chengsoonong / crowdastro

Cross-identification of radio objects and host galaxies by applying machine learning on crowdsourced training labels.
MIT License

Feature extraction and active learning #99

Closed MatthewJA closed 8 years ago

MatthewJA commented 8 years ago

From messing around with the feature extraction step of the pipeline, I've found that the CNN training massively affects the final accuracy. This raises two points:

MatthewJA commented 8 years ago

One other idea could be to train the CNN in an unsupervised way, e.g. a CNN autoencoder. This would allow us to train on all the training data without biasing the features.
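A convolutional autoencoder along these lines might look like the following Keras sketch. This is purely illustrative: the patch size (32×32) and layer widths are assumptions, not the project's actual architecture.

```python
# Sketch of a convolutional autoencoder (hypothetical layer sizes).
# Trained unsupervised: the target is the input patch itself.
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

inp = Input(shape=(32, 32, 1))  # one-channel radio patch (assumed size)

# Encoder: two conv + pool stages, compressing 32x32x1 down to 8x8x8.
x = Conv2D(16, (3, 3), activation='relu', padding='same')(inp)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# Decoder: mirror of the encoder, upsampling back to the input size.
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(inp, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(patches, patches, ...)  # reconstruct, no labels needed
```

The features would then come from the `encoded` layer, so no labels are consumed in training them.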

chengsoonong commented 8 years ago

I suggest training the CNN on ALL the data for now. Document this peeking in your report.

If there is time later in the project, we can consider the following (in order):

MatthewJA commented 8 years ago

Sounds good. Warm start CNN sounds like it could be a really good approach to take.

The CNN autoencoder you linked looks straightforward, too. I'll add this to milestone C to reconsider then.

MatthewJA commented 8 years ago

Let's revisit this some time, possibly tomorrow?

MatthewJA commented 8 years ago

Radio patches (left) and convolutional autoencoder reconstructions of the patches (right).


chengsoonong commented 8 years ago

The reconstruction is a smidgen too smooth, but for our purposes it looks great.

MatthewJA commented 8 years ago

Great! I'll rerun it a few times to try to nail down a decent network topology. I'd prefer fewer features than this provides, so I'll probably add another convolutional layer and maybe a dense layer.

MatthewJA commented 8 years ago

I think my boundary conditions break with more convolutional layers, so I'm going to see if I can find another implementation and use my newfound convolutional autoencoder knowledge to get it working on the data.
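For what it's worth, one common way boundary handling bites when stacking layers: with `'valid'` borders every convolution shrinks the patch, so a deep stack can shrink it past zero. A quick sanity check (patch and kernel sizes here are illustrative):

```python
def conv_output_size(n, kernel, stride=1, padding='valid'):
    """Spatial size after one conv layer (square inputs and kernels)."""
    if padding == 'same':
        return -(-n // stride)           # ceil(n / stride), size preserved at stride 1
    return (n - kernel) // stride + 1    # 'valid': no zero padding at the borders

# Stacking 5x5 'valid' convolutions on a 32-pixel patch:
n, sizes = 32, []
for _ in range(4):
    n = conv_output_size(n, 5)
    sizes.append(n)
print(sizes)  # [28, 24, 20, 16] -- each layer eats 4 pixels of border
```

With `'same'` padding the spatial size is preserved, which sidesteps the shrinkage entirely.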

chengsoonong commented 8 years ago

Before you go down the route of finding features, visualise the IR and radio images of the positive examples that are classified negative by your predictor. 5-10 image patches from:

And for comparison, look at 5-10 patches where the score is >5.

At the same time, show the flux values (all other non-image features).
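Selecting those patches is a one-liner once you have labels and scores side by side. A minimal sketch with made-up arrays (`y` for true labels, `p` for predicted probabilities, threshold 0.5 assumed):

```python
import numpy as np

# Hypothetical data: y is the true label for each candidate host,
# p is the predictor's probability of being the host.
y = np.array([1, 0, 1, 1, 0, 1])
p = np.array([0.9, 0.2, 0.1, 0.4, 0.8, 0.95])

# Positive examples the predictor scores as negative (false negatives).
false_neg = np.flatnonzero((y == 1) & (p < 0.5))
print(false_neg)  # [2 3] -- indices to pull IR/radio patches and fluxes for
```

The same mask can then index the image arrays and the non-image feature columns for plotting.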

MatthewJA commented 8 years ago

Alright, I'll get that done. #140

MatthewJA commented 8 years ago

If you train logistic regression on the expert labels (100% accurate), you recover 85% balanced accuracy. If you train it on the crowd majority labels (85% accurate), you also recover 85% balanced accuracy. This seems interesting! Maybe we're hitting a ceiling.
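For reference, balanced accuracy is just the mean of per-class recall, so it isn't inflated by the class imbalance in the host-galaxy candidates. A small self-contained check:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; insensitive to class imbalance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# With 90/10 imbalance, predicting all-negative scores 90% plain
# accuracy but only 50% balanced accuracy:
y_true = [0] * 90 + [1] * 10
print(balanced_accuracy(y_true, [0] * 100))  # 0.5
```

So the shared 85% figure reflects genuine per-class performance, which is what makes the expert-vs-crowd match suggestive of a ceiling.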

I wonder if nonlinear and/or convolutional features would help.