Open geoffwoollard opened 5 years ago
Looks like the CNN can distinguish ligand present / absent!
It needs more than one epoch. I got fit_generator working, so we can fit to lots of data.
As of now:
Epoch 1/9
760/760 [==============================] - 2399s 3s/step - loss: 0.6723 - categorical_accuracy: 0.5830
Epoch 2/9
760/760 [==============================] - 2445s 3s/step - loss: 0.6082 - categorical_accuracy: 0.6700
Epoch 3/9
760/760 [==============================] - 2276s 3s/step - loss: 0.4988 - categorical_accuracy: 0.7557
Epoch 4/9
760/760 [==============================] - 2258s 3s/step - loss: 0.3935 - categorical_accuracy: 0.8215
Epoch 5/9
760/760 [==============================] - 2252s 3s/step - loss: 0.3060 - categorical_accuracy: 0.8687
Epoch 6/9
760/760 [==============================] - 2243s 3s/step - loss: 0.2506 - categorical_accuracy: 0.8969
Epoch 7/9
760/760 [==============================] - 2243s 3s/step - loss: 0.2005 - categorical_accuracy: 0.9196
Epoch 8/9
760/760 [==============================] - 26504s 35s/step - loss: 0.1640 - categorical_accuracy: 0.9340
Epoch 9/9
760/760 [==============================] - 2250s 3s/step - loss: 0.1394 - categorical_accuracy: 0.9475
Note that each epoch is 95% of 20k particles (760 mini batches of 25 particles). The remaining 5% are saved for validation.
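As a quick check of that arithmetic, here is a minimal sketch. The variable names are illustrative and not taken from the training script; only the numbers come from the note above.

```python
# Illustrative names; the numbers come from the note above.
n_particles = 20_000   # total particles
val_fraction = 0.05    # fraction held out for validation
batch_size = 25        # particles per mini batch

n_val = round(n_particles * val_fraction)   # particles held out
n_train = n_particles - n_val               # particles seen per epoch
steps_per_epoch = n_train // batch_size     # mini batches per epoch

print(n_train, n_val, steps_per_epoch)
```

The step count it prints should match the 760/760 shown in the progress bars.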
Problems with overfitting
After 9 epochs with loss: 0.1394 - categorical_accuracy: 0.9475:
1000/1000 [==============================] - 40s 40ms/step
categorical_accuracy: 54.50%
Epoch 1/1
760/760 [==============================] - 2237s 3s/step - loss: 0.1129 - categorical_accuracy: 0.9577
1000/1000 [==============================] - 40s 40ms/step
categorical_accuracy: 87.00%
Epoch 1/1
760/760 [==============================] - 2358s 3s/step - loss: 0.0987 - categorical_accuracy: 0.9627
1000/1000 [==============================] - 42s 42ms/step
categorical_accuracy: 89.60%
Epoch 1/1
760/760 [==============================] - 2426s 3s/step - loss: 0.0845 - categorical_accuracy: 0.9684
1000/1000 [==============================] - 45s 45ms/step
categorical_accuracy: 55.30%
The test accuracy is unstable: it swings between the mid 50s and the high 80s (54.50%, 87.00%, 89.60%, 55.30%) while the training accuracy keeps rising. There are only 1000 particles in the test set, so try a larger test set and run the test after every epoch.
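To put rough numbers on this, here is a sketch that uses only the accuracies logged above. It pairs each training accuracy with the test accuracy measured right after it, and estimates the binomial standard error of an accuracy measured on 1000 particles. Treating the test particles as independent draws is an assumption, and the helper name is made up for illustration.

```python
import math

# (training accuracy, test accuracy) pairs copied from the logs above
runs = [(0.9475, 0.5450), (0.9577, 0.8700), (0.9627, 0.8960), (0.9684, 0.5530)]

def accuracy_std_error(acc, n):
    """Binomial standard error of an accuracy estimated from n independent samples."""
    return math.sqrt(acc * (1.0 - acc) / n)

n_test = 1000  # size of the test set, from the logs
for train_acc, test_acc in runs:
    gap = train_acc - test_acc                  # train/test gap
    se = accuracy_std_error(test_acc, n_test)   # sampling noise on test accuracy
    print(f"train {train_acc:.4f}  test {test_acc:.4f}  gap {gap:.4f}  se {se:.4f}")
```

Under that independence assumption, the standard error for 1000 particles is at most about 0.016 (1.6 percentage points), so a swing of roughly 30 points is probably more than test-set size alone would explain. A larger test set would still tighten the estimate.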
The model can distinguish a blurry map from a crisp one, even when the maps are randomly oriented with noise.
Now we'd like to know how similar the maps can be, and how noisy the 2D projections can be. At some point the maps will be too similar and the noise will be too high, and the model will not be able to classify accurately.
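One possible way to set up the noise half of that sweep is sketched below. The SNR convention (signal variance divided by noise variance), the function names, and the toy projection are all assumptions for illustration; the simulation code used for these experiments is not shown in this thread.

```python
import math
import random
import statistics

def noise_sigma_for_snr(pixels, snr):
    """Std dev of white Gaussian noise that gives the requested SNR.

    Assumes SNR = variance(signal) / variance(noise), one common convention.
    """
    return math.sqrt(statistics.pvariance(pixels) / snr)

def add_noise(pixels, snr, rng):
    """Return a noisy copy of a flattened projection (a list of floats)."""
    sigma = noise_sigma_for_snr(pixels, snr)
    return [p + rng.gauss(0.0, sigma) for p in pixels]

# Toy stand-in for a flattened 2D projection, not real data
projection = [0.0, 1.0, 2.0, 3.0] * 16
rng = random.Random(0)  # fixed seed so the sweep is repeatable

for snr in (1.0, 0.1, 0.01):
    noisy = add_noise(projection, snr, rng)
    print(snr, round(noise_sigma_for_snr(projection, snr), 3), len(noisy))
```

Training and testing at each SNR in turn would show roughly where classification accuracy starts to fall off.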