Harry-Westwood / Y4-Project-InterNeuralStellar

Bayesian Hierarchical Modelling and Machine Learning of Stellar Populations

Signs of overfitting? #13

Closed. HinLeung622 closed this issue 4 years ago.

HinLeung622 commented 4 years ago

@grd349 I am messing around with a made-up 2D mapping function that converts 2 inputs into 2 outputs, and trying to get a NN to learn it. To simulate MESA tracks, the input looks like this:

[figure: grid of input tracks in (x1, x2)]

And the output:

[figure: corresponding true outputs]

I first tried 4 hidden layers with 16 neurons each, trained on 1000×10 lines of data, and after 20k epochs got this resulting plot:

[figure: predictions of the smaller (4×16) network]

Blue plots the exact points used in training, as predicted by the NN; the grey background is the true outputs; and orange is the NN output for the same inputs with x1 shifted by +0.5 × (the distance between consecutive lines). Although the blue lines are not a very accurate estimate of the true outputs, the top orange line does appear roughly half an interval above the top blue line, which seems correct.

I thought it would do better if I expanded the architecture, so I made a NN that also has 4 layers, but with 64 neurons each. This is its result after 20k epochs:

[figure: predictions of the larger (4×64) network]

What is interesting is that, although this larger NN gets the blue lines more correct, the top orange line is now partially under the top blue line, and the overall evaluation loss went up for a line that lies within the input range (0 to 1 for both x1 and x2).
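
For reference, a minimal sketch of the smaller setup (assuming tf.keras; the optimizer, loss, batch size, toy mapping, and line spacing below are illustrative assumptions, not necessarily what the notebook uses):

```python
# Sketch of the 4x16 setup: 2 inputs -> 2 outputs, 4 hidden layers of 16.
# The toy mapping, optimizer, loss and batch size are assumptions.
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the made-up 2D mapping: X has columns (x1, x2).
rng = np.random.default_rng(42)
X = rng.random((10000, 2))
y = np.stack([np.sin(3.0 * X[:, 0]) + X[:, 1],   # placeholder output 1
              X[:, 0] * X[:, 1]], axis=1)        # placeholder output 2

model = tf.keras.Sequential(
    [tf.keras.Input(shape=(2,))]
    + [tf.keras.layers.Dense(16, activation="relu") for _ in range(4)]
    + [tf.keras.layers.Dense(2)]                 # linear output layer
)
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20_000, batch_size=256, verbose=0)  # 20k epochs is slow

# Probe interpolation as described above: shift x1 by half the spacing
# between consecutive input lines (spacing of 0.1 is an assumption here).
X_shift = X.copy()
X_shift[:, 0] += 0.5 * 0.1
y_shift = model.predict(X_shift, verbose=0)
```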

Question: is this all a sign of overfitting? The notebook is here: https://github.com/Harry-Westwood/Fourth-Year-Project/blob/master/Hin's_files/neural_network_test/TestNN_2D.ipynb

grd349 commented 4 years ago

Hi @HinLeung622

Good work - this looks good, and you have raised valid concerns that make for a good question!

Underfitting and overfitting are terms that are more general than just NNs. I think the easiest way to think about this is fitting a polynomial of order n to some data.

Underfitting - where n is too low and the polynomial cannot adjust to fit the data, whatever it does. (This is a bit simplistic, but ...)

Overfitting - where n is too high and the polynomial can adjust to fit the data but does not generalise well. I.e., for points in between your data the fit has more structure than it should do. See https://en.wikipedia.org/wiki/Overfitting
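
To make the analogy concrete, here is a tiny numpy illustration (all data made up): a low-order polynomial underfits, while a very high-order one reproduces the training points almost perfectly but oscillates wildly between them:

```python
# Polynomial under/overfitting demo: order 1 underfits, order 15 overfits.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)   # noisy "truth"

x_between = np.linspace(0, 1, 500)                       # points in between
for n in (1, 3, 15):
    coeffs = np.polyfit(x, y, n)
    train_rms = np.sqrt(np.mean((y - np.polyval(coeffs, x)) ** 2))
    spread = np.ptp(np.polyval(coeffs, x_between))       # wildness between points
    print(f"order {n:2d}: train RMS = {train_rms:.3f}, "
          f"range between points = {spread:.1f}")
# The order-15 fit drives the train RMS down but has far more structure
# between the data points than the underlying sine curve.
```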

For us, overfitting is a real danger. We will be able to spot underfitting easily, as you did in the first result you got. Overfitting is harder to spot. Have you looked at the train and validation metrics?
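
One way to check, sketched below assuming tf.keras and that `model`, `X` and `y` exist as in your notebook (the 20% split is my choice):

```python
# Compare training and validation loss curves from a Keras fit.
import matplotlib.pyplot as plt

history = model.fit(X, y, epochs=20_000, batch_size=256,
                    validation_split=0.2, verbose=0)

plt.plot(history.history["loss"], label="train")
plt.plot(history.history["val_loss"], label="validation")
plt.yscale("log")
plt.xlabel("epoch")
plt.ylabel("MSE loss")
plt.legend()
plt.show()
# Overfitting typically shows as the validation curve flattening out or
# rising while the training curve keeps falling.
```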

Also - did you shuffle your data before you trained on it?
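
If not, something like this does it (a numpy sketch; note that Keras's validation_split holds out the *last* rows without shuffling, even though fit() shuffles training batches by default, so pre-shuffling matters):

```python
# Shuffle inputs and targets together with the same permutation.
import numpy as np

perm = np.random.default_rng(1).permutation(len(X))
X, y = X[perm], y[perm]
```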

We can discuss on Monday or I have 10 mins around 12:30 today if you can make it.

HinLeung622 commented 4 years ago

@grd349 Apologies, I only just got up haha... so 12:30 today is quite out of the question.

The final training loss was lower for the larger NN, while the validation loss was lower for the smaller NN. I believe this is a sign of overfitting.

Yes I did shuffle the data beforehand.

HinLeung622 commented 4 years ago

Got a better understanding of the terms overfitting and underfitting after the general discussion; closing this issue.