Now that we are training on a sample with all offset angles, I experimented with the different models to see which one performs best. The plots below all use 25% of the sample as the training sample; a separate study will look into reducing or increasing that fraction. Out of the many models tested, I selected a few and produced the summary plots below to justify the final choice.
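As a minimal sketch of this setup, assuming scikit-learn and placeholder data standing in for the real event sample (the feature shapes and the regression target are assumptions, not the actual analysis inputs):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the real all-offset-angle sample:
# X would be the event features, y the target to estimate.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 8))
y = rng.normal(size=10_000)

# 25% of the events go into the training sample, the rest is kept for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.25, random_state=42
)
```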
First we start with various hyperparameter options for the random forest. The first number in each title is the number of trees and the second is the maximum tree depth. This set of forests comes after some optimisation; the original set of hyperparameters performed significantly worse. In the plot below we can see that we have more or less reached an optimum: performance is essentially the same regardless of which set of parameters we choose (except for the 2000/5 forest).
Therefore, in the next comparison we use the 300/15 forest, since it is the smallest model (also in disk space).
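For illustration, a sketch of how such a forest grid might look in scikit-learn, assuming the trees/depth numbers in the plot titles map to `n_estimators` and `max_depth` (a regressor is used here purely as an assumption about the task):

```python
from sklearn.ensemble import RandomForestRegressor

# Hyperparameter pairs written as (number of trees, maximum depth),
# matching the "trees/depth" naming in the plot titles.
configs = [(300, 15), (2000, 5)]  # two of the grid points discussed above

forests = {
    f"{n_trees}/{depth}": RandomForestRegressor(
        n_estimators=n_trees, max_depth=depth, n_jobs=-1, random_state=42
    )
    for n_trees, depth in configs
}

# The 300/15 forest is the one carried forward: equivalent performance,
# smallest footprint on disk.
chosen = forests["300/15"]
chosen.fit(X_train, y_train)  # X_train/y_train from the split sketched above
```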
Next we compare the random forest performance to the various MLP options.
Again, these models come after a first round of optimisation. It is clear that any of the three, MLP_tanh, MLP_logistic, or MLP_uniform, would be a good choice; they all provide equivalent performance. I suggest we keep MLP_tanh going forward, simply because it is the one we used in the past. All future tests should be performed with it.
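As a sketch of the MLP variants in scikit-learn, assuming MLP_tanh and MLP_logistic differ only in the `activation` parameter (MLP_uniform presumably differs in its weight initialisation, which scikit-learn's MLP does not expose, so it is omitted; the hidden-layer sizes and iteration counts below are placeholders, not the tuned values):

```python
from sklearn.neural_network import MLPRegressor

# Two of the MLP variants, differing only in the activation function.
mlp_tanh = MLPRegressor(activation="tanh", hidden_layer_sizes=(100,),
                        max_iter=500, random_state=42)
mlp_logistic = MLPRegressor(activation="logistic", hidden_layer_sizes=(100,),
                            max_iter=500, random_state=42)

# MLP_tanh is the variant we keep for all future tests.
mlp_tanh.fit(X_train, y_train)
print("MLP_tanh test score:", mlp_tanh.score(X_test, y_test))
```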