ML: proof of concept

rogerkuou commented 3 years ago

Xu has trained models for ~2000 grid cells with ml_lsmodel_ascat. Several issues came up:

We found that the performance on curvature is much worse than on the other two outputs. One hypothesis is that the LS model parameters that influence curvature have either too high or too low a frequency. We would like to investigate the influence of curvature by adding loss weights to the three outputs. See #160.
The predictions are flat at the peaks/valleys compared to the observations, which indicates poor training quality. To understand the cause, the follow-up tasks could be: 1) #168 try to reproduce Manuel's results, and 2) #169 come up with ideas to reduce oscillation in the predictions.
We need to better understand the input data. This includes the following tasks: 1) #170 understand why some input data has such a high-frequency signal; 2) #166 understand why two of the input time series are the same series at different scales; and 3) #163 understand the relations between the input parameters.
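To make the loss-weight idea from #160 concrete, here is a minimal sketch of a weighted sum of per-output mean squared errors. It assumes the model predicts the three outputs as columns of one array and that curvature is the last column; the function name and the weight values are hypothetical, not tuned. For a multi-output Keras model, the same effect comes from passing `loss_weights` to `Model.compile`.

```python
import numpy as np


def weighted_multi_output_mse(y_true, y_pred, weights):
    """Weighted sum of per-output mean squared errors (hypothetical helper).

    y_true, y_pred: arrays of shape (n_samples, n_outputs).
    weights: one non-negative weight per output column.
    Returns (total_loss, per_output_mse).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if y_true.shape != y_pred.shape or y_true.shape[1] != weights.size:
        raise ValueError("shape mismatch between targets, predictions and weights")
    # Mean squared error for each output separately (averaged over samples).
    per_output_mse = np.mean((y_true - y_pred) ** 2, axis=0)
    # Weighted sum: a larger weight makes the optimizer care more about that output.
    return float(np.dot(weights, per_output_mse)), per_output_mse


# Hypothetical example: three outputs, curvature assumed to be the last column.
y_true = np.array([[1.0, 0.5, 0.1],
                   [2.0, 0.4, 0.3]])
y_pred = np.array([[1.1, 0.5, 0.0],
                   [1.9, 0.4, 0.1]])
total, per_output = weighted_multi_output_mse(y_true, y_pred, weights=[1.0, 1.0, 5.0])
print(total, per_output)
```

Comparing runs with different curvature weights would show whether the network can fit curvature better, and how much the other two outputs degrade in exchange.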

rogerkuou commented 3 years ago

@sonjageorgievska do you have time to add some comments to this one?

sonjageorgievska commented 3 years ago

The predictions are flat at the peaks/valleys compared to the observations, which indicates poor training quality. To understand the cause, the follow-up tasks could be: 1) #168 try to reproduce Manuel's results, and 2) #169 come up with ideas to reduce oscillation in the predictions.

Reducing the oscillation will not solve the problem with the flat peaks. However, the calibration paper mentioned in #169 might help with both: reducing the oscillations ("calibration") and fixing the flat peaks. This is because it proposes binning the output values to perform multilabel classification instead of regression. One can then "balance" the data across the bins by oversampling, so the input data for the peaks will no longer be underrepresented. So, two in one ;)
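A minimal sketch of the binning-and-oversampling idea, assuming equal-width bins over the target range and random oversampling with replacement until every non-empty bin has as many samples as the largest one. The bin count, the function name and the example signal are illustrative choices, not taken from the calibration paper. The returned bin indices could serve as class labels for a classifier in place of the regression target.

```python
import numpy as np


def bin_and_oversample(X, y, n_bins=10, seed=0):
    """Bin a continuous target and oversample so every non-empty bin is equally represented.

    Returns (X_balanced, labels_balanced, edges); labels are bin indices in [0, n_bins).
    """
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Equal-width bin edges spanning the observed target range.
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    # Digitizing against the inner edges yields labels 0..n_bins-1,
    # so the maximum value falls into the last bin instead of overflowing.
    labels = np.digitize(y, edges[1:-1])
    counts = np.bincount(labels, minlength=n_bins)
    target = counts.max()
    chosen = []
    for b in range(n_bins):
        idx = np.flatnonzero(labels == b)
        if idx.size == 0:
            continue  # an empty bin has nothing to resample
        # Keep all original samples, then draw extra ones with replacement.
        extra = rng.choice(idx, size=target - idx.size, replace=True)
        chosen.append(np.concatenate([idx, extra]))
    chosen = np.concatenate(chosen)
    return X[chosen], labels[chosen], edges


# Hypothetical data: a peaked signal where extreme values are rare.
t = np.linspace(0, 4 * np.pi, 200)
y = np.sin(t) ** 9
X = t.reshape(-1, 1)
Xb, yb, edges = bin_and_oversample(X, y, n_bins=5)
print(np.bincount(yb))
```

Random duplication is the simplest balancing scheme; it gives the rare peak and valley bins the same weight as the crowded middle bin, at the risk of overfitting the few original peak samples.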

sonjageorgievska commented 3 years ago

We need to better understand the input data. This includes the following tasks: 1) #170 understand why some input data has such a high-frequency signal; 2) #166 understand why two of the input time series are the same series at different scales; and 3) #163 understand the relations between the input parameters.

1) and 2) are solved (I think I closed those issues?); 3) will take time, but luckily it is not the most urgent thing.
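For questions like #166 and #163, one quick first check is the pairwise Pearson correlation between the input time series. Two series that are the same signal at a different scale (and offset) have |r| close to 1. A minimal sketch, assuming the inputs are the columns of a 2-D array with no missing values; the function name, the tolerance and the example data are hypothetical. Correlation only captures linear relations, so a low value does not rule out other dependencies between the parameters.

```python
import numpy as np


def find_rescaled_pairs(data, names, tol=1e-6):
    """Return the correlation matrix and the column pairs that are affine copies.

    data: array of shape (n_times, n_series), one input time series per column.
    names: one label per column.
    Columns with |Pearson r| >= 1 - tol are (up to noise) a * x + b of each other.
    """
    data = np.asarray(data, dtype=float)
    # np.corrcoef treats rows as variables by default, so pass rowvar=False.
    corr = np.corrcoef(data, rowvar=False)
    pairs = []
    n = data.shape[1]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= 1 - tol:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return corr, pairs


# Hypothetical inputs: "b" is a rescaled, shifted copy of "a"; "c" is unrelated noise.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = 3.0 * a + 2.0
c = rng.normal(size=500)
corr, pairs = find_rescaled_pairs(np.column_stack([a, b, c]), ["a", "b", "c"])
print(pairs)
```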

sonjageorgievska commented 3 years ago

The previous comments are not directly related to the proof-of-concept, but it is good to have them discussed.

NLeSC / team-atlas

ML: proof of concept #161