NLeSC / team-atlas


ML: proof of concept #161

Closed yifatdzigan closed 3 years ago

rogerkuou commented 3 years ago

Xu has trained ~2000 grid cells with ml_lsmodel_ascat. Several issues have been discovered:

rogerkuou commented 3 years ago

@sonjageorgievska do you have time to add some comments to this one?

sonjageorgievska commented 3 years ago

We see that the predictions are flat at the peaks/valleys compared to the observations. This indicates poor training quality. To understand the cause, the follow-up tasks can be: 1) #168 try to reproduce Manuel's results and 2) #169 come up with ideas to reduce oscillation in the prediction.

Reducing the oscillation will not solve the problem with the flat peaks. The paper on calibration, mentioned in #169, however might help with both reducing the oscillations ("calibration") and solving the problem with flat peaks. This is because it proposes binning the output values to perform multilabel classification instead of regression. In this case, one can "balance" the data in the bins by oversampling, and thus the input data for the peaks will not be underrepresented. So, two in one ;)
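A rough sketch of the binning-plus-oversampling idea, to make the discussion concrete. This is not the method from the paper in #169 and not code from ml_lsmodel_ascat. The function names `bin_labels` and `oversample_by_bin`, the equal-width bins, and the treatment of each bin as a single class label are all assumptions made for illustration.

```python
import random
from bisect import bisect_right
from collections import defaultdict


def bin_labels(y, n_bins):
    """Assign each target value to one of n_bins equal-width bins (0..n_bins-1).

    Hypothetical helper: equal-width binning is an assumption, other schemes
    (e.g. quantile bins) may suit the data better.
    """
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins
    # Interior edges only, so the maximum value lands in the last bin.
    edges = [lo + width * k for k in range(1, n_bins)]
    return [bisect_right(edges, v) for v in y]


def oversample_by_bin(X, labels, seed=0):
    """Resample with replacement until every non-empty bin is as large
    as the most populated bin. Empty bins are left empty."""
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for i, b in enumerate(labels):
        by_bin[b].append(i)
    target = max(len(members) for members in by_bin.values())
    idx = []
    for b in sorted(by_bin):
        members = by_bin[b]
        idx.extend(members)  # keep every original sample
        idx.extend(rng.choices(members, k=target - len(members)))  # add duplicates
    return [X[i] for i in idx], [labels[i] for i in idx]


# Synthetic targets: values near the mean dominate, the peaks are rare.
rng = random.Random(42)
y = [rng.gauss(0.0, 1.0) for _ in range(1000)]
X = [[v] for v in y]  # placeholder features, not real input data

labels = bin_labels(y, n_bins=5)
X_bal, labels_bal = oversample_by_bin(X, labels)
```

After this step the outer bins (the peaks and valleys) contain as many samples as the central bin, so a classifier trained on `X_bal`, `labels_bal` no longer sees them as rare cases. The oversampling should only be applied to the training split, otherwise duplicated samples leak into validation.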

sonjageorgievska commented 3 years ago

We need to gain more understanding of the input data. This includes the following tasks: 1) #170 understand why some input data have such a high-frequency signal; 2) #166 understand why two of the input time series are the same series at different scales; and 3) #163 understand the relations between the input parameters.

1) and 2) are solved (I think I closed those issues?); 3) is something that takes time, but luckily it is not the most urgent thing.

sonjageorgievska commented 3 years ago

The previous comments are not directly related to the proof-of-concept, but it is good to have them discussed.