ecmwf / ecpoint-calibrate

Interactive GUI (developed in Python) for calibration and conditional verification of numerical weather prediction model outputs.
GNU General Public License v3.0

Remove NaN values from the variables before working with them #141

Open EstiGascon opened 3 years ago

EstiGascon commented 3 years ago

Some variables can contain NaN (or 9999) values for one reason or another, and it would be better to remove them from the database before computing the mapping functions or running the K-S test. To do that, I suggest that when we check a specific variable (predictor) in the decision tree, the software removes the rows with NaN, 9999, or -9999 for that variable, together with the corresponding FER values. It is important that it removes the values only for this variable and not for the rest of the variables.

onyb commented 3 years ago

Can you please specify where you see NaN values? Normally, when you upload a breakpoints CSV, the software asks you to enter the identifier that represents the "infinity" value. "NaN" is something I've never encountered, so it could be a bug somewhere.

Here's the screenshot asking the user to enter the infinity value => 9999, inf, etc:

[Screenshot 2020-12-20 at 23 10 38]
EstiGascon commented 3 years ago

The issue is not in the breakpoints CSV database but in the ASCII or Parquet file with the FE/FER and predictor values for each observation point. In the previous step, the nearest grid point for some observation points returned NaN for a specific predictor (for example, if the predictor has values only over land and not over the sea, but the nearest grid point is a grid box in the sea). So, before starting to calculate the breakpoints for a predictor, we could check whether the corresponding column in the ASCII or Parquet file contains NaN, 9999, or -9999, and remove those rows for that specific predictor and its corresponding FE/FER. In other words, avoid including the NaN or 9999 values in the K-S test.
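
A minimal sketch of that per-predictor filtering, assuming the point data table is loaded into a pandas DataFrame. The column names (`FER`, `cape`, `wspd`), the helper name, and the exact set of sentinel values are hypothetical and only for illustration; this is not the software's actual implementation.

```python
import numpy as np
import pandas as pd

# Sentinel values mentioned in this thread; the exact set is an assumption.
MISSING_SENTINELS = (9999, -9999)


def valid_rows_for_predictor(df, predictor, error_col="FER",
                             sentinels=MISSING_SENTINELS):
    """Return the predictor and error columns, without the rows where
    this predictor is NaN or equal to one of the sentinel values.

    The input DataFrame is left untouched, so rows dropped for one
    predictor remain available when another predictor is checked.
    """
    values = df[predictor]
    is_missing = values.isna() | values.isin(sentinels)
    return df.loc[~is_missing, [predictor, error_col]]


# Hypothetical point data table with two predictors.
df = pd.DataFrame({
    "FER": [0.1, 0.2, 0.3, 0.4],
    "cape": [10.0, np.nan, 9999.0, 30.0],
    "wspd": [5.0, 6.0, 7.0, -9999.0],
})

cape_rows = valid_rows_for_predictor(df, "cape")  # drops rows 1 and 2
wspd_rows = valid_rows_for_predictor(df, "wspd")  # drops row 3 only
```

Filtering into a new per-predictor view, rather than dropping rows from the table itself, keeps the values of the other predictors intact.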

FatimaPillosu commented 3 years ago

@EstiGascon can you specify what was the value for your NaN? Was it a NaN, or a specific numerical value, etc?

EstiGascon commented 3 years ago

I think it is the standard missing value in Metview, which corresponds to 3e+38.
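
If the missing value really is stored as roughly 3e+38, an exact equality check may fail once the data has passed through single-precision floats, so a tolerance-based comparison could be safer. A small sketch with NumPy; the constant, the helper name, and the tolerance are assumptions for illustration:

```python
import numpy as np

# Missing-value indicator reported in this thread; treated as an assumption.
METVIEW_MISSING = 3e38


def missing_mask(values, missing=METVIEW_MISSING):
    """Return a boolean mask that is True where a value is NaN or is
    approximately equal to the missing-value indicator."""
    arr = np.asarray(values, dtype="float64")
    # A relative tolerance absorbs float32 rounding of the indicator.
    return np.isnan(arr) | np.isclose(arr, missing, rtol=1e-6, atol=0.0)


# Hypothetical predictor column stored in single precision.
col = np.array([1.5, 3e38, np.nan, 250.0], dtype="float32")
mask = missing_mask(col)
clean = col[~mask]  # keeps only the two valid values
```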

FatimaPillosu commented 3 years ago

I have changed this issue from "enhancement" to "bug", as keeping NaN values in the point data table during the creation of the decision tree can cause unexpected issues. For the same reason, I have also moved the issue to priority 2.