KIT-HYD / bridget

Evapotranspiration toolbox
https://KIT-HYD.github.io/bridget

Replace NaN values #4

Closed mmaelicke closed 4 years ago

mmaelicke commented 4 years ago

Problem

Our test data seems to contain numeric placeholders for NaN values. For example, with one of the test files:

from bridget.io import read_TK2_file
df = read_TK2_file('./data/Fendt_TK2_result_2014.csv')
df.CO2.where(df.CO2 < 0).dropna()

yields: image

and the plot of the time series looks like:

image

Suggestion

We should verify that this value actually stands for NaN (@sihassl?) and check the other files and variables for other placeholder values (@AlexDo1). Is it always the same value? If so, we need to implement it as a default NaN value option so that it is converted properly to numpy.NaN.

mmaelicke commented 4 years ago

I can confirm these are NaNs. @AlexDo1, you can now check whether the other columns and files always use exactly this value.

AlexDo1 commented 4 years ago

Okay, I'll do that, and I guess I will also replace the -9999 values with numpy.NaN.

AlexDo1 commented 4 years ago

It looks like -9999.9003906 is the only value used for these NaNs (at least in the CO2 column). There are other negative CO2 values as well, which look wrong, but I guess these shouldn't be replaced with NaN.

image
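A minimal sketch of how the check across columns could be automated. The helper name `sentinel_report`, the tolerance, and the demo frame are assumptions for illustration and not part of bridget; only the value -9999.9003906 and the CO2 column come from this thread.

```python
import numpy as np
import pandas as pd

SENTINEL = -9999.9003906  # placeholder value reported in this thread


def sentinel_report(df, sentinel=SENTINEL, atol=1e-3):
    """Count, per numeric column, sentinel hits and other negative values.

    Hypothetical helper: atol is an assumed tolerance for float noise.
    """
    numeric = df.select_dtypes(include="number")
    # Compare on the raw array and rebuild a labelled boolean frame.
    hits = np.isclose(numeric.to_numpy(), sentinel, rtol=0.0, atol=atol)
    is_sentinel = pd.DataFrame(hits, index=numeric.index, columns=numeric.columns)
    # Negative values that are NOT the sentinel (e.g. downward fluxes).
    other_negative = numeric.lt(0) & ~is_sentinel
    return pd.DataFrame(
        {"sentinel": is_sentinel.sum(), "other_negative": other_negative.sum()}
    )


# Made-up demo data, not taken from the Fendt file.
demo = pd.DataFrame(
    {
        "CO2": [1.0, SENTINEL, -3.1, SENTINEL],
        "H2O": [0.5, 0.6, SENTINEL, 0.7],
    }
)
report = sentinel_report(demo)
print(report)
```

Running this on each sample file would show whether the same placeholder appears in every column and how many other negative values remain.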

mmaelicke commented 4 years ago

Yes. According to the header file, the unit is [mmol/m³]. I would guess this is not a concentration but a flux, so + and - indicate direction, presumably upward (+) and downward (-).

Just to be clear: the -9999.9003906 values should not be replaced in the sample files. Instead, the value should be set in the read_*_file functions as the default NaN value, like:

def read_TK2_file(fname, strict=True, na_values=-9999.9003906):
    [...]

This is then passed on to the pandas.read_csv function. The correct argument name should be na_values; see their docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
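As a rough sketch of the idea, assuming the reader boils down to a pandas.read_csv call: `read_with_sentinel` below is a hypothetical stand-in, not the real read_TK2_file, and the column layout of the sample is made up.

```python
import io

import pandas as pd

SENTINEL = -9999.9003906  # placeholder value reported in this thread


def read_with_sentinel(source, na_values=SENTINEL, **kwargs):
    """Hypothetical stand-in for a read_*_file function.

    It only forwards na_values to pandas.read_csv, which turns matching
    fields into NaN while parsing.
    """
    return pd.read_csv(source, na_values=na_values, **kwargs)


# Made-up sample; the real TK2 files have a different layout.
sample = io.StringIO(
    "time,CO2\n"
    "2014-01-01 00:15,14.2\n"
    "2014-01-01 00:45,-9999.9003906\n"
    "2014-01-01 01:15,-3.1\n"
)
df = read_with_sentinel(sample)

# The placeholder became NaN; the other negative value is kept.
print(df.CO2.isna().sum(), df.CO2.min())
```

Keeping the value as a keyword default leaves the sample files untouched and lets callers override it for files that use a different placeholder.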

AlexDo1 commented 4 years ago

Okay, I did that on the sample data :/ But at least the plotted data looks good when replacing -9999.9003906 with NaN.

I'll look into the read_TK2_file function now.

mmaelicke commented 4 years ago

I think the only issue we could run into is when a whole column contains only NaN. It is then removed during data import, and merging with the default column names no longer works because of a shape mismatch. We should test this.
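A possible test for this case, as a sketch only. It assumes the column is dropped by something like `dropna(axis=1, how="all")`; how bridget actually removes it is not shown in this thread, and the column names are hypothetical.

```python
import io

import pandas as pd

SENTINEL = "-9999.9003906"  # placeholder value reported in this thread
names = ["time", "CO2", "H2O"]  # hypothetical default column names

# Made-up sample in which the second column holds only the placeholder.
raw = io.StringIO(
    "2014-01-01 00:15,-9999.9003906,1.5\n"
    "2014-01-01 00:45,-9999.9003906,1.7\n"
)
df = pd.read_csv(raw, header=None, na_values=[SENTINEL])

# read_csv itself keeps a column that is entirely NaN.
n_cols_after_read = df.shape[1]

# Assumed failure mode: dropping all-NaN columns before naming them.
trimmed = df.dropna(axis=1, how="all")
try:
    trimmed.columns = names  # 3 names for 2 remaining columns
except ValueError as err:
    mismatch = str(err)
else:
    mismatch = None

# Naming the columns before any dropping avoids the mismatch.
df.columns = names
print(n_cols_after_read, mismatch)
```

If the real import step behaves like the assumed `dropna` call, assigning the default names first (or not dropping all-NaN columns at all) would sidestep the shape mismatch.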