OpenSenseAction / pypwsqc

Python package for quality control (QC) of data from personal weather stations (PWS)
https://pypwsqc.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Improve FZ filter code #11

Closed: cchwala closed this issue 2 months ago

cchwala commented 6 months ago

Some ideas for how to improve the FZ filter code:

cchwala commented 5 months ago

Note that part of the original algorithm is not implemented correctly (see #22). Hence, this issue could be integrated into the work to resolve #22.

lepetersson commented 2 months ago

@cchwala

To be discussed for the FZ filter:

Initialization: ref_array and sensor_array are now initialized as arrays with 0/1 for dry/wet timesteps. This means that periods of "NaN" data become zero, and that periods of NaN are flagged as faulty zeros. This kind of makes sense, but should those timestamps remain NaN instead?
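A minimal illustration of this effect, assuming the 0/1 arrays are built by comparing rainfall with zero (this is an assumption; it may not be exactly how pypwsqc builds ref_array and sensor_array). A comparison with NaN always evaluates to False, so a data gap ends up as 0, i.e. "dry".

```python
import numpy as np

# Hypothetical 5-step rainfall series (mm) with a two-step gap (NaN)
rainfall = np.array([0.0, 0.3, np.nan, np.nan, 0.0])

# Building a 0/1 dry/wet array with a comparison: NaN > 0 evaluates to
# False, so the gap silently turns into 0, i.e. "dry"
sensor_array = (rainfall > 0).astype(int)
print(sensor_array)  # [0 1 0 0 0]
```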

fz_array is initialized as an array of -1, and the entries are then overwritten with 0 or 1 where applicable, without considering the number of stations reporting rainfall (which should be above the specified threshold n_stat, as is done here). This is mentioned in issue #22, which can be closed.
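A sketch of how the n_stat condition could be respected when computing the reference, assuming the neighbouring stations' rainfall is available as a 2D array (neighbours × time) with NaN for missing data. The helper reference_median is hypothetical and not part of the pypwsqc API. The median is only computed where at least n_stat neighbours have valid data; elsewhere it stays NaN, and only there is the FZ flag initialized to -1.

```python
import numpy as np


def reference_median(neighbours, n_stat):
    """Median rainfall over neighbours per time step (hypothetical helper).

    neighbours: 2D array (n_neighbours, n_time), NaN = missing data.
    Returns NaN wherever fewer than n_stat neighbours have valid data.
    """
    neighbours = np.asarray(neighbours, dtype=float)
    # Number of neighbours with valid data at each time step
    n_valid = np.sum(~np.isnan(neighbours), axis=0)
    enough = n_valid >= n_stat
    median = np.full(neighbours.shape[1], np.nan)
    # nanmedian ignores missing neighbours in the columns that qualify
    median[enough] = np.nanmedian(neighbours[:, enough], axis=0)
    return median


neighbours = np.array([
    [0.2, np.nan, 0.0],
    [0.4, np.nan, np.nan],
    [0.0, 0.1, 0.3],
])
median = reference_median(neighbours, n_stat=2)

# Initialize the FZ flag to -1 only where no reference can be computed;
# elsewhere start from 0 and let the FZ logic set 1 where applicable
fz = np.where(np.isnan(median), -1, 0)
print(median)  # second step is NaN: only one neighbour has data
print(fz)      # -1 only at the second step
```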

lepetersson commented 2 months ago

@cchwala about logics of the code:

I am not confident that the current implementation covers all possible cases. We should check whether there are cases in which none of the conditions in the loop is true.
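One way to check this systematically, sketched under the assumption that the state at each time step can be reduced to a few discrete inputs (station wet/dry/missing, enough neighbours or not, reference wet or dry): write each branch condition as a predicate and enumerate all input combinations with itertools.product, reporting any combination for which not exactly one branch is true. The branch names and conditions below are hypothetical and do not reproduce the loop in flagging.py.

```python
from itertools import product

# Hypothetical branch conditions of a per-time-step FZ loop. Each takes
# (station, enough, ref_wet): station is "nan", "dry" or "wet", enough says
# whether at least n_stat neighbours have data, ref_wet whether median > 0.
BRANCHES = {
    "no_decision": lambda station, enough, ref_wet: station == "nan" or not enough,
    "station_wet": lambda station, enough, ref_wet: station == "wet" and enough,
    "dry_wet_ref": lambda station, enough, ref_wet: station == "dry" and enough and ref_wet,
    "both_dry": lambda station, enough, ref_wet: station == "dry" and enough and not ref_wet,
}


def check_branch_coverage(branches):
    """Return every input combination for which not exactly one branch is true."""
    problems = []
    for combo in product(["nan", "dry", "wet"], [True, False], [True, False]):
        matching = [name for name, cond in branches.items() if cond(*combo)]
        if len(matching) != 1:
            problems.append((combo, matching))
    return problems


print(check_branch_coverage(BRANCHES))  # [] -> each case handled exactly once

# Dropping a branch exposes the case that is no longer handled
incomplete = {k: v for k, v in BRANCHES.items() if k != "both_dry"}
print(check_branch_coverage(incomplete))  # [(('dry', True, False), [])]
```

The same check also catches overlapping conditions, since a combination matching two branches is reported as well.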

Functionality of the FZ filter, from Lotte's paper: "All stations within a range (d) around a given station are selected to compute the median rainfall over the surrounding area. If fewer than n_stat neighboring stations with rainfall measurements are available, the median cannot be calculated and the FZ flag is set to −1. The FZ flag is set to 1 if this median rainfall is larger than zero for at least n_int time intervals while the station itself reports zero rainfall. The FZ flag remains 1 until the station reports nonzero rainfall".
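A minimal sketch of the logic quoted above, written as an explicit per-time-step loop. It is not the pypwsqc implementation; fz_flags_sketch and its arguments are hypothetical. Where the quoted description is silent, the sketch makes assumptions that are also noted in the comments: a time step with missing station data is flagged -1, any -1 step resets the state, the flag becomes 1 from the n_int-th consecutive interval with a wet median and a dry station onwards, and intervals where both the station and the median are dry keep an existing flag but otherwise reset the count.

```python
import numpy as np


def fz_flags_sketch(pws, neighbours, n_stat, n_int):
    """Sketch of the quoted FZ logic (hypothetical, not the pypwsqc code).

    pws        : 1D array (n_time,), station rainfall, NaN = missing
    neighbours : 2D array (n_neighbours, n_time), rainfall of stations
                 within range d, NaN = missing
    Returns an int array with -1 (no decision), 0 (ok), 1 (faulty zero).
    """
    pws = np.asarray(pws, dtype=float)
    neighbours = np.asarray(neighbours, dtype=float)
    n_valid = np.sum(~np.isnan(neighbours), axis=0)

    fz = np.full(pws.shape[0], -1, dtype=int)
    counter = 0    # consecutive steps with wet median and dry station
    in_fz = False  # True once the station has been flagged

    for t in range(pws.shape[0]):
        if n_valid[t] < n_stat or np.isnan(pws[t]):
            # Median cannot be computed (or station data missing, an
            # assumption): flag -1 and, also an assumption, reset the state
            fz[t] = -1
            counter = 0
            in_fz = False
            continue
        median = np.nanmedian(neighbours[:, t])
        if pws[t] > 0:
            # Station reports rain: any FZ period ends here
            counter = 0
            in_fz = False
            fz[t] = 0
        elif median > 0:
            # Station dry while the surroundings are wet
            counter += 1
            if counter >= n_int:
                in_fz = True
            fz[t] = 1 if in_fz else 0
        else:
            # Station and surroundings dry: keep an existing flag,
            # otherwise restart the count (assumption)
            if not in_fz:
                counter = 0
            fz[t] = 1 if in_fz else 0
    return fz


pws = np.array([0.5, 0.0, 0.0, 0.0, 0.0, 0.2])
neighbours = np.array([
    [0.4, 0.3, 0.6, 0.0, 0.0, 0.1],
    [0.6, 0.2, 0.5, 0.0, 0.1, 0.3],
    [0.5, 0.4, 0.4, 0.0, 0.0, 0.2],
])
print(fz_flags_sketch(pws, neighbours, n_stat=3, n_int=2))  # [0 0 1 1 1 0]
```

The if/elif/else chain makes every time step fall into exactly one branch, which is one way to address the concern about cases that none of the conditions cover.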

I tried to disentangle the logic of the current code and arrived at the following (not sure if this is helpful or just confusing, and I don't know why it becomes an image lol):

[Image: diagram of the logic of the current FZ filter code]

cchwala commented 2 months ago

Initialization: ref_array and sensor_array are now initialized as arrays with 0/1 for dry/wet timesteps. This means that periods of "NaN" data become zero, and that periods of NaN are flagged as faulty zeros. This kind of makes sense, but should those timestamps remain NaN instead?

One option would be to use floats, e.g. having [1.0, 0.0, np.nan, 1.0] in the arrays. I am not saying that this is the best solution, but if we want to be able to distinguish NaN from other values, this works.
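A sketch of the float option, assuming rainfall is a float NumPy array with NaN marking gaps. A useful side effect is that NaN compares unequal to everything, so a check like sensor_array == 0 is False during gaps and those steps are not counted as dry.

```python
import numpy as np

# Hypothetical rainfall series (mm) with a gap (NaN)
rainfall = np.array([0.0, 0.3, np.nan, 0.0])

# Float dry/wet array: 0.0 = dry, 1.0 = wet, NaN = no data
sensor_array = np.where(np.isnan(rainfall), np.nan, (rainfall > 0).astype(float))

# NaN compares unequal to everything, so gaps are not counted as dry
is_dry = sensor_array == 0
print(is_dry)  # [ True False False  True]
```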

Another option is to have a separate array to indicate where there are gaps, i.e. entries with NaN.
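A sketch of the separate-array option, under the same assumption about the input: keep the integer 0/1 array and store the gaps in a boolean array next to it. The names gap_mask and real_dry are hypothetical. Compared with float arrays, the trade-off is that all downstream code has to remember to consult the mask.

```python
import numpy as np

rainfall = np.array([0.0, 0.3, np.nan, 0.0])

# Integer 0/1 dry/wet array (gaps become 0) ...
sensor_array = (rainfall > 0).astype(int)
# ... plus a separate boolean array that marks the gaps
gap_mask = np.isnan(rainfall)

# Downstream code has to consult the mask, e.g. to find genuinely dry steps
real_dry = (sensor_array == 0) & ~gap_mask
print(real_dry)  # [ True False False  True]
```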

cchwala commented 2 months ago

fz_array is initialized as an array of -1, and the entries are then overwritten with 0 or 1 where applicable, without considering the number of stations reporting rainfall (which should be above the specified threshold n_stat, as is done here). This is mentioned in issue https://github.com/OpenSenseAction/pypwsqc/issues/22, which can be closed.

Question: What is documented in #22 still has to be fixed, correct? The problem is what you described here, right?

cchwala commented 2 months ago

I am not confident that the current implementation covers all possible cases. We should check whether there are cases in which none of the conditions in the loop is true.

If I am not mistaken, this is the loop being referred to:

https://github.com/OpenSenseAction/pypwsqc/blob/1371c17ed3715acd74dae222325425829af78960/src/pypwsqc/flagging.py#L53-L66

cchwala commented 2 months ago

@lepetersson Was this fixed in #31? If so, please close this issue.

cchwala commented 2 months ago

As discussed on Zulip, I am closing this issue.