OpenSenseAction / pypwsqc

Python package for quality control (QC) of data from personal weather stations (PWS)
https://pypwsqc.readthedocs.io
BSD 3-Clause "New" or "Revised" License
0 stars 3 forks source link

first implementation of faulty zero filter fz_filter #3

Closed cchwala closed 6 months ago

cchwala commented 7 months ago

This will close #2

TODOs from WIP state from last week (copy-paste from comment below):

some TODOs

cchwala commented 7 months ago

Since test are not running (UPDATE: test now run, see below, and codecov shows that these lines are note covered), because linting already fails, I put a quick note about test coverage here before I change parts of the code. Only L23 and L25 are not covered by tests.

https://github.com/OpenSenseAction/pypwsqc/blob/110c790ccc95fdc5baedc244d79dd62ef6ddaf1a/src/pypwsqc/fz_filter.py#L22-L25

I am not sure what the purpose of the np.ma.compressed ... = 0 things is. I do not see masked arrays being used above this code. Maybe there where some masked arrays in the code where the FZ_filter function originally came from.

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (58dffcf) 100.00% compared to head (1c7fbf3) 100.00%. Report is 1 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #3 +/- ## ========================================= Coverage 100.00% 100.00% ========================================= Files 1 2 +1 Lines 3 23 +20 ========================================= + Hits 3 23 +20 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

cchwala commented 7 months ago

EDIT: copied TODOs to comment at top

LottedeVos commented 7 months ago

Since test are not running (UPDATE: test now run, see below, and codecov shows that these lines are note covered), because linting already fails, I put a quick note about test coverage here before I change parts of the code. Only L23 and L25 are not covered by tests.

https://github.com/OpenSenseAction/pypwsqc/blob/110c790ccc95fdc5baedc244d79dd62ef6ddaf1a/src/pypwsqc/fz_filter.py#L22-L25

I am not sure what the purpose of the np.ma.compressed ... = 0 things is. I do not see masked arrays being used above this code. Maybe there where some masked arrays in the code where the FZ_filter function originally came from.

Honestly I'm not quite sure, I think Jochen will be able to provide more detail as it is his translation of the code. Based on what I could quickly google: I suspect is has to do with that FZ_array is first defined as an array of '-1' values. If the Ref_array does not contain a value, the FZ_array should remain -1. If Ref_array does contain a value but the Sensor_array does, the FZ_array receives the value that it had on the previous interval.

Basically: if you don't have a reference, you cannot attribute a FZ-flag (i.e. FZ flag is -1). If you don't have a value in your PWS time series, the FZ flag of the previous time interval is attributed.

I hope this helps!

cchwala commented 7 months ago

Thanks. That helps to understand the logic.

But the part with np.ma.compressed is very specific because it only does something useful if e.g. Ref_array is a "masked" numpy array.

I did some test to check the behavior of np.ma.compressed:

Bildschirmfoto 2024-01-24 um 11 04 41

This might stem from the usage of masked arrays, e.g. from radar data, that have been used with the Python code that we use here. Based on my current understanding of the code, I would just remove the two statements with np.ma.compressed.

@JochenSeidel Do you have any additional info?

UPDATE: Based on info on Zulip there is no reason for us to keep the masked array if-statements

cchwala commented 7 months ago

The Linux build fails on CI, but only for Python 3.10... 😬

For whatever reason on Python 3.10 pip gets the source distribution while it gets the wheels on Python 3.11, see this screenshot with a comparison of both CI runs:

Bildschirmfoto 2024-01-25 um 09 00 33

The problem is that building netcdf4 from source requires the C libraries of HDF5 and NetCDF to be there. This building process should be avoided, and this is actually what pip does on Python 3.11. I have not yet found a way to solve that...

cchwala commented 7 months ago

Note that the weels for Python 3.10 are there on similarly to the wheels for Python 3.11 pypi https://pypi.org/project/netCDF4/#files

drawing
cchwala commented 7 months ago

...It was not a Linux Python3.10 problem, but a problem with PyPy for which there was a dedciated CI run on Linux with Python3.10 ...🤦‍♂️

Problem solved by removing PyPy from CI