OpenSenseAction / pypwsqc

Python package for quality control (QC) of data from personal weather stations (PWS)
https://pypwsqc.readthedocs.io
BSD 3-Clause "New" or "Revised" License
0 stars 3 forks source link

Add HI (high influx) filter from PWSQC #5

Closed cchwala closed 5 months ago

cchwala commented 7 months ago

In the training school notebook for PWSQC we do not have the implementation of the HI (high influx) filter. But Lotte provided a snippet of the R code:

HIflag <- rep(0, times=length(Nint))
   HIflag[which(((Nint > HIthresB) & (Med < HIthresA)) | ((Med >= HIthresA) & (Nint > (HIthresB*Med/HIthresA))))] <- 1  # if thresholds are exceeded, the HI flag becomes 1
   HIflag[which(Number_of_measurements < nstat)] <- -1  # if less than 'nstat' neighbours supply observations, the HI flag becomes equal to -1

   if(exists("HI_flags")==F){ HI_flags <- HIflag
    }else{ HI_flags <- cbind(HI_flags, HIflag) }

Note that this is copy-paste from the Zoom chat, hence, I am not sure about indentation and completeness of this code snippet.

cchwala commented 6 months ago

For reference, the R notebook with all code, including the HI filter, is here.

That might help to understand what the individual variables mean.

lepetersson commented 6 months ago

First draft of HI-filter, not considering compability with the rest of the package;

#set parameters
d = 10000 #meters
nstat = 5 # threshold for nr of stations within range d reporting data for given time step
HIthresA = 0.4 # threshold for median rainfall of stations within range d, mm
HIthresB = 10 #upper rainfall limit, mm

#Initialize HIflag with zeros
HIflag = np.zeros_like(X) #X size of dataset 

def HI_filter(d,nstat, HIthresA,HIthresB):

    for i in np.arange(np.shape(pws_data)[0]):

        #Med = np.median(i,d) median rainfall amount of stations withing range d around station i (same range d for FZ filter?)
        #N = number of stations not-NaN within range d around station i

        #filter cannot be applied if;

        HIflag[np.where(N < nstat)] = -1 

        #rejecting observation if; 
        condition1 = (Med < HIthresA) & (pws_data[:,i] > HIthresB)
        condition2 = (Med >= HIthresA) & (pws_data[:,i] > HIthresB /HIthresA * Med)
        HIflag[np.where(condition1|condition2)] = 1
    return HI_array
cchwala commented 6 months ago

@lepetersson Looks like a good starting point. I might be able to do it without the for-loop, but it is good to start with the for-loop, check that things work (ideally via a test), and then refactor to make it faster.

cchwala commented 5 months ago

closed in #16