GEUS-Glaciology-and-Climate / pypromice

Process AWS data from L0 (raw logger) through Lx (end user)
https://pypromice.readthedocs.io
GNU General Public License v2.0

Range thresholding moved to earlier in L0toL1 processing #214

Closed PennyHow closed 7 months ago

PennyHow commented 7 months ago

A bug was found where z_boom_u/z_boom_l measurements jump when t_u/t_l measurements are below -50 °C.

This is caused by an error in the Lufft sensor, which reports bad readings below -50 °C. Here is an example with station HUM:

In [1]: from pypromice.process import AWS
In [2]: aws = AWS('aws-l0/tx/config/HUM.toml', 'aws-l0/tx/')
In [3]: aws.process()

# Level 0 z_boom and t data (before processing, with bad temperature data)

In [4]: list(aws.L0[0]['z_boom_u'].sel(time='2022-11-09').values)
Out[4]: 
[3.965, 3.972, 3.974, 3.977, 3.984, 3.987, 3.991, 3.995,
 3.995, 3.987, 3.982, 3.984, 3.979, 3.978, 3.957, 3.95,
 3.937, 3.937, 3.927, 3.924, 3.92, 3.914, 3.916, 3.919]

In [5]: list(aws.L0[0]['t_u'].sel(time='2022-11-09').values)
Out[5]: 
[-49.0, 650.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 825.0, 
-49.95, 125.0, 825.0, 300.7, -48.58, -49.28, -48.55, -47.2, 
-46.22, -45.37, -44.85, -44.15, -43.7, -43.23, -43.02, -42.93]

# Level 1 z_boom data (after processing, with erroneous jumps)

In [6]: list(aws.L1[0]['z_boom_u'].sel(time='2022-11-09').values)
Out[6]: 
[7.289180861917533, 8.57528227190415, 8.579600138103498, 8.586076937402519, 
 8.601189469100236, 8.607666268399257, 8.002247110305188,
 3.6112975758332504, 4.823247426546757, 7.994226817536152, 5.7716481286699075,
 3.6123897126299003, 3.602228735667318, 3.6071902823546296, 3.5989152582748902,
 3.600331164726147, 3.5951962880115276, 3.599297693561251, 3.595655208111283,
 3.596436753864435, 3.5964484525318636, 3.592583224465607, 3.5951217748425828,
 3.599047862357614]

A simple solution is to run the range-thresholding QC at the beginning of pypromice.process.L0toL1. Right now, this is only done at the end of pypromice.process.L0toL1, AFTER z_boom_u/z_boom_l has been corrected using t_u/t_l.

Performing the range thresholding first removes the erroneous temperature readings before they are applied to correct the boom height:

In [7]: list(aws.L1[0]['z_boom_u'].sel(time='2022-11-09').values)
Out[7]: 
[nan, nan, nan, nan, nan, nan, nan, 3.6112975758332504, nan, nan, nan, 3.6123897126299003,
 3.602228735667318, 3.6071902823546296, 3.5989152582748902, 3.600331164726147,
 3.5951962880115276, 3.599297693561251, 3.595655208111283, 3.596436753864435,
 3.5964484525318636, 3.592583224465607, 3.5951217748425828, 3.599047862357614]
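
For illustration, range thresholding amounts to masking values that fall outside variable-specific bounds before they feed into later corrections. A minimal sketch with hypothetical temperature bounds (the real limits live in pypromice's variables table, not here):

```python
import numpy as np
import xarray as xr

# Hypothetical valid range for air temperature (deg C); the actual
# bounds are read from pypromice's variables table.
T_LO, T_HI = -80.0, 40.0

def clip_range(da: xr.DataArray, lo: float, hi: float) -> xr.DataArray:
    """Replace out-of-range values with NaN (range thresholding)."""
    return da.where((da >= lo) & (da <= hi))

# Mimic the HUM excerpt above: spikes of 650/1000 become NaN,
# plausible cold values survive.
t_u = xr.DataArray([-49.0, 650.0, 1000.0, -49.95, -43.7],
                   dims="time", name="t_u")
print(clip_range(t_u, T_LO, T_HI).values)
```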

BaptisteVandecrux commented 7 months ago

Nice catch!

My main concern is that good data could be filtered out before we have a chance to fix it or, more worryingly, before it is converted to the units stated in the variable.csv; usr and dsr are an example of this.

Since there is no calibration or unit conversion on temperature, and I have never seen that variable need adjustment (unlike pressure for example), it would be safest to only clip temperature variables.

And as a note for later, we need to check how often the air temperature is missing while the SR50 is still working. If that happens frequently, maybe we want to gap-fill the air temperature before it is used to correct the SR50 measurements. But that's more of an enhancement.
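
The units concern can be made concrete with a toy example: a plausibility threshold defined in physical units gives the wrong answer if applied to raw logger values before calibration. The numbers here are purely illustrative, not actual usr/dsr calibration values:

```python
usr_raw = 20.0    # raw shortwave reading in logger counts (illustrative)
cal = 100.0       # illustrative calibration factor, W m-2 per count
USR_MAX = 1500.0  # plausibility limit stated in W m-2

usr_wm2 = usr_raw * cal  # 2000 W m-2 after conversion

# Thresholding before calibration misses the bad value...
assert usr_raw < USR_MAX   # 20 < 1500: wrongly passes
# ...while thresholding after calibration flags it.
assert usr_wm2 > USR_MAX   # 2000 > 1500: correctly caught
```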

PennyHow commented 7 months ago

> Since there is no calibration or unit conversion on temperature, and I have never seen that variable need adjustment (unlike pressure for example), it would be safest to only clip temperature variables.
>
> And as a note for later, we need to check how often the air temperature is missing while the SR50 is still working. If that happens frequently, maybe we want to gap-fill the air temperature before it is used to correct the SR50 measurements. But that's more of an enhancement.

I had spoken to @ladsmund about doing some form of interpolation of the temperature data so that we avoid losing valid SR50 measurements. If there is concern about clipping all variables too early in L0toL1, then we could make a routine that only clips temperature data, and then also add an interpolation routine, either now or later. I'll make these changes now.

PennyHow commented 7 months ago

I've moved the temperature correction to a separate routine, which is used only for clipping and interpolating t_u/t_l when correcting z_boom_u/z_boom_l. The clipped and interpolated temperature data is therefore not carried forward in t_u/t_l.
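
A sketch of what such a side routine could look like, assuming xarray data and the usual sonic-ranger speed-of-sound scaling (z multiplied by sqrt(T/273.15), with T in kelvin); function names and thresholds here are illustrative, not the actual pypromice API:

```python
import numpy as np
import xarray as xr

T_0 = 273.15  # 0 deg C in kelvin

def clip_temperature(t: xr.DataArray,
                     lo: float = -80.0, hi: float = 40.0) -> xr.DataArray:
    """Range-threshold temperature only; out-of-range readings become NaN.
    The result is used solely for the boom-height correction and is not
    written back to t_u/t_l."""
    return t.where((t >= lo) & (t <= hi))

def correct_z_boom(z: xr.DataArray, t: xr.DataArray) -> xr.DataArray:
    """Speed-of-sound correction for a sonic ranger on the boom:
    measured distances scale with sqrt(T / T_0)."""
    return z * np.sqrt((t + T_0) / T_0)
```

A NaN temperature now propagates to a NaN boom height instead of producing a spurious jump, which matches the clipped Out[7] values above.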

The clipping is essentially lifted from the pypromice.process.value_clipping routine, but only clips temperature data.

I think it is a good idea to do the interpolation in order to preserve as many valid z_boom_u/z_boom_l values as possible. I've chosen a maximum window of 12 hours across which gaps should be interpolated; if you think this should be smaller or bigger, I am open to suggestions. I chose a temporal window rather than a number of time steps to account for the different temporal resolutions of the raw (10-minute) and tx (hourly) data.
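
For what it's worth, xarray's interpolate_na accepts max_gap as a timedelta, which expresses exactly this temporal-window idea: the same 12-hour setting spans 72 steps of 10-minute raw data but only 12 steps of hourly tx data. A minimal sketch (not necessarily the exact pypromice call):

```python
import numpy as np
import pandas as pd
import xarray as xr

def interp_max_gap(t: xr.DataArray, window: str = "12h") -> xr.DataArray:
    """Linearly interpolate NaNs along time, but only across gaps
    shorter than `window`; longer gaps are left as NaN."""
    return t.interpolate_na(dim="time", max_gap=pd.Timedelta(window))

# Hourly (tx-style) series: a 3 h gap is well within the 12 h window
# and gets filled; the same code works unchanged on 10-minute raw data.
time = pd.date_range("2022-11-09", periods=6, freq="h")
t = xr.DataArray([-45.0, np.nan, np.nan, -42.0, -41.0, -40.0],
                 dims="time", coords={"time": time})
print(interp_max_gap(t).values)  # gap filled to roughly -44, -43
```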

The output is a t_u_interp/t_l_interp variable, which is retained through the processing routine but dropped at the end when the NetCDF/CSV output files are written. Therefore we can inspect it if needed.
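
Dropping the helper variables at write-out might look like the following; the variable names follow the description above, but the write step is plain xarray rather than pypromice's own writer:

```python
import xarray as xr

# Diagnostic helper variables kept during processing for inspection,
# but excluded from the end-user product.
INTERNAL_VARS = ["t_u_interp", "t_l_interp"]

def write_output(ds: xr.Dataset, path: str) -> None:
    """Drop internal helper variables, then write the end-user file."""
    ds.drop_vars(INTERNAL_VARS, errors="ignore").to_netcdf(path)
```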