GEUS-Glaciology-and-Climate / pypromice

Process AWS data from L0 (raw logger) through Lx (end user)
https://pypromice.readthedocs.io
GNU General Public License v2.0

extrapolate gps_alt from entire record if needed #141

Closed BaptisteVandecrux closed 1 year ago

BaptisteVandecrux commented 1 year ago

Fixing this issue

The problem:

When getBUFR was called, a linear fit based on the last three months of lat, lon and alt was calculated. For latitude and longitude, coordinates contained in the Iridium emails were used to gap-fill missing observations, meaning there was always data to fit a linear function to. For altitude, however, a GPS failure lasting more than three months left no observed altitudes to fit. In those cases, altitude was left as NaN in the BUFR file sent to DMI/WMO and saved in AWS_station_locations.csv.

The solution: My suggestion is to first attempt to interpolate lat, lon and altitude from the last three months, and if altitude cannot be estimated from that first try, to extrapolate the altitude for all timestamps where it is missing, based on the entire record of measured altitude (which may extend several years back).

The pros: This will always provide an updated, best estimate of the station altitude.

The cons: Some of the stations have had a failing GPS for many years (e.g. SCO_U since 2012). We would therefore estimate the 2023 position based on 2008-2012 data. I believe this is acceptable because the stations' downward motion is rather steady.
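A minimal sketch of the two-stage logic proposed above, assuming pandas time series with a datetime index. Function names (`fit_and_extrapolate`, `estimate_altitude`) and the three-month windowing are illustrative, not pypromice's actual API:

```python
import numpy as np
import pandas as pd

def fit_and_extrapolate(series: pd.Series, target_index: pd.DatetimeIndex) -> pd.Series:
    """Fit a linear trend to the valid samples of `series` and
    evaluate it on `target_index`. Returns NaN if too few valid samples."""
    valid = series.dropna()
    if len(valid) < 2:
        return pd.Series(np.nan, index=target_index)
    # Regress against seconds elapsed since the first valid sample
    t0 = valid.index[0]
    x = (valid.index - t0).total_seconds()
    slope, intercept = np.polyfit(x, valid.values, 1)
    x_new = (target_index - t0).total_seconds()
    return pd.Series(slope * np.asarray(x_new) + intercept, index=target_index)

def estimate_altitude(gps_alt: pd.Series, target_index: pd.DatetimeIndex) -> pd.Series:
    """Two-stage estimate: last three months first, full record as fallback."""
    recent = gps_alt[gps_alt.index >= gps_alt.index[-1] - pd.DateOffset(months=3)]
    est = fit_and_extrapolate(recent, target_index)
    if est.isna().all():
        # No usable altitude in the recent window: fall back to the whole record
        est = fit_and_extrapolate(gps_alt, target_index)
    return est
```

The fallback only fires when the recent window yields nothing, so stations with a healthy GPS keep using the short, more current fit.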

patrickjwright commented 1 year ago

@BaptisteVandecrux recognizing that your original intention was only to modify AWS_station_locations.csv, I like the idea of extrapolating altitude (when needed) for writing to the BUFR, so that we can submit more data to DMI. In the case where we have all instantaneous vars, lat and lon, but simply missing altitude, I think extrapolation is warranted.

The most recent commit has the following changes:

The main problem right now is that I am reading from '../aws-l3/tx/*/*_hour.csv' to get the transmitted data (we need instantaneous obs). These files only go back a limited amount of time, so even though we are trying to extrapolate using the "full history", we are still not getting altitude for ['SCO_U', 'TAS_A', 'SCO_L']! I am now printing the min timestamp available in the console output when we try full-history extrapolation. For example:

```
####### Processing SCO_L #######
Generating SCO_L.bufr from ../aws-l3/tx/SCO_L/SCO_L_hour.csv
TIMESTAMP: 2023-05-10 06:00:00
----> Running in dev mode!
Time checks passed.
finding positions for SCO_L
last transmission: 2023-05-10 09:00:00
----> Insufficient gps_alt data for SCO_L!
----> Using full history for linear extrapolation: gps_alt
first transmission: 2020-07-01 00:00:00
----> Insufficient gps_alt data for SCO_L!
----> No data exists for gps_alt. Stubbing out with NaN.
writing positions for SCO_L
----> Failed min_data_check for position!
```

So, I'm not sure what to do here... I suppose we could read from a different source? We are running into an issue (which @PennyHow identified early on): getBUFR is primarily meant for writing BUFR files for DMI... but since I was dealing with positions, I decided to add functionality to write the positions csv file. It may have been better to make this a standalone routine (like getPositions), but now we have everything fairly "baked in" here...

Anyhow, it would be great to get a general review of this new logic @BaptisteVandecrux! I have tested it and it seems to be working.

You can test just this module as follows (locally or on glacio01):

patrickjwright commented 1 year ago

It also looks like we are getting no position data for QAS_Lv3. Does that make sense?