GEUS-Glaciology-and-Climate / pypromice

Process AWS data from L0 (raw logger) through Lx (end user)
https://pypromice.readthedocs.io
GNU General Public License v2.0

BUFR files and position update stopping at LYN_T #164

Closed: BaptisteVandecrux closed this issue 10 months ago

BaptisteVandecrux commented 10 months ago

> getBUFR --positions --positions-filepath ../aws-l3/AWS_latest_locations.csv

....
####### Processing LYN_T #######
Generating LYN_T.bufr from ../aws-l3/tx/LYN_T/LYN_T_hour.csv
TIMESTAMP: 2023-05-26 21:00:00
----> Time checks failed for LYN_T
      current: 2023-05-26 21:00:00
       latest: 2023-05-26 21:00:00
finding positions for LYN_T
last transmission: 2023-08-09 11:00:00
Traceback (most recent call last):
  File "pandas/_libs/index.pyx", line 548, in pandas._libs.index.DatetimeEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 2263, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 2273, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 1685134800000000000

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aws/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3803, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 516, in pandas._libs.index.DatetimeEngine.get_loc
  File "pandas/_libs/index.pyx", line 550, in pandas._libs.index.DatetimeEngine.get_loc
KeyError: Timestamp('2023-05-26 21:00:00')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aws/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 736, in get_loc
    return Index.get_loc(self, key, method, tolerance)
  File "/home/aws/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
    raise KeyError(key) from err
KeyError: Timestamp('2023-05-26 21:00:00')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aws/miniconda3/envs/py38/bin/getBUFR", line 246, in <module>
    df1_limited, positions = find_positions(df1, stid, args.time_limit, current_timestamp, positions)
  File "/home/aws/miniconda3/envs/py38/lib/python3.8/site-packages/pypromice/postprocess/csv2bufr.py", line 452, in find_positions
    s = df_limited.loc[current_timestamp]
  File "/home/aws/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexing.py", line 1073, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/aws/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexing.py", line 1312, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/aws/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexing.py", line 1260, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/home/aws/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/generic.py", line 4056, in xs
    loc = index.get_loc(key)
  File "/home/aws/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 738, in get_loc
    raise KeyError(orig_key) from err
KeyError: Timestamp('2023-05-26 21:00:00')

At that site, t_i, rh_i and p_i have not been available since 2023-05-26 21:00:00, while gps_lon, gps_lat and gps_alt are still available.
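A quick way to see this kind of gap is to look up the last non-missing timestamp of each column. The sketch below uses a small made-up hourly table (not the real LYN_T data), and the helper name `last_valid_times` is hypothetical. The pattern matches the log above: the instantaneous variables stop at 2023-05-26 21:00:00 while the GPS columns continue up to the last transmission.

```python
import numpy as np
import pandas as pd


def last_valid_times(df: pd.DataFrame, cols) -> pd.Series:
    """Hypothetical helper: timestamp of the last non-NaN value for each column."""
    return pd.Series({col: df[col].last_valid_index() for col in cols})


# Illustrative miniature of an hourly table (values are made up, not LYN_T data)
idx = pd.to_datetime(
    ["2023-05-26 20:00", "2023-05-26 21:00", "2023-05-26 22:00", "2023-08-09 11:00"]
)
hourly = pd.DataFrame(
    {
        "t_i": [-3.0, -3.2, np.nan, np.nan],
        "rh_i": [85.0, 86.0, np.nan, np.nan],
        "p_i": [780.0, 781.0, np.nan, np.nan],
        "gps_lat": [72.0, 72.0, 72.0, 72.0],
        "gps_lon": [-40.0, -40.0, -40.0, -40.0],
        "gps_alt": [1500.0, 1500.0, 1500.0, 1500.0],
    },
    index=idx,
)

last = last_valid_times(hourly, ["t_i", "rh_i", "p_i", "gps_lat", "gps_lon", "gps_alt"])
print(last)
```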

The problem might come from the "current_timestamp" being passed to the find_positions function: https://github.com/GEUS-Glaciology-and-Climate/pypromice/blob/3424f2c6f81619bc04446a565fdc1666eaa16bc1/bin/getBUFR#L238-L246

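The log shows "Time checks failed", meaning that the code correctly identifies the last instantaneous values as too old. Yet it still runs find_positions on "current_timestamp" (2023-05-26 21:00:00), which is more than two months older than the last transmission (2023-08-09 11:00:00). The lookup df_limited.loc[current_timestamp] then raises a KeyError, presumably because df_limited has been restricted to the time_limit window before the last transmission and no longer contains that timestamp. I suggest setting current_timestamp = None in that situation.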
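A minimal sketch of the suggested guard, assuming that a `None` timestamp should make the lookup fall back to the most recent row of the time-limited frame. `select_position_row` and the `time_checks_passed` flag are hypothetical stand-ins, not pypromice's actual `find_positions` code, and the frame contents are made up.

```python
import pandas as pd


def select_position_row(df_limited: pd.DataFrame, current_timestamp) -> pd.Series:
    """Hypothetical stand-in for the lookup inside find_positions.

    When current_timestamp is None (the time checks failed), fall back to the
    most recent row of the time-limited frame instead of indexing with a
    timestamp that the time limit may already have cut off.
    """
    if current_timestamp is None:
        return df_limited.iloc[-1]
    return df_limited.loc[current_timestamp]


# Illustrative time-limited frame: only recent hours survive the time limit
idx = pd.to_datetime(["2023-08-09 09:00", "2023-08-09 10:00", "2023-08-09 11:00"])
df_limited = pd.DataFrame({"gps_lat": [72.001, 72.002, 72.003]}, index=idx)

stale = pd.Timestamp("2023-05-26 21:00")
print(stale in df_limited.index)  # False: .loc[stale] would raise KeyError, as in the traceback

time_checks_passed = False  # hypothetical flag for the outcome of the time checks
current_timestamp = stale if time_checks_passed else None
row = select_position_row(df_limited, current_timestamp)
print(row.name)  # 2023-08-09 11:00:00
```

Falling back to the last row keeps the position update going even when the instantaneous variables are stale, which is the behaviour the issue asks for.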

BaptisteVandecrux commented 10 months ago

Another necessary update was to make extrapolation of GPS coordinates the default:

https://github.com/GEUS-Glaciology-and-Climate/pypromice/commit/17eaa0740c916b59c8e250ec8a7aca749dfd8203

This was necessary because the instantaneous values are taken at the end of the hourly time step and are therefore always one hour ahead of the GPS coordinates, which are averages over that same hour and are timestamped at the beginning of the hour.
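To illustrate why extrapolation is needed: if the latest GPS average is labelled 11:00 while the instantaneous values carry 12:00, there is no GPS row at the target time. The sketch below extrapolates each GPS column to that timestamp with a least-squares straight line (`numpy.polyfit`). The linear fit, the helper name `extrapolate_gps` and the sample values are all assumptions for illustration; the linked commit may use a different method.

```python
import numpy as np
import pandas as pd


def extrapolate_gps(df: pd.DataFrame, target: pd.Timestamp,
                    cols=("gps_lat", "gps_lon", "gps_alt")) -> pd.Series:
    """Hypothetical helper: linearly extrapolate hourly GPS averages to `target`.

    Assumes a straight-line fit in time over all valid values of each column.
    """
    out = {}
    for col in cols:
        s = df[col].dropna()
        t0 = s.index[0]
        # Hours elapsed since the first valid value, as plain floats
        hours = np.asarray((s.index - t0) / pd.Timedelta(hours=1), dtype=float)
        slope, intercept = np.polyfit(hours, s.to_numpy(dtype=float), 1)
        target_hours = (target - t0) / pd.Timedelta(hours=1)
        out[col] = slope * target_hours + intercept
    return pd.Series(out, name=target)


# Made-up GPS hourly averages, each labelled with the start of its hour
idx = pd.to_datetime(
    ["2023-08-09 08:00", "2023-08-09 09:00", "2023-08-09 10:00", "2023-08-09 11:00"]
)
gps = pd.DataFrame(
    {
        "gps_lat": [72.000, 72.001, 72.002, 72.003],
        "gps_lon": [-40.000, -40.002, -40.004, -40.006],
        "gps_alt": [1500.0, 1500.0, 1500.0, 1500.0],
    },
    index=idx,
)

# Instantaneous values are stamped one hour later than the last GPS average
pos = extrapolate_gps(gps, pd.Timestamp("2023-08-09 12:00"))
print(pos)
```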