NREL / nsrdb

NSRDB data processing pipeline. Includes satellite data assimilation, cloud property prediction and gap-filling, radiative transport modeling, and data collection.
https://nrel.github.io/nsrdb/
BSD 3-Clause "New" or "Revised" License

MERRA Data Interpolation Bug #51

Status: Closed (grantbuster closed 1 year ago)

grantbuster commented 2 years ago

Bug Description: Several ancillary variables are retrieved from MERRA and interpolated on a daily basis. Data at the end of the day is forward-filled, so the last timesteps of each day hold constant values. Correct behavior would be to either linearly extrapolate or (ideally) retrieve the MERRA timesteps that bracket the current data so there is something to interpolate to.
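The symptom can be reproduced in a few lines. This is an illustrative sketch only: it uses made-up hourly values rather than real MERRA data, and plain pandas rather than the nsrdb interpolation code.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly source values for a single day (not real MERRA data).
ti_day = pd.date_range('2020-01-01', periods=24, freq='60min')
source = pd.Series(np.arange(24, dtype=float), index=ti_day)

# Target 15-minute index covering the same day.
ti_new = pd.date_range('2020-01-01', periods=96, freq='15min')

# With only the current day available, nothing brackets 23:15 to 23:45,
# so those timesteps are filled with the 23:00 value.
one_day = source.reindex(ti_new).interpolate(method='time').ffill()

# Appending the first timestep of the next day gives the trailing
# timesteps a right-hand neighbour to interpolate towards.
next_step = pd.Series([24.0], index=[pd.Timestamp('2020-01-02 00:00')])
two_day = pd.concat([source, next_step])
union = two_day.index.union(ti_new)
with_next = two_day.reindex(union).interpolate(method='time').reindex(ti_new)

print(one_day.tail(4).tolist())    # constant after 23:00
print(with_next.tail(4).tolist())  # keeps increasing towards the next day
```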

Screenshots: MicrosoftTeams-image (image attachment)

Charge code SETP 10304 71.01.01

grantbuster commented 1 year ago

Need to read in the daily MERRA file plus the next day's file here: https://github.com/NREL/nsrdb/blob/745d1c1e738f9f6ba442168fdede15c101084e99/nsrdb/data_model/merra.py#L136

Should be able to test this here with one more MERRA source file: https://github.com/NREL/nsrdb/blob/main/tests/test_data_model.py

bnb32 commented 1 year ago

@rolson2 It looks like the source data is accessed here: https://github.com/NREL/nsrdb/blob/52bae183ebe3c2749990560b7efad6d63d720428/nsrdb/data_model/data_model.py#L1251

and then temporally interpolated here: https://github.com/NREL/nsrdb/blob/52bae183ebe3c2749990560b7efad6d63d720428/nsrdb/data_model/data_model.py#L1275:#L1285

So if the source_data property can be modified to return data that includes enough timesteps, the interpolation problem will be fixed.

There are definitely some nuances here but I think this is the high-level idea.
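As a rough sketch of that idea, the snippet below returns one day of source data plus the leading timestep of the following day when it exists. Everything here is hypothetical: `source_with_next_day`, `make_day`, and the in-memory `daily` mapping stand in for the real per-day MERRA files and are not part of nsrdb. Taking only the first timestep of the next day is an assumption that fits data stamped on the hour.

```python
import numpy as np
import pandas as pd


def source_with_next_day(daily, date):
    """Return source data for `date` plus the next day's first timestep.

    `daily` is a hypothetical mapping of date -> DataFrame (time x sites)
    standing in for per-day source files.
    """
    date = pd.Timestamp(date).normalize()
    frames = [daily[date]]
    next_date = date + pd.Timedelta(days=1)
    if next_date in daily:
        # One extra timestep is enough to bracket the last target
        # timesteps of the current day (assumes on-the-hour data).
        frames.append(daily[next_date].iloc[:1])
    return pd.concat(frames).sort_index()


def make_day(date, offset):
    """Build a fake day of hourly data for two sites."""
    ti = pd.date_range(date, periods=24, freq='60min')
    values = offset + np.arange(24, dtype=float).reshape(-1, 1) * np.ones((1, 2))
    return pd.DataFrame(values, index=ti)


daily = {pd.Timestamp('2020-01-01'): make_day('2020-01-01', 0.0),
         pd.Timestamp('2020-01-02'): make_day('2020-01-02', 24.0)}

both = source_with_next_day(daily, '2020-01-01')       # next day available
last_only = source_with_next_day(daily, '2020-01-02')  # next day missing
print(both.index[-1], len(both))
print(last_only.index[-1], len(last_only))
```

When the next day is missing (for example, the most recent day of data), the function falls back to the single day, so the end of that day would still be forward-filled.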

grantbuster commented 1 year ago

@rolson2 so yeah you're going to have to make the MERRA data handler class pull the current day AND the next day (if available).

The current temporal lin class won't work with this as designed right now. The reindex() method is pretty aggressive and will drop all data from the second day. Here's an example of how to fix that:


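  import numpy as np
  import pandas as pd

  # Two days of hourly "native" data and one day of 15-minute target timesteps.
  # inclusive='left' needs pandas >= 1.4 (older versions spell it closed='left').
  ti_native = pd.date_range('20200101', '20200103', freq='1h', inclusive='left')
  ti_new = pd.date_range('20200101', '20200102', freq='15min', inclusive='left')

  data_native = np.arange(len(ti_native))

  # A plain reindex() keeps only the timestamps in ti_new, so day 2 is dropped.
  # The last timestep is 2020-01-01 23:45:00 and it has nothing after it.
  df = pd.DataFrame(data_native, index=ti_native).reindex(ti_new)
  print(df)

  # An outer merge keeps the union of both indices, including day 2.
  df = pd.DataFrame(index=ti_new).merge(pd.DataFrame(data_native, index=ti_native), left_index=True, right_index=True, how='outer')
  print(df)
  print(df.iloc[90:100])  # rows spanning the day boundary, before interpolation

  # Interpolate on the union index first, then trim to the target index.
  df = df.interpolate(method='time').ffill().bfill().reindex(ti_new)
  arr = df.values
  print(df.iloc[90:100])
  print(df)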
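The same interpolate-on-the-union-then-trim pattern could be wrapped in a small helper that works on (time, sites) arrays. This is only a sketch: `temporal_interp` is a hypothetical name, not the existing nsrdb function, and it builds the union with `DatetimeIndex.union` instead of the outer merge used above.

```python
import numpy as np
import pandas as pd


def temporal_interp(ti_native, data, ti_new):
    """Linearly interpolate `data` (time x sites) from ti_native to ti_new."""
    df = pd.DataFrame(np.asarray(data, dtype=float), index=ti_native)
    # Interpolate on the union of both indices so native timesteps that are
    # outside ti_new (e.g. the next day) still act as interpolation endpoints.
    union = ti_native.union(ti_new)
    df = df.reindex(union).interpolate(method='time').ffill().bfill()
    # Only now trim to the requested target index.
    return df.reindex(ti_new).values


# Two days of hourly data for two fake sites, one day of 15-minute targets.
ti_native = pd.date_range('2020-01-01', periods=48, freq='60min')
ti_new = pd.date_range('2020-01-01', periods=96, freq='15min')
data = np.column_stack([np.arange(48), 2 * np.arange(48)])

out = temporal_interp(ti_native, data, ti_new)
print(out.shape)
print(out[-4:])  # end of day 1 is interpolated towards day 2, not held flat
```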