ImperialCollegeLondon / pyrealm

Development of the pyrealm package, providing an integrated toolbox for modelling plant productivity, growth and demography using Python.
https://pyrealm.readthedocs.io/
MIT License
19 stars 8 forks source link

Handling of incomplete days in FastSlowScaler #169

Closed MarionBWeinzierl closed 5 months ago

MarionBWeinzierl commented 7 months ago

This PR fixes #72 .

It handles potential incomplete first and last days of measurements by padding, both for the datetime array and the value array.

Tasks for this:

davidorme commented 7 months ago

I'd been thinking that the _include structure handles the padding, but of course the _set_times method and other places wants self.datetimes to map onto the first axis. It does seem easier to pad the datetimes than to fix all the downstream special cases.

I've done something similar before using np.unique, which might be useful?


datetimes = np.arange(
    np.datetime64('2002-01-03 03:15'),
    np.datetime64('2002-01-06 09:45'), 
    np.timedelta64(30, 'm')
)

# Work out observations by date
dates = datetimes.astype("datetime64[D]")
unique_dates, date_counts = np.unique(dates, return_counts=True)
complete_count = date_counts.max()

padding = (complete_count - date_counts[0], complete_count - date_counts[-1])

if padding != (0,0):

    # Do we have at least _one_ complete day
    if len(unique_dates) < 3:
        ValueError("Not enough data")

    # Get a complete set of observation times from day 2 - we have guaranteed that the 
    # second day is complete.
    complete_times = datetimes[dates == unique_dates[1]]
    observation_timedeltas = complete_times - complete_times[0].astype("datetime64[D]")

    # Recreate the complete timeseries, using an offset to the last observations to avoid an 
    # off by one error from np.arange.
    datetimes = np.arange(
        unique_dates[0] + observation_timedeltas[0],
        unique_dates[-1] + (observation_timedeltas[-1] + np.timedelta64(1, 's')), 
        np.diff(observation_timedeltas)[0]
    )
MarionBWeinzierl commented 6 months ago

@davidorme , at the moment (apart from the issues marked with comments in the changes), there is quite a non-pretty issue in the implementation. I had first assumed that all padding could happen within the methods of fast_slow_scaler, using the padding as a private function. However, this seems not to be possible due to some of the stuff which is going on in subdaily. So, as it stands, I do some padding in get_window_values, but then a lot of stuff has to be padded in subdaily, too. It might be better to pad the env arrays, as discussed in #72 , but then again that would, without further check, lead to double-padding when the fs_scaler functions are called.

Do you have any preference how this should be handled? We could

davidorme commented 6 months ago

However, this seems not to be possible due to some of the stuff which is going on in subdaily.

I haven't quite got my head around what happens outside of the FastSlowScaler. So, with incomplete days:

I think I'm missing something!

MarionBWeinzierl commented 6 months ago

Just to show the difference I get in that test at the moment:

cmp_pyrealm_test

There seems to be a bit of a left shift, but also a difference in magnitude which is stronger towards the middle.

MarionBWeinzierl commented 5 months ago

OK, what would you prefer at this point, @davidorme ? Do you want to raise a new PR with your implementation, and we close this one, or does it make any sense for me to further work on this one?

codecov-commenter commented 5 months ago

Codecov Report

Attention: Patch coverage is 88.57143% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 94.72%. Comparing base (b41b416) to head (9efa1af).

Files Patch % Lines
pyrealm/pmodel/fast_slow_scaler.py 88.57% 4 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## develop #169 +/- ## =========================================== - Coverage 94.91% 94.72% -0.20% =========================================== Files 29 29 Lines 1535 1555 +20 =========================================== + Hits 1457 1473 +16 - Misses 78 82 +4 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

MarionBWeinzierl commented 5 months ago

Codecov Report

Attention: Patch coverage is 88.57143% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 94.72%. Comparing base (b41b416) to head (9efa1af).

Files Patch % Lines pyrealm/pmodel/fast_slow_scaler.py 88.57% 4 Missing ⚠️ Additional details and impacted files

☔ View full report in Codecov by Sentry. 📢 Have feedback on the report? Share it here.

This is not currently actually checking the padding. If we stick with this version of the padding implementation we could cherry-pick your be_vie_data_components_padded test from your version, @davidorme .

MarionBWeinzierl commented 5 months ago

Close, as replaced by #194