AI4S2S / s2spy

A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting
https://ai4s2s.readthedocs.io/
Apache License 2.0
20 stars 7 forks source link

Intervals stored in Calendar object, user workflow possibly unclear #41

Closed BSchilperoort closed 2 years ago

BSchilperoort commented 2 years ago

With PR #23 the intervals are stored in the AdventCalendar object, and the map_to_years function returns self (i.e., the calendar) instead of just intervals.


Previous functionality

Previously data resampling data was performed in the following way:

calendar = s2spy.time.AdventCalendar()
resampled_data = calendar.resample(data) # intervals retrieved using map_to_data internally

And the map_years method allowed users to preview what the intervals would look like for the given years.

Current functionality

With the new changes, the calendar including the generated intervals are returned when calling map_years:

calendar = s2spy.time.AdventCalendar()
calendar = calendar.map_years(2020, 2022)
resampled_data = calendar.resample(data)

However, when the user then resamples data, map_to_data is called internally, and the intervals the user generated are never used. The above code block functions the exact same when the second line is removed, as in the resample method map_to_data will call map_years, ignoring any previously set intervals.


Expected functionality (from my perspective)

It could be more clear to actually use the intervals the user generates for resampling data. This makes sure that intervals are only, and explicitly, generated in a single place.

calendar = s2spy.time.AdventCalendar()
calendar = calendar.map_years(2020, 2022)
resampled_data = calendar.resample(data)
data

>>> '*data resampled to the intervals in anchor years 2020, 2021, 2022*'

If the user did not generate any intervals:

calendar = s2spy.time.AdventCalendar()
resampled_data = calendar.resample(data)

>>> ValueError: 'No intervals to resample data with. Please first generate intervals'
                'with `map_years` or `map_to_data` before calling resample'

If the data does not fully cover the generated intervals:

calendar = s2spy.time.AdventCalendar()
calendar = calendar.map_years(2020, 2022)
resampled_data = calendar.resample(data)

>>> Warning: 'Insufficient data to resample in 10 intervals, these will contain NaN values.'

map_to_data can still be used, but then as an alternative to map_years:

calendar = s2spy.time.AdventCalendar()
calendar = calendar.map_to_data(data1)
resampled_data1 = calendar.resample(data1)
resampled_data2 = calendar.resample(data2)
Peter9192 commented 2 years ago

I think #50 would solve this issue, right? The way I see it, the calendar is just a means to easily generate a calendar, and it might be best if it never stores anything, but rather poops out calendars in the form of pandas dataframes that are passed on to other functions. With #50 and #53, we're already stripping functionality of the calendar which enables such a simplification of the calendar's purpose.