AI4S2S / s2spy

A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting
https://ai4s2s.readthedocs.io/
Apache License 2.0
20 stars 7 forks source link

Resample as a method vs. as a function. #50

Closed BSchilperoort closed 2 years ago

BSchilperoort commented 2 years ago

During today's demo session @sverhoeven asked why resample is a calendar method, versus just a function that takes a calendar and data as input.

After some consideration I do agree that this could be a more clear way to do it, and it would demarcate the functionalities of the calendar a bit better (instead of putting everything in it as a method).

This issue is related to #41.

Current implementation:

import s2spy.time

calendar = s2spy.time.AdventCalendar()
calendar = calendar.map_years(2010, 2020) # does nothing (unused intervals)
data_resampled = calendar.resample(data)

Alternative implementation:

import s2spy.time

calendar = s2spy.time.AdventCalendar()
calendar = calendar.map_years(2010, 2020)
data_resampled = s2spy.time.resample(calendar, data)
sverhoeven commented 2 years ago

Yep, it would focus the AdventCalendar to be responsible for just making intervals and not for resampling.

Setting the calendar variable again changes the type from s2spy.time.AdventCalendar to pandas.DataFrame. I dont like reusing variables like that. Can we instead use an extra variable?

import s2spy.time

calendar = s2spy.time.AdventCalendar()
intervals = calendar.map_years(2010, 2020)
data_resampled = s2spy.time.resample(intervals, data)
BSchilperoort commented 2 years ago

Setting the calendar variable again changes the type from s2spy.time.AdventCalendar to pandas.DataFrame

This is actually handled by the __repr__. When calling map_years the variable type does not change, but self._intervals is set in the calendar object.

The old functionality indeed used to be that map_years returned intervals. However, if we do not need to use any additional methods on the calendar, there is no clear reason to store them like this.

geek-yang commented 2 years ago

Thanks for the discussion. The calendar is meant for a feedback to the user. It is not used for calculation. So I think the current way that we save intervals to self._intervals and displays calendar when calling this object via __repr__ is fine.

But I agree that we just attach too many functionalities to the s2spy.time.AdventCalendar. It would be clear to make the resample function an extra utility included in s2spy.time (a static method?) and revoke it via s2spy.time.resample.

We can further discuss it when the team is back and re-organize the time module.

Peter9192 commented 2 years ago

I very much support the proposal of making resample a separate function. With respect to the reuse of variables, I agree with Stefan as well. I think people would typically do this in one go:

calendar = s2spy.time.AdventCalendar().map_years(2010, 2020)

and I guess you could argue that the AdventCalendar class should actually be an AdventCalendarBuilder class or something like that. But I'm not so well versed in design patterns, so I might be wrong here (see https://refactoring.guru/design-patterns/builder).