AI4S2S / s2spy

A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting
https://ai4s2s.readthedocs.io/
Apache License 2.0

Add more flexibility to BaseCalendar by enhancing the basic building block #100

Closed geek-yang closed 1 year ago

geek-yang commented 2 years ago

All calendars are generated based on the BaseCalendar. Currently we use the pd.interval_range function to initialize the BaseCalendar. However, this makes it difficult to work with gaps and limits our flexibility to support and construct more calendar types.
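To illustrate the limitation, here is a minimal sketch that uses only pandas. The anchor date and the 7-day frequency are arbitrary example values, not taken from s2spy. It shows that pd.interval_range always yields back-to-back intervals, so a gap between neighbouring intervals cannot be expressed directly.

```python
import pandas as pd

# Five contiguous 7-day intervals ending on an (example) anchor date.
anchor = pd.Timestamp("2020-11-30")
intervals = pd.interval_range(end=anchor, periods=5, freq="7D")

# Each interval starts exactly where the previous one ends, so the
# distance between neighbouring intervals is always zero.
gaps = intervals.left[1:] - intervals.right[:-1]
print(intervals)
print(gaps)
```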

We could introduce a basic building block, hidden from the user, that lets the developer define gaps, targets, and precursors easily, for instance as shown in the figure below:

[Figure: 220914_explain_ai4s2s_time]

To define this base calendar, we can add more variables to provide more flexibility (e.g. target freq, number of targets, gap time, gap freq, precursor freq, number of precursors).
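As a rough sketch of how such variables could drive the construction, the hypothetical helper below (build_base_intervals is not an s2spy function) builds target and precursor intervals from an anchor date. It assumes that targets run forward from the anchor, that precursors run backward from it, and that a single gap separates the anchor from the first precursor. The proposal itself does not fix these conventions.

```python
import pandas as pd


def build_base_intervals(anchor, target_length, n_targets,
                         gap, precursor_length, n_precursors):
    """Hypothetical helper, not part of s2spy.

    Assumes targets run forward from the anchor date and precursors run
    backward from it, separated from the anchor by a single gap.
    """
    anchor = pd.Timestamp(anchor)
    target_length = pd.Timedelta(target_length)
    precursor_length = pd.Timedelta(precursor_length)
    gap = pd.Timedelta(gap)

    # Targets: back-to-back intervals starting at the anchor date.
    targets = [
        pd.Interval(anchor + i * target_length,
                    anchor + (i + 1) * target_length, closed="left")
        for i in range(n_targets)
    ]

    # Precursors: walk backward in time, starting one gap before the anchor.
    precursors = []
    right = anchor - gap
    for _ in range(n_precursors):
        left = right - precursor_length
        precursors.append(pd.Interval(left, right, closed="left"))
        right = left
    return targets, precursors


targets, precursors = build_base_intervals(
    "2020-11-30", target_length="30D", n_targets=1,
    gap="10D", precursor_length="15D", n_precursors=2,
)
```

Because the target and precursor lengths are separate arguments, the two kinds of interval can have different frequencies.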

This can help with any type of calendar we would like to build, for instance an AdventCalendar with a gap (which also allows the target and precursor to have different frequencies):

[image]

Or an IntervalCalendar with multiple gaps:

[image]

We only show the user the high-level API of the calendars (e.g. the API for AdventCalendar will remain the same). We can also provide a "guru" version of the calendar, which gives users full freedom to define very complex calendars if they want.

Technically, this means we can no longer rely on pd.interval_range. We can still use the datetime functionality of pandas to generate intervals and manage time, which can be discussed further.
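One possible way to do this with plain pandas, sketched here with made-up example values, is to compute the interval edges with pd.Timedelta arithmetic and pass them to pd.IntervalIndex.from_arrays. This is only an illustration of the idea, not the implementation chosen for s2spy.

```python
import pandas as pd

anchor = pd.Timestamp("2020-11-30")
length = pd.Timedelta("7D")
gap = pd.Timedelta("3D")

# Right edges step backward from the anchor by (length + gap) each time.
rights = pd.DatetimeIndex([anchor - i * (length + gap) for i in range(3)])
lefts = rights - length

# Unlike pd.interval_range, from_arrays accepts arbitrary left/right edges,
# so neighbouring intervals do not have to touch.
intervals = pd.IntervalIndex.from_arrays(lefts, rights, closed="right")

# Distance between each interval and the next (earlier) one.
gaps = intervals.left[:-1] - intervals.right[1:]
print(intervals)
print(gaps)
```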

Peter9192 commented 2 years ago

Thanks for elaborating on this. Perhaps for constructing multiple calendars out of small building blocks, it might also help to define the target and precursor periods as unique blocks:

precursor = s2spy.PrecursorPeriod(frequency=..., start=...)  # or frequency and end, or start and end
target = s2spy.TargetPeriod(frequency=..., start=...)

ltcalendar = s2spy.Calendar(target, precursor, lead_time='14d')  # or keep using lag instead of lead_time

# Create a compound calendar for searching through a range of lags/lead_times
calendars = []
for lead_time in range(14, 64, 7):
    calendars.append(s2spy.Calendar(target, precursor, lead_time=lead_time))
lagsearch_calendar = s2spy.concat(calendars)  # modelled after pandas/xarray

# Or, alternatively, create a high-level constructor for a compound calendar
lagsearch_calendar = s2spy.CompoundCalendar(target, precursor, lead_time=range(14, 64, 7))

# And in a similar fashion you could construct the AdventCalendar.

BSchilperoort commented 2 years ago

> Thanks for elaborating on this. Perhaps for constructing multiple calendars out of small building blocks, it might also help to define the target and precursor periods as unique blocks:

Great idea Peter, I think this would be the right way forward. This would give users the freedom to generate any calendar they would like, without needing a specific implementation by us.
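To make the lag-search idea from the proposal concrete, here is a small pandas-only sketch. It does not use the proposed s2spy classes, which did not exist at this point in the discussion. It assumes that lead_time is counted in days from the end of the precursor to the start of the target. The dates and the 7-day precursor length are example values.

```python
import pandas as pd

target_start = pd.Timestamp("2020-11-30")
precursor_length = pd.Timedelta("7D")

rows = []
for lead_time in range(14, 64, 7):  # 14, 21, ..., 63 days
    # Assumed convention: the precursor ends `lead_time` days before the target.
    right = target_start - pd.Timedelta(days=lead_time)
    rows.append({
        "lead_time": lead_time,
        "precursor": pd.Interval(right - precursor_length, right, closed="left"),
    })

# One row per candidate lead time, analogous to concatenating calendars.
lagsearch = pd.DataFrame(rows)
print(lagsearch)
```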

geek-yang commented 2 years ago

> Thanks for elaborating on this. Perhaps for constructing multiple calendars out of small building blocks, it might also help to define the target and precursor periods as unique blocks:
>
> precursor = s2spy.PrecursorPeriod(frequency=..., start=...)  # or frequency and end, or start and end
> target = s2spy.TargetPeriod(frequency=..., start=...)

Thanks for the inspiring thought, @Peter9192. I think this would work nicely. Given that the whole process is based on the anchor year, which corresponds to the way that people in the s2s domain define their questions, we can further come up with a workflow like:

# Define the target period using the anchor date
target = s2spy.time.target_period(anchor_date, freq, n_targets=1)
# The target is needed to calculate the timestamps for the precursor
precursor = s2spy.time.precursor_period(target[0].left, freq, n_precursors, gap=0)
# Only needed to build a heterogeneous calendar:
# precursor_2 = s2spy.time.precursor_period(target[0].left - pd.Timedelta('60d'), freq, n_precursors, gap=0)
...
calendar = s2spy.time.concatenate(target, precursor)  # optionally also pass precursor_2
# or
calendar = target.append([precursor])  # optionally [precursor, precursor_2]

This way, we are appending precursors to the target, which makes it easier to identify the lead/lag relation between them.
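The append idea can be tried out with plain pandas, since pd.IntervalIndex already supports append. The sketch below is an illustration under assumed example values (a 7-day frequency, three precursors, gap=0). It does not use the proposed s2spy.time functions.

```python
import pandas as pd

anchor = pd.Timestamp("2020-11-30")
freq = pd.Timedelta("7D")

# One target interval starting at the anchor date.
target = pd.IntervalIndex.from_arrays([anchor], [anchor + freq], closed="left")

# Three back-to-back precursors ending at the target's left edge (gap=0).
rights = pd.DatetimeIndex([target[0].left - i * freq for i in range(3)])
precursor = pd.IntervalIndex.from_arrays(rights - freq, rights, closed="left")

# Append the precursors to the target to form one calendar.
calendar = target.append(precursor)

# With the target first, the lag of every interval follows directly from
# the distance between its start and the start of the target.
lag = target[0].left - calendar.left
print(calendar)
print(lag)
```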

BSchilperoort commented 2 years ago

Hi Yang, if we're building a calendar by appending, the following syntax seems the most intuitive to me:

cal = s2spy.time.CustomCalendar(anchor_date)

target = s2spy.time.TargetPeriod(length='30d')
cal.append(target)

# Add another target with a gap between the two targets;
cal.append(s2spy.time.TargetPeriod(length='30d', gap='10d'))

# Now we can add some precursor periods
precursor = s2spy.time.PrecursorPeriod(length='15d', gap='15d')
for i in range(10):
    cal.append(precursor)

cal.map_years(2020, 2022)
cal.show()  # Display the calendar we built.

geek-yang commented 2 years ago

> Hi Yang, if we're building a calendar by appending, the following syntax seems the most intuitive to me:

This looks very elegant! It can be the actual API we would like to show to the advanced user. Let's put it this way 😸.
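For readers who want to experiment with the append-based syntax, here is a toy, self-contained sketch. The classes below only borrow the names from the proposal and are stand-ins, not the s2spy implementation. The "MM-DD" anchor format, the intervals method, and the signed indices (targets 1, 2, ... and precursors -1, -2, ...) are assumptions made for this illustration.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Period:
    """Toy building block: a length plus the gap that precedes it."""
    length: str
    gap: str = "0D"


class TargetPeriod(Period):
    pass


class PrecursorPeriod(Period):
    pass


class CustomCalendar:
    def __init__(self, anchor):
        self.anchor = anchor  # "MM-DD" string, an assumed convention
        self.targets = []
        self.precursors = []

    def append(self, period):
        if isinstance(period, TargetPeriod):
            self.targets.append(period)
        else:
            self.precursors.append(period)

    def intervals(self, year):
        """Return {index: interval} for one anchor year."""
        anchor = pd.Timestamp(f"{year}-{self.anchor}")
        out = {}
        edge = anchor  # targets grow forward in time from the anchor
        for i, p in enumerate(self.targets, start=1):
            left = edge + pd.Timedelta(p.gap)
            edge = left + pd.Timedelta(p.length)
            out[i] = pd.Interval(left, edge, closed="left")
        edge = anchor  # precursors grow backward in time from the anchor
        for i, p in enumerate(self.precursors, start=1):
            right = edge - pd.Timedelta(p.gap)
            edge = right - pd.Timedelta(p.length)
            out[-i] = pd.Interval(edge, right, closed="left")
        return out


cal = CustomCalendar("11-30")
cal.append(TargetPeriod(length="30D"))
cal.append(TargetPeriod(length="30D", gap="10D"))
for _ in range(2):
    cal.append(PrecursorPeriod(length="15D", gap="15D"))
intervals_2020 = cal.intervals(2020)
```

Each appended period is placed relative to the previous one of the same kind, so the order of the append calls determines the layout.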

Peter9192 commented 2 years ago

> Hi Yang, if we're building a calendar by appending, the following syntax seems the most intuitive to me:

I agree this looks good, but a few points:

precursor = s2spy.time.PrecursorPeriod(length='15d')
for i in range(10):
    cal.append(precursor, lead_time=i * 15)