SeitaBV / timely-beliefs

Model data as beliefs (at a certain time) about events (at a certain time).
MIT License

Can resampling support pandas DateOffsets? #13

Open Flix6x opened 4 years ago

Flix6x commented 4 years ago

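Resampling to a different event_resolution currently supports only datetime.timedelta resolutions. Resampling to days or months is ambiguous, because a calendar day or month has no fixed duration (think of DST transitions and varying month lengths). Resolutions shorter than 24 hours should behave as expected.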

Pandas' DateOffsets seem to be a viable alternative or extension to datetime.timedelta, as pandas interprets them in calendar terms rather than as a fixed duration. See for example https://github.com/pandas-dev/pandas/issues/35248.
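To illustrate the difference, here is a minimal sketch based on the behaviour described in the pandas time series user guide. The timestamps are picked only for illustration: the first one lies just before a DST transition, the second one at the end of a 31-day month.

```python
import pandas as pd

# Midnight on the day that DST ends in Helsinki (this day lasts 25 hours).
ts = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")

# A Timedelta is a fixed duration: exactly 24 elapsed hours.
fixed = ts + pd.Timedelta(days=1)

# A DateOffset follows the calendar: same wall-clock time on the next day.
calendar = ts + pd.DateOffset(days=1)

print(fixed)     # lands one hour short of the next local midnight
print(calendar)  # lands on the next local midnight

# Months have no fixed length at all; the offset clips to the end of February.
month_later = pd.Timestamp("2021-01-31") + pd.DateOffset(months=1)
print(month_later)
```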

However, using DateOffsets to resample is also not entirely without issues. See https://github.com/pandas-dev/pandas/issues/35219 [RESOLVED].

I've investigated how to support DateOffsets. One problem we currently face is that we decide whether to upsample or downsample by comparing input_resolution with output_resolution, and that comparison isn't a valid operation for DateOffsets. One idea to resolve this is to find the greatest common divisor of the two resolutions, upsample to that resolution and then downsample to the new resolution. Another idea is to write our own comparison method for common DateOffsets.
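Both ideas can be sketched roughly as follows. This is not existing timely-beliefs code: the helper names (span, is_downsampling, common_resolution) and the reference timestamp are hypothetical. The comparison measures each offset from an arbitrary reference time, so it is only a heuristic ordering (the length of a month depends on where you start), and the greatest common divisor is only defined here for fixed-length timedeltas.

```python
import math
from datetime import timedelta

import pandas as pd
from pandas.tseries.frequencies import to_offset

# Arbitrary reference time (assumption); calendar offsets are measured from here.
REFERENCE = pd.Timestamp("2021-01-01")


def span(resolution, reference=REFERENCE):
    """Duration covered by one step of `resolution`, starting at `reference`."""
    return (reference + to_offset(resolution)) - reference


def is_downsampling(input_resolution, output_resolution, reference=REFERENCE):
    """True if the output resolution is coarser than the input resolution."""
    return span(output_resolution, reference) > span(input_resolution, reference)


def common_resolution(a: timedelta, b: timedelta) -> timedelta:
    """Greatest common divisor of two fixed-length resolutions."""
    unit = timedelta(microseconds=1)
    return math.gcd(a // unit, b // unit) * unit


# Going from 15 minutes to calendar days is a downsample.
print(is_downsampling(timedelta(minutes=15), pd.DateOffset(days=1)))

# 15-minute and 10-minute resolutions share a 5-minute resolution.
print(common_resolution(timedelta(minutes=15), timedelta(minutes=10)))
```

A comparison like this sidesteps ordering DateOffsets directly, at the cost of depending on the chosen reference time for offsets of similar length.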

Note that in pandas, the decision whether to upsample or downsample seems to be left to the user, who chains either a method that expands the data (pad, ffill) or one that reduces it (mean, sum).
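For reference, a small example of that pandas pattern, using a made-up hourly series: the same resample call serves both directions, and the chained method determines what happens to the values.

```python
import pandas as pd

# Four hourly values (illustrative data only).
index = pd.date_range("2021-01-01", periods=4, freq=pd.Timedelta(hours=1))
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Upsampling: the finer bins are filled by an expanding method.
up = s.resample(pd.Timedelta(minutes=30)).ffill()

# Downsampling: the coarser bins are filled by a reducing method.
down = s.resample(pd.Timedelta(hours=2)).mean()

print(up)
print(down)
```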