gerrymanoim / exchange_calendars

Calendars for various securities exchanges.
Apache License 2.0

Better clarify calendar `start` and `end` arguments as calendar bounds #210

Closed ValueRaider closed 2 years ago

ValueRaider commented 2 years ago

Suppose I want to get schedule for a specific day. Specifying date range in schedule() fails:

dt = "2022-05-09"
xcal.get_calendar("XLON",start=dt,end=dt).schedule

ValueError: start must be earlier than end although start parsed as '2022-05-09 00:00:00' and end as '2022-05-09 00:00:00'

But this code works fine, except slower than specifying a 2-day range in get_calendar():

xcal.get_calendar("XLON").schedule.loc[dt:dt]

Seems an arbitrary restriction.

ValueRaider commented 2 years ago

Related in terms of unnecessarily-strict API:

Instead of this code throwing an exception, I think it should return None or an empty DataFrame. In dynamic code it's entirely possible and acceptable for a date range to not cover active trading days e.g. a weekend. Exception should be reserved for genuine errors.

xcal.get_calendar("XLON",start="2022-06-02",end="2022-06-03")

exchange_calendars.errors.NoSessionsError: The requested ExchangeCalendar, XLON, cannot be created as there would be no sessions between the requested start ('2022-06-02 00:00:00') and end ('2022-06-03 00:00:00') dates.

maread99 commented 2 years ago

Hi @ValueRaider.

There are two different concepts here which I think you might be conflating?

Calendar class

get_calendar returns an instance of a calendar class (in your example an instance of XLONExchangeCalendar). A calendar class has a host of properties and methods for interrogating the calendar's sessions and minutes. The calendar_properties tutorial covers how to create calendars and demonstrates the properties available, one of which is .schedule.

.schedule property

The .schedule property returns a pd.DataFrame. It covers the full calendar. Accordingly a subset can be easily queried using the .schedule[start:end] logic.

pandas_market_calendars

Worth noting that the above implementation differs from pandas_market_calendars. pmc does not provide for defining the calendar bounds and instead creates schedules on-request via the calendar's schedule method.

Examples

Instead of this code throwing an exception, I think it should return None or an empty DataFrame. In dynamic code it's entirely possible and acceptable for a date range to not cover active trading days e.g. a weekend. Exception should be reserved for genuine errors.

xcal.get_calendar("XLON",start="2022-06-02",end="2022-06-03")

get_calendar is simply a convenience method to return an instance of a calendar class (not a schedule). Accordingly it either returns a class instance or throws an error. This is not the way to query if there are sessions over a date range. Rather start and end here are defining the bounds of any query that you might subsequently make of the calendar (be that via .session or any other calendar property / method). By default calendars start 20 years prior to the current date and end 1 year forwards.

A calendar must contain at least one session, hence the above throws an error.

Suppose I want to get schedule for a specific day. Specifying date range in schedule() fails:

dt = "2022-05-09"
xcal.get_calendar("XLON",start=dt,end=dt).schedule

This call doesn't reach the request for the schedule, again it's failing on trying to create the calendar instance. It's true that not allowing start and end to be the same date is rather arbitrary (so long as it represents a session), although I don't think it's unreasonable - this is not the way to query sessions and without having created a calendar it would not be possible to know if you could create a 'one-day' calendar for any specific day (with the exception of the 24/7 calendar).

But this code works fine, except slower than specifying a 2-day range in get_calendar():

xcal.get_calendar("XLON").schedule.loc[dt:dt]

This code works because it creates a default calendar (from 20 years ago to 1 year on) and then gets the subset of the schedule covering just the day. It is slower as it's creating the calendar covering 21 years, although once the calendar's created you can query the instance any number of times, with each query being as quick as getting a subset of a pd.DataFrame.

import exchange_calendars as xcals
xlon = xcals.get_calendar("XLON")
dt, dt2, dt3 = "2022-05-09", "2022-07-09", "2022-09-09"
xlon.schedule.loc[dt:dt]
xlon.schedule.loc[dt2:dt2]
xlon.schedule.loc[dt3:dt3]

Narrow the default calendar bounds for quicker calendar creation, for example if you know all queries will be within the current year.

xlon = xcals.get_calendar("XLON", "2022", "2022-12-31")
xlon.schedule.loc[dt:dt]
xlon.schedule.loc[dt2:dt2]

...and if you query a range which has no sessions then you get your empty DataFrame:

>>> xlon.schedule.loc["2022-06-25":"2022-06-26"] 
Empty DataFrame
Columns: [open, break_start, break_end, close]
Index: []

ValueRaider commented 2 years ago

Thanks for your response, very detailed. My responses below are purely pedantic now, as with your response I can now solve my problem (caching full calendar). So you are welcome to ignore entirely, but it might be useful to ponder ...


I suppose this part best captures my confusion with API:

get_calendar() ... This is not the way to query if there are sessions over a date range

Why provide start and end date arguments at all? Their purpose isn't clearly specified anywhere, so I was operating on the assumption that they act like an optimising query. No problem if I have prior knowledge of my data dates, as I can then use e.g. start="2021", end="2022". But in my case I have no prior knowledge, just a sequence of start->end date ranges.

Maybe it would be best to restrict start and end to hints e.g. years, enforcing that they're not a query? Or just remove them entirely, as they don't really provide any performance boost over simply fetching the full calendar and having the module cache it.


without having created a calendar it would not be possible to know if you could create a 'one-day' calendar for any specific day

This restriction doesn't apply to 2-3 day ranges which equally could be entirely missing sessions (weekend+public holiday).

maread99 commented 2 years ago

Why provide start and end date arguments at all?

Optimization. The longer the period the calendar covers the longer the calendar takes to create and the larger the object. Without these options it would be necessary to force the same bounds on everybody.

Provide start and end arguments and the user has the flexibility to create the calendar according to their needs.

in my case I have no prior knowledge, just a sequence of start->end date ranges

Set start to the min start and end to the max end?
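
As a rough sketch of that suggestion, assuming the ranges arrive as ISO date strings: `calendar_bounds` is a hypothetical helper (not part of exchange_calendars) that reduces a sequence of (start, end) pairs to a single pair of calendar bounds.

```python
from datetime import date


def calendar_bounds(ranges):
    """Return (earliest start, latest end) over an iterable of (start, end) pairs.

    Hypothetical helper; dates are assumed to be ISO strings such as "2022-05-09".
    """
    ranges = list(ranges)
    if not ranges:
        raise ValueError("need at least one (start, end) range")
    starts = [date.fromisoformat(s) for s, _ in ranges]
    ends = [date.fromisoformat(e) for _, e in ranges]
    return min(starts).isoformat(), max(ends).isoformat()


ranges = [
    ("2022-05-09", "2022-05-09"),
    ("2022-06-02", "2022-06-03"),
    ("2022-06-25", "2022-06-26"),
]
start, end = calendar_bounds(ranges)
print(start, end)

# With exchange_calendars installed, the calendar would then be created once
# and every range queried against the same instance:
# import exchange_calendars as xcals
# xlon = xcals.get_calendar("XLON", start=start, end=end)
# for s, e in ranges:
#     xlon.schedule.loc[s:e]
```

The resulting bounds span every range, so an individual range that contains no sessions no longer prevents the calendar from being created, provided at least one session falls somewhere between the bounds.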


Thanks for raising the issue. It's always useful to get users' perspectives. Being familiar with the interface can lead us to make assumptions when writing the docs. I'm going to take the liberty to rename this issue with the intention of doing the following:

Changes made in #211.

ValueRaider commented 2 years ago

Renaming to min/max is a good idea.

Removing it from Quick Start entirely would help too. Setting bounds seems like a niche edge case (where simply caching the entire calendar doesn't work).