Closed ValueRaider closed 2 years ago
Related in terms of unnecessarily-strict API:
Instead of this code throwing an exception, I think it should return None or an empty DataFrame. In dynamic code it's entirely possible and acceptable for a date range to not cover active trading days e.g. a weekend. Exception should be reserved for genuine errors.
xcal.get_calendar("XLON",start="2022-06-02",end="2022-06-03")
exchange_calendars.errors.NoSessionsError: The requested ExchangeCalendar, XLON, cannot be created as there would be no sessions between the requested start ('2022-06-02 00:00:00') and end ('2022-06-03 00:00:00') dates.
Hi @ValueRaider.
There are two different concepts here which I think you might be conflating?
get_calendar returns an instance of a calendar class (in your example an instance of XLONExchangeCalendar). A calendar class has a host of properties and methods that provide for interrogating the calendar's session and minutes. The calendar_properties tutorial covers how to create calendars and demonstrates the properties available, one of which is .schedule.
.schedule property
The .schedule property returns a pd.DataFrame. It covers the full calendar. Accordingly a subset can be easily queried using the .schedule[start:end] logic.
Worth noting that the above implementation differs from pandas_market_calendars. pmc does not provide for defining the calendar bounds and instead creates schedules on request via the calendar's schedule method.
Instead of this code throwing an exception, I think it should return None or an empty DataFrame. In dynamic code it's entirely possible and acceptable for a date range to not cover active trading days e.g. a weekend. Exception should be reserved for genuine errors.
xcal.get_calendar("XLON",start="2022-06-02",end="2022-06-03")
get_calendar is simply a convenience method to return an instance of a calendar class (not a schedule). Accordingly it either returns a class instance or throws an error. This is not the way to query if there are sessions over a date range. Rather, start and end here define the bounds of any query that you might subsequently make of the calendar (be that via .session or any other calendar property / method). By default calendars start 20 years prior to the current date and end 1 year forwards.
A calendar must contain at least one session, hence the above throws an error.
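For dynamic code that would rather receive None than handle an exception, one option is a thin wrapper around get_calendar. The sketch below is only an illustration: calendar_or_none is a hypothetical helper name, not part of exchange_calendars, and it assumes the NoSessionsError shown in the traceback above is the only error worth swallowing.

```python
def calendar_or_none(name, start=None, end=None):
    """Return a calendar instance, or None if the bounds contain no sessions.

    Hypothetical helper, not part of exchange_calendars.
    """
    # Imported lazily so that defining the helper does not require the library.
    import exchange_calendars as xcals
    from exchange_calendars.errors import NoSessionsError

    try:
        return xcals.get_calendar(name, start=start, end=end)
    except NoSessionsError:
        # The requested bounds cover no trading sessions, e.g. a weekend.
        return None
```

Any other error (an unknown calendar name, for example) still propagates, which keeps genuine mistakes visible.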
Suppose I want to get schedule for a specific day. Specifying date range in schedule() fails:
dt = "2022-05-09"
xcal.get_calendar("XLON",start=dt,end=dt).schedule
This call doesn't reach the request for the schedule, again it's failing on trying to create the calendar instance. It's true that not allowing start and end to be the same date is rather arbitrary (so long as it represents a session), although I don't think it's unreasonable - this is not the way to query sessions and without having created a calendar it would not be possible to know if you could create a 'one-day' calendar for any specific day (with the exception of the 24/7 calendar).
But this code works fine, except slower than specifying a 2-day range in get_calendar():
xcal.get_calendar("XLON").schedule.loc[dt:dt]
This code works because it creates a default calendar (from 20 years ago to 1 year on) and then gets the subset of the schedule covering just the day. It is slower as it's creating the calendar covering 21 years, although once the calendar's created you can query the instance any number of times, with each query being as quick as getting a subset of a pd.DataFrame.
import exchange_calendars as xcals
xlon = xcals.get_calendar("XLON")
dt, dt2, dt3 = "2022-05-09", "2022-07-09", "2022-09-09"
xlon.schedule.loc[dt:dt]
xlon.schedule.loc[dt2:dt2]
xlon.schedule.loc[dt3:dt3]
Narrow the default calendar bounds for quicker calendar creation, for example if you know all queries will be within the current year.
xlon = xcals.get_calendar("XLON", "2022", "2022-12-31")
xlon.schedule.loc[dt:dt]
xlon.schedule.loc[dt2:dt2]
...and if you query a range which has no sessions then you get your empty DataFrame:
>>> xlon.schedule.loc["2022-06-25":"2022-06-26"]
Empty DataFrame
Columns: [open, break_start, break_end, close]
Index: []
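Building on the empty-DataFrame behaviour shown above, checking whether a date range contains any sessions can be reduced to a one-line test on the schedule subset. This is a minimal sketch: has_sessions is a hypothetical name, and it assumes the schedule is a pd.DataFrame with a sorted DatetimeIndex, as returned by a calendar's .schedule property.

```python
def has_sessions(schedule, start, end):
    """Return True if the schedule has at least one session in [start, end].

    Hypothetical helper. `schedule` is assumed to be a pd.DataFrame indexed
    by session date, for example the value of a calendar's .schedule property.
    """
    # Label-based slicing is inclusive of both bounds; a range covering no
    # sessions yields an empty DataFrame rather than raising.
    return not schedule.loc[start:end].empty
```

With the xlon calendar created above, has_sessions(xlon.schedule, "2022-06-25", "2022-06-26") would be expected to return False, since that range is a weekend.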
Thanks for your response, very detailed. My responses below are purely pedantic now, as with your response I can now solve my problem (caching full calendar). So you are welcome to ignore entirely, but it might be useful to ponder ...
I suppose this part best captures my confusion with API:
get_calendar() ... This is not the way to query if there are sessions over a date range
Why provide start and end date arguments at all? Their purpose isn't clearly specified anywhere so I was operating on assumption that they act like an optimising query. No problem if I have prior knowledge of my data dates as then can use e.g. start="2021", end="2022". But in my case I have no prior knowledge, just a sequence of start->end date ranges.
Maybe it would be best to restrict start and end to hints e.g. years, enforcing that they're not a query? Or just remove entirely, as doesn't really provide any performance boost over simply fetching full calendar and having module cache it.
without having created a calendar it would not be possible to know if you could create a 'one-day' calendar for any specific day
This restriction doesn't apply to 2-3 day ranges which equally could be entirely missing sessions (weekend+public holiday).
Why provide start and end date arguments at all?
Optimization. The longer the period the calendar covers the longer the calendar takes to create and the larger the object. Without these options it would be necessary to force the same bounds on everybody.
Provide start and end arguments and the user has the flexibility to create the calendar according to their needs.
in my case I have no prior knowledge, just a sequence of start->end date ranges
Set start to the min start and end to the max end?
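The min/max suggestion can be sketched as follows. Both function names are hypothetical, and the sketch assumes every date is an ISO "YYYY-MM-DD" string, so that plain string comparison orders dates chronologically.

```python
def overall_bounds(ranges):
    """Return (earliest start, latest end) over an iterable of (start, end) pairs.

    Assumes ISO 'YYYY-MM-DD' strings, which sort chronologically.
    """
    ranges = list(ranges)
    if not ranges:
        raise ValueError("need at least one (start, end) range")
    starts, ends = zip(*ranges)
    return min(starts), max(ends)


def schedules_for_ranges(name, ranges):
    """Create one calendar covering all ranges, then slice its schedule per range."""
    # Imported lazily so that overall_bounds can be used without the library.
    import exchange_calendars as xcals

    ranges = list(ranges)
    start, end = overall_bounds(ranges)
    # One calendar creation, bounded to what the queries actually need.
    schedule = xcals.get_calendar(name, start=start, end=end).schedule
    # Each slice is an ordinary DataFrame subset; it is empty if the range
    # contains no sessions.
    return [schedule.loc[s:e] for s, e in ranges]
```

Note that get_calendar will still raise NoSessionsError if the combined bounds contain no session at all, for example when every range falls on the same weekend.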
Thanks for raising the issue. It's always useful to get users' perspectives. Being familiar with the interface can lead us to make assumptions when writing the docs. I'm going to take the liberty to rename this issue with the intention of doing the following:
- Update ExchangeCalendar and get_calendar doc to clarify start and end arguments represent the calendar bounds.
- Update calendar_properties tutorial to the same effect and note the trade-off: the longer the period the calendar covers, the slower calendar creation and the larger the calendar object.
- Rename ExchangeCalendar class methods bound_start and bound_end as bound_min and bound_max.
- Remove start and end arguments from README Quick Start example (added in edit)
Changes made in #211.
Renaming to min/max is a good idea.
Removing it from Quick Start entirely would help too. Setting bounds seems like a niche edge case (where simply caching the entire calendar doesn't work).
Suppose I want to get schedule for a specific day. Specifying date range in schedule() fails:
dt = "2022-05-09"
xcal.get_calendar("XLON",start=dt,end=dt).schedule
But this code works fine, except slower than specifying a 2-day range in get_calendar():
xcal.get_calendar("XLON").schedule.loc[dt:dt]
Seems an arbitrary restriction.