Motivation and Context
This PR adds a Timestamp class that wraps a UNIX timestamp and provides datetime parsing and formatting. Secondary changes add supporting library/toolkit functionality for processing datetimes. Notably, all the changes are written in typed_python and are Entrypointable.
Approach
The Timestamp class wraps a UNIX timestamp. This UNIX timestamp can be provided, parsed from a string representing a datetime, or constructed from a set of values representing a datetime.
For example, you can create a Timestamp directly from a UNIX timestamp.
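As a rough illustration of the idea (not the PR's actual API), the sketch below wraps a UNIX timestamp in a small class and shows two of the construction paths described above: passing the timestamp directly and building it from date components. The names `SketchTimestamp` and `from_date` are hypothetical, and the sketch uses only the Python standard library rather than typed_python.

```python
import calendar
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchTimestamp:
    """Hypothetical stand-in for the PR's Timestamp class; not its real API."""

    ts: float  # seconds since 1970-01-01T00:00:00 UTC

    @classmethod
    def from_date(cls, year, month, day, hour=0, minute=0, second=0):
        # calendar.timegm interprets the tuple as UTC (time.mktime would use local time).
        return cls(float(calendar.timegm((year, month, day, hour, minute, second))))


# Two ways to arrive at the same instant.
a = SketchTimestamp(1641376572.0)
b = SketchTimestamp.from_date(2022, 1, 5, 9, 56, 12)
```

Because the wrapper stores only a float pegged to UTC, equality and ordering reduce to comparing the underlying numbers.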
The module provides three ways to create Timestamps from string representations of dates.
1: You can tell the parser the format of the provided datestring. This is the most efficient option and is equivalent to datetime.strptime().
2: If the string is any variant of an ISO 8601 formatted string, you can use the .parse_iso_str method. This method is slightly more permissive than the ISO 8601 standard in that it allows a space as the date/time separator (in addition to 'T') and allows timezone abbreviations.
3: You can parse a range of non-ISO date formats with .parse_non_iso_str.
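To make the first two options concrete, here is a minimal standard-library sketch of what each amounts to. The function names `parse_with_format` and `parse_iso_permissive` are hypothetical and are not the PR's methods; the sketch also assumes that strings without an explicit offset are in UTC, and it leaves out timezone abbreviations.

```python
from datetime import datetime, timezone


def _to_unix_seconds(dt: datetime) -> float:
    # Assumption: a naive datetime (no offset in the string) is treated as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


def parse_with_format(date_str: str, fmt: str) -> float:
    """Option 1: the caller supplies the strptime format explicitly."""
    return _to_unix_seconds(datetime.strptime(date_str, fmt))


def parse_iso_permissive(date_str: str) -> float:
    """Option 2: ISO 8601 text, with either 'T' or a space between date and time."""
    # datetime.fromisoformat accepts any single character as the separator.
    return _to_unix_seconds(datetime.fromisoformat(date_str))


explicit = parse_with_format("2022-01-05 10:11:12", "%Y-%m-%d %H:%M:%S")
with_t = parse_iso_permissive("2022-01-05T10:11:12")
with_space = parse_iso_permissive("2022-01-05 10:11:12")
with_offset = parse_iso_permissive("2022-01-05T10:11:12+00:15")
```

The first three calls yield the same UNIX timestamp; the last one is 900 seconds earlier because the string carries a +00:15 offset.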
For convenience, there's a multi-use .parse() entry point that parses a datestring with a format string if one is provided. If no format string is provided, .parse attempts to parse the date_str as an ISO 8601 string. Failing that, it attempts to parse using the supported non-ISO formats.
You can convert Timestamps to strings using standard Python time format directives.
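The fallback order described for .parse() can be sketched as follows. This is an illustration of the dispatch logic only: the function name `parse`, the `NON_ISO_FORMATS` list, and the use of `datetime` are assumptions made for the sketch, and the PR's supported formats may differ.

```python
from datetime import datetime, timezone

# Hypothetical fallback list; the PR's supported non-ISO formats may differ.
NON_ISO_FORMATS = ("%b %d, %Y", "%b-%d-%Y", "%d-%B-%Y")


def _to_unix_seconds(dt):
    # Assumption: strings without an offset are interpreted as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


def parse(date_str, fmt=None):
    # 1. An explicit format wins and skips all guessing.
    if fmt is not None:
        return _to_unix_seconds(datetime.strptime(date_str, fmt))
    # 2. Otherwise try ISO 8601 first.
    try:
        return _to_unix_seconds(datetime.fromisoformat(date_str))
    except ValueError:
        pass
    # 3. Finally fall back to the known non-ISO formats, in order.
    for candidate in NON_ISO_FORMATS:
        try:
            return _to_unix_seconds(datetime.strptime(date_str, candidate))
        except ValueError:
            continue
    raise ValueError(f"unrecognized date string: {date_str!r}")
```

Supplying the format avoids the trial-and-error in steps 2 and 3, which is one way to see why the explicit-format path is the most efficient option.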
The functionality for parsing datestrings is implemented in the reusable DateParser component. Specifically, the component exposes DateParser.parse which in turn proxies to DateParser.parse_iso_format and DateParser.parse_non_iso_format. These methods convert a string representation of a datetime to a UNIX timestamp. E.g.
time = DateParser.parse("2022-01-05T10:11:12+00:15")
time = DateParser.parse("2022-01-05T10:11:12NYC")
DateParser additionally depends on Timezone. Timestamps are pegged to UTC and do not store timezone information, so the parser needs to adjust the timestamp by the appropriate offset from UTC. Timezone provides support for converting a timezone abbreviation to a utc_offset. Timezone also supports relative zones: if the abbreviation is "ET" (Eastern Time) or "NYC", it returns the offset for either EST (Eastern Standard Time) or EDT (Eastern Daylight Time), as appropriate for the date.
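One way a relative zone such as "ET" or "NYC" could be resolved is sketched below. It assumes the US daylight saving rules in force since 2007 (second Sunday of March to first Sunday of November, switching at 2:00 local time) and takes a UTC timestamp as input. All names here (`utc_offset`, `eastern_utc_offset`, the lookup tables) are hypothetical and do not describe the PR's Timezone implementation.

```python
import calendar
import time
from datetime import date, timedelta

EST_OFFSET = -5 * 3600
EDT_OFFSET = -4 * 3600


def nth_sunday(year, month, n):
    first = date(year, month, 1)
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_until_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_until_sunday + 7 * (n - 1))


def eastern_utc_offset(ts):
    """UTC offset in seconds for US Eastern time at UNIX timestamp ts (2007+ rules)."""
    year = time.gmtime(ts).tm_year
    start = nth_sunday(year, 3, 2)   # second Sunday of March
    end = nth_sunday(year, 11, 1)    # first Sunday of November
    # 2:00 EST is 07:00 UTC; 2:00 EDT is 06:00 UTC.
    dst_start = calendar.timegm((start.year, start.month, start.day, 7, 0, 0))
    dst_end = calendar.timegm((end.year, end.month, end.day, 6, 0, 0))
    return EDT_OFFSET if dst_start <= ts < dst_end else EST_OFFSET


# Hypothetical lookup tables: fixed abbreviations map to constants,
# relative ones map to a function of the timestamp.
FIXED_ZONES = {"UTC": 0, "EST": EST_OFFSET, "EDT": EDT_OFFSET}
RELATIVE_ZONES = {"ET": eastern_utc_offset, "NYC": eastern_utc_offset}


def utc_offset(abbreviation, ts):
    if abbreviation in FIXED_ZONES:
        return FIXED_ZONES[abbreviation]
    return RELATIVE_ZONES[abbreviation](ts)
```

A parser working from a local wall-clock string would need to handle the ambiguous and skipped hours around each transition; this sketch sidesteps that by starting from a UTC timestamp.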
Note: the date parsing logic handles a useful range of non-ISO date formats. For example, it will correctly parse dates like "Jan 2, 1997", "Jan-1-1997", or "1-January-1997". However, parsing of ambiguous dates is NOT supported. For example, attempting to parse a date with a two-digit year causes the parser to raise an error.
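The behaviour in the note can be illustrated with a small token-based parser. This is a sketch under assumptions, not the PR's DateParser: the function name `parse_non_iso_date`, the separator set, and the rule that a year must have exactly four digits are all choices made for this example.

```python
import calendar
import re
from datetime import date

MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]
MONTHS = {}
for number, name in enumerate(MONTH_NAMES, start=1):
    MONTHS[name] = number       # full name, e.g. "january"
    MONTHS[name[:3]] = number   # abbreviation, e.g. "jan"


def parse_non_iso_date(date_str):
    """Return the UNIX timestamp (UTC midnight) for a month-name date string."""
    tokens = [t for t in re.split(r"[\s,\-/]+", date_str.strip()) if t]
    if len(tokens) != 3:
        raise ValueError(f"expected day, month and year in {date_str!r}")
    year = month = day = None
    for token in tokens:
        key = token.lower()
        if key in MONTHS and month is None:
            month = MONTHS[key]
        elif token.isdigit() and len(token) == 4 and year is None:
            year = int(token)
        elif token.isdigit() and len(token) <= 2 and day is None:
            day = int(token)
        else:
            # Covers two-digit years, repeated fields and unknown words.
            raise ValueError(f"ambiguous or unsupported date: {date_str!r}")
    date(year, month, day)  # raises ValueError for impossible dates such as Feb 30
    return float(calendar.timegm((year, month, day, 0, 0, 0)))
```

With this rule, "Jan 2, 97" is rejected rather than guessed at, since "97" can be neither a four-digit year nor a second day field.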
The supporting functionality for formatting Timestamps as strings is implemented in the reusable DateFormatter component.
By default, DateFormatter.format outputs an ISO 8601 formatted string (YYYY-MM-DDTHH:MM:SS). However, it also accepts a format string (e.g. "%Y-%m-%d") using standard Python format directives.
By default DateFormatter.format returns a date string in UTC. However, it also accepts a utc_offset (in seconds) as input.
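The two defaults described above (ISO 8601 output, UTC) and their overrides can be sketched with the standard library. The function `format_timestamp` and its parameter names are hypothetical and only mirror the description; they are not DateFormatter's actual signature.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(ts, fmt="%Y-%m-%dT%H:%M:%S", utc_offset=0):
    """Format a UNIX timestamp; utc_offset is in seconds east of UTC."""
    # Shift the UTC wall-clock time by the offset, then apply the directives.
    shifted = datetime.fromtimestamp(ts, tz=timezone.utc) + timedelta(seconds=utc_offset)
    return shifted.strftime(fmt)


default_iso = format_timestamp(1641377472)                       # "2022-01-05T10:11:12"
date_only = format_timestamp(1641377472, fmt="%Y-%m-%d")         # "2022-01-05"
new_york = format_timestamp(1641377472, utc_offset=-5 * 3600)    # "2022-01-05T05:11:12"
```

Since the Timestamp itself carries no timezone, the offset is purely a presentation concern applied at formatting time.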
DateParser and DateFormatter both depend on some low-level datetime processing and validation algorithms. For example, these algorithms let you convert a timestamp to (day, month, year, day of week, weekday, hour, etc.) values and vice versa. These algorithms are implemented in the Chrono component.
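As an example of the kind of low-level conversion involved, the sketch below uses the well-known days-from-civil and civil-from-days algorithms published by Howard Hinnant. Whether Chrono uses these exact algorithms is an assumption; the function names are chosen for this example.

```python
def days_from_civil(year, month, day):
    """Days since 1970-01-01 for a proleptic Gregorian date."""
    if month <= 2:
        year -= 1  # treat Jan/Feb as months 11/12 of the previous year
    era = year // 400
    yoe = year - era * 400                                           # [0, 399]
    doy = (153 * (month + (-3 if month > 2 else 9)) + 2) // 5 + day - 1  # [0, 365]
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy                    # [0, 146096]
    return era * 146097 + doe - 719468


def civil_from_days(days):
    """Inverse of days_from_civil: returns (year, month, day)."""
    days += 719468
    era = days // 146097
    doe = days - era * 146097
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)
    mp = (5 * doy + 2) // 153                                        # March == 0
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day


def weekday_from_days(days):
    """Day of week with Sunday == 0; 1970-01-01 was a Thursday (4)."""
    return (days + 4) % 7


def time_of_day(ts):
    """Split a UNIX timestamp into (days since epoch, hour, minute, second)."""
    days, remainder = divmod(int(ts), 86400)
    hour, remainder = divmod(remainder, 3600)
    minute, second = divmod(remainder, 60)
    return days, hour, minute, second
```

Both directions are pure integer arithmetic with no loops or table lookups, which is a property that tends to suit compiled code paths.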
How Has This Been Tested?
This PR adds tests for the individual components (DateParser, DateFormatter, Timezone, Chrono). It also adds extensive unit tests for the main Timestamp component.
The unit tests compare against standard Python objects/builtins where relevant. This means, for example, that the Timestamp.format functionality is checked for correctness against Python's datetime.strftime, and the Timestamp.parse* methods are checked for correctness against datetime.strptime.
For exhaustiveness (and possibly overkill) the tests run over long time ranges (e.g. 'all the days over a span of two years' or 'all the seconds over a span of 3 months').
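The comparison strategy can be sketched as a loop that walks a long range and checks each result against the standard library. Here `format_under_test` is a hypothetical stand-in for the formatter being tested, and `check_every_day` is a name invented for this example; neither is taken from the PR's test suite.

```python
import time
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def format_under_test(ts):
    """Hypothetical stand-in for the formatter being tested."""
    t = time.gmtime(ts)
    return (
        f"{t.tm_year:04d}-{t.tm_mon:02d}-{t.tm_mday:02d}"
        f"T{t.tm_hour:02d}:{t.tm_min:02d}:{t.tm_sec:02d}"
    )


def check_every_day(start_ts, days):
    """Compare the formatter against datetime.strftime for each day in the range."""
    mismatches = []
    for i in range(days):
        ts = start_ts + i * SECONDS_PER_DAY
        expected = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
        actual = format_under_test(ts)
        if actual != expected:
            mismatches.append((ts, actual, expected))
    return mismatches


# Two years of days starting at 2020-01-01T00:00:00 UTC (includes a leap day).
failures = check_every_day(1577836800, 731)
```

Collecting mismatches instead of failing on the first one makes it easier to spot patterns such as errors clustered around month ends or leap days.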
Types of changes
[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
Checklist:
[x] My code follows the code style of this project.
[x] My change requires a change to the documentation.