Cannot use dates before 1969-12-31 12:00:00 (i.e. negative epochs < -43200) in partitions on Windows #22311

Closed (lsim-aegeri closed this issue 2 days ago)

lsim-aegeri commented 1 month ago

Dagster version


What's the issue?

On Windows, using a partition date before 1969-12-31 12:00:00 (i.e. a negative epoch timestamp below -43200) results in the following error:

Operation name: PartitionHealthQuery

Message: [Errno 22] Invalid argument

Path: ["assetNodeOrError","assetPartitionStatuses"]

Locations: [{"line":11,"column":7}]

Stack Trace:
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\graphql\execution\execute.py", line 521, in execute_field
    result = resolve_fn(source, info, **args)
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster_graphql\schema\asset_graph.py", line 1139, in resolve_assetPartitionStatuses
    ) = get_partition_subsets(
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster_graphql\implementation\fetch_assets.py", line 429, in get_partition_subsets
    updated_cache_value = get_and_update_asset_status_cache_value(
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster\_core\storage\partition_status_cache.py", line 422, in get_and_update_asset_status_cache_value
    updated_cache_value = _build_status_cache(
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster\_core\storage\partition_status_cache.py", line 290, in _build_status_cache
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster\_core\storage\partition_status_cache.py", line 218, in get_validated_partition_keys
    validated_partitions = {
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster\_core\storage\partition_status_cache.py", line 221, in <setcomp>
    if partitions_def.has_partition_key(pk, current_time=current_time)
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster\_core\definitions\time_window_partitions.py", line 995, in has_partition_key
    partition_start_time = self.start_time_for_partition_key(partition_key)
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster\_core\definitions\time_window_partitions.py", line 570, in start_time_for_partition_key
    return next(iter(self._iterate_time_windows(partition_key_dt))).start
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster\_core\definitions\time_window_partitions.py", line 923, in _iterate_time_windows
    prev_time = next(iterator)
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster\_utils\schedules.py", line 821, in cron_string_iterator
    yield from _croniter_string_iterator(
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster\_utils\schedules.py", line 839, in _croniter_string_iterator
    next_date = next(reverse_cron)
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\dagster\_utils\schedules.py", line 538, in _timezone_aware_cron_iter
    start_datetime = pendulum.from_timestamp(start_timestamp, tz=timezone_str)
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\pendulum\__init__.py", line 295, in from_timestamp
    dt = _datetime.datetime.utcfromtimestamp(timestamp)

I'm pretty sure this comes up because datetime.datetime.utcfromtimestamp() does not accept large negative epoch timestamps on Windows. For example:

import datetime
datetime.datetime.utcfromtimestamp(-43250)

Traceback (most recent call last):
  File "C:\Users\lucien.simpfendoerfe\micromamba\envs\weather-dagster\Lib\site-packages\IPython\core\interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-31-e07f26fe4ab9>", line 1, in <module>
OSError: [Errno 22] Invalid argument

When I run an identical snippet on Linux, I get the expected result.

>>> import datetime
>>> datetime.datetime.utcfromtimestamp(-43250)
datetime.datetime(1969, 12, 31, 11, 59, 10)

What did you expect to happen?

I expect to be able to create partitions that start or end before 1969-12-31 12:00:00.

How to reproduce?

These are the partition definitions that fail:

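import datetime
from dagster import MonthlyPartitionsDefinition, TimeWindowPartitionsDefinition

monthly_partition = MonthlyPartitionsDefinition(
    start_date=datetime.datetime(1940, 1, 1),
)

yearly_partition = TimeWindowPartitionsDefinition(
    cron_schedule="0 0 1 1 *",
    start=datetime.datetime(1940, 1, 1),
    # any further arguments (e.g. fmt) were cut off in the original report
)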
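To see why these definitions hit the limit, it helps to convert their start date to a Unix timestamp. The sketch below uses only the standard library (it is not Dagster code) and assumes the start date is interpreted as UTC. It shows that 1940-01-01 lies roughly 946 million seconds before the epoch, far past the -43200 s (12 hour) boundary named in the title.

```python
from datetime import datetime, timedelta, timezone

# The boundary from the issue title: -43200 s is exactly 12 hours before the epoch.
THRESHOLD_SECONDS = -43200
threshold_dt = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=THRESHOLD_SECONDS)
print(threshold_dt)  # 1969-12-31 12:00:00+00:00

# Start date used by both partition definitions, assumed here to be in UTC.
partition_start = datetime(1940, 1, 1, tzinfo=timezone.utc)
# For an aware datetime, timestamp() is computed arithmetically, so this works on Windows too.
start_ts = int(partition_start.timestamp())
print(start_ts)  # -946771200
print(start_ts < THRESHOLD_SECONDS)  # True
```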

jamiedemaria commented 3 weeks ago

cc @smackesey: have you run across this before, or do you have any ideas? @gibsondan, you've also been looking at pendulum stuff recently, right?

From what I can find, pendulum claims to have fixed this in https://github.com/sdispater/pendulum/issues/53, but that fix shipped in the 0.6.2 release in 2016, so you should definitely already have that code.

Other libraries that ran into the same issue seem to write custom code to special-case negative timestamps on Windows:

This PR from py-arrow uses timedelta if the date is negative and the OS is Windows.

Another example from faker https://github.com/joke2k/faker/pull/1436
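The timedelta approach those libraries use can be sketched in a few lines: instead of asking the platform's C library to convert the timestamp, add a timedelta to a fixed epoch datetime. That is plain arithmetic, so it should behave the same on Windows and Linux. This is a minimal illustration, not the actual arrow or faker code, and the helper name utc_from_timestamp is hypothetical. Rather than special-casing Windows, this variant always takes the arithmetic path, which also avoids utcfromtimestamp(), deprecated since Python 3.12.

```python
from datetime import datetime, timedelta

# Naive UTC epoch, matching the naive result that datetime.utcfromtimestamp() returns.
_EPOCH = datetime(1970, 1, 1)


def utc_from_timestamp(timestamp: float) -> datetime:
    """Hypothetical helper: naive UTC datetime for a (possibly negative) Unix timestamp.

    Uses datetime + timedelta arithmetic instead of the platform gmtime() call,
    so negative timestamps do not raise OSError on Windows.
    """
    return _EPOCH + timedelta(seconds=timestamp)


print(utc_from_timestamp(-43250))  # 1969-12-31 11:59:10
```

The printed value matches the Linux output of datetime.datetime.utcfromtimestamp(-43250) shown in the report.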

smackesey commented 3 weeks ago

IIUC, @gibsondan has a PR that removes pendulum entirely, so hopefully that will make this issue moot.

gibsondan commented 3 weeks ago

We are going to remove pendulum, but we will still be using UNIX timestamps after that, so I'm not positive that will resolve this.

gibsondan commented 3 weeks ago

It's possible that the workaround suggested here could resolve this, but I have not tried it: https://stackoverflow.com/questions/37494983/python-fromtimestamp-oserror/41400321#41400321
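The call that fails in the stack trace is pendulum.from_timestamp(start_timestamp, tz=timezone_str), which needs a timezone-aware result. A similar arithmetic trick can produce one: build the UTC datetime from an aware epoch, then convert it with astimezone(). This is an untested sketch under that assumption, not a patch to Dagster and not necessarily what the linked answer proposes; the helper name aware_from_timestamp is hypothetical. For a string timezone you would pass zoneinfo.ZoneInfo(timezone_str), which on Windows needs the tzdata package. It would also only fix this one call site, not the other places that rely on Unix timestamps.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Aware UTC epoch; adding a timedelta to it never calls the platform's gmtime().
_EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def aware_from_timestamp(timestamp: float, tz: tzinfo) -> datetime:
    """Hypothetical helper: timezone-aware datetime for a (possibly negative) Unix timestamp.

    tz can be e.g. datetime.timezone.utc or zoneinfo.ZoneInfo("Europe/Berlin").
    """
    utc_dt = _EPOCH_UTC + timedelta(seconds=timestamp)
    # With an explicit target tz, astimezone() only calls tz.fromutc(); no OS time functions.
    return utc_dt.astimezone(tz)


print(aware_from_timestamp(-946771200, timezone.utc))  # 1940-01-01 00:00:00+00:00
```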

jamiedemaria commented 3 weeks ago

Yeah, after some internal discussion, this is something we aren't going to try to fix. We assume we can use Unix timestamps all over our code base, so working around this issue is infeasible at this time. Closing this issue.