dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Unknown cause for ValueError: time data 'START_DATE' does not match format '%Y-%m-%d' when using environment variable #23355

Closed EvanZ closed 1 month ago

EvanZ commented 1 month ago

Dagster version

dagster, version 1.7.14

What's the issue?

I am working with partitions by date. In my partition file I have the following code:

from dagster import DailyPartitionsDefinition, EnvVar

daily_partition = DailyPartitionsDefinition(
    start_date=EnvVar('START_DATE'),
    end_date=EnvVar('END_DATE')
)

I am getting the environment variables from my .env:

.
.
.
START_DATE='2024-03-21'
END_DATE='2024-03-25'

The env variables do seem to be loaded when I run dagster dev, because I see the following logged to stdout:

2024-07-31 15:17:06 -0700 - dagster - INFO - Loaded environment variables from .env file: DUCKDB_DATABASE,START_DATE,END_DATE

However, I get the following error traceback:

Stack Trace:
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_grpc/server.py", line 411, in __init__
    self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories(
                                                              ^^^^^^^^^^^^^^^^^^^
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_grpc/server.py", line 234, in __init__
    with user_code_error_boundary(
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_core/errors.py", line 297, in user_code_error_boundary
    raise new_error from e

The above exception was caused by the following exception:
ValueError: time data 'START_DATE' does not match format '%Y-%m-%d%z'

Stack Trace:
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_core/errors.py", line 287, in user_code_error_boundary
    yield
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_grpc/server.py", line 245, in __init__
    loadable_targets = get_loadable_targets(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_grpc/utils.py", line 50, in get_loadable_targets
    else loadable_targets_from_python_module(module_name, working_directory)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_core/workspace/autodiscovery.py", line 31, in loadable_targets_from_python_module
    module = load_python_module(
             ^^^^^^^^^^^^^^^^^^^
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_core/code_pointer.py", line 134, in load_python_module
    return importlib.import_module(module_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/dagster_jobs/__init__.py", line 3, in <module>
    from .assets import espn
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/dagster_jobs/assets/espn.py", line 16, in <module>
    from ..partitions import daily_partition
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/dagster_jobs/partitions/__init__.py", line 3, in <module>
    daily_partition = DailyPartitionsDefinition(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_core/definitions/time_window_partitions.py", line 1102, in __new__
    return super(DailyPartitionsDefinition, cls).__new__(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_core/definitions/time_window_partitions.py", line 308, in __new__
    start_dt = dst_safe_strptime(start, timezone, fmt)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_core/definitions/time_window_partitions.py", line 126, in dst_safe_strptime
    dt = datetime.strptime(date_string, dst_safe_fmt(fmt))
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/_strptime.py", line 554, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/_strptime.py", line 333, in _strptime
    raise ValueError("time data %r does not match format %r" %

The above exception occurred during handling of the following exception:
ValueError: time data 'START_DATE' does not match format '%Y-%m-%d'

Stack Trace:
  File "/Users/evanzamir/projects/ncaam-data-pipelines/dagster-jobs/.virtualenv/lib/python3.12/site-packages/dagster/_core/definitions/time_window_partitions.py", line 123, in dst_safe_strptime
    dt = datetime.strptime(date_string, fmt)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/_strptime.py", line 554, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/_strptime.py", line 333, in _strptime
    raise ValueError("time data %r does not match format %r" %

  warnings.warn(f"Error loading repository location {location_name}:{error.to_string()}")

I can't figure out what is causing this behavior.

What did you expect to happen?

In a previous version of my code I had defined the dates in a constants file, so essentially I was hardcoding them. I want to transition to using environment variables.

Deployment type

Local

garethbrickman commented 1 month ago

Could you try using print() or context.log.info() to see and confirm the EnvVar value for start_date?

EvanZ commented 1 month ago

Hi @garethbrickman, could you suggest where to put the logging? I added a logging call to the asset that uses the partition, but I don't see any logs being created. To be honest, I'm not sure where to put a print statement either.

EvanZ commented 1 month ago

While I wait for a reply, I should note that when I use os.environ instead of EnvVar to define the partition, it works without issue:

import os

from dagster import DailyPartitionsDefinition

daily_partition = DailyPartitionsDefinition(
    start_date=os.environ['START_DATE'],
    end_date=os.environ['END_DATE']
)

So it must be something about the timing of loading .env that I am not understanding.
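
The literal 'START_DATE' in the error message suggests that the partition definition received the variable's name rather than its value. Below is a minimal stdlib-only sketch of that failure. It assumes EnvVar reaches strptime as a plain string holding the name, which is an inference from the traceback and not checked against Dagster's source. try_parse is a hypothetical helper.

```python
from datetime import datetime

FMT = "%Y-%m-%d"


def try_parse(value):
    """Return the parsed datetime, or the error text if parsing fails."""
    try:
        return datetime.strptime(value, FMT)
    except ValueError as exc:
        return str(exc)


# Assumption: this mimics what happens when the variable *name* is parsed.
name_result = try_parse("START_DATE")
# This mimics what happens when the variable *value* is parsed.
value_result = try_parse("2024-03-21")

print(name_result)   # time data 'START_DATE' does not match format '%Y-%m-%d'
print(value_result)  # 2024-03-21 00:00:00
```

If that assumption holds, the problem is less about when .env is loaded and more about the partition definition never resolving the variable at all.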

garethbrickman commented 1 month ago

You could create a test asset in your code just to print/log the values and materialize it. Here's a self-contained example:

import os

from dagster import AssetExecutionContext, Definitions, EnvVar, asset


@asset
def test_asset(context: AssetExecutionContext):
    # EnvVar defers resolution; get_value() reads the variable explicitly.
    context.log.info(f"Using EnvVar: {EnvVar('START_DATE').get_value()}")
    # os.getenv reads the variable directly from the process environment.
    context.log.info(f"Using os.getenv: {os.getenv('START_DATE')}")


defs = Definitions(assets=[test_asset])

From the docs:

  • When os.getenv is used, the variable's value is retrieved when Dagster loads the code location and will be visible in the UI.
  • When EnvVar is used, the variable's value is retrieved at runtime and won't be visible in the UI.

EnvVar defers resolution of the environment variable value until run time, and should only be used as input to Dagster config or resources. To access the environment variable value, call get_value on the EnvVar, or use os.getenv directly.
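
Since a partitions definition is built when the code location loads, one option consistent with the docs quoted above is to read the variable at load time and validate it before handing it over. Below is a minimal sketch; date_from_env and DATE_FMT are hypothetical names, and the commented usage assumes the DailyPartitionsDefinition call from the original post.

```python
import os
from datetime import datetime

DATE_FMT = "%Y-%m-%d"


def date_from_env(name: str) -> str:
    """Return the value of env var `name` after checking it against DATE_FMT."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Environment variable {name} is not set")
    # Raises ValueError with a readable message if the date is malformed.
    datetime.strptime(value, DATE_FMT)
    return value


# Hypothetical usage in the partitions module (requires dagster):
# daily_partition = DailyPartitionsDefinition(
#     start_date=date_from_env("START_DATE"),
#     end_date=date_from_env("END_DATE"),
# )
```

Failing early with the variable name in the message makes a missing or misspelled entry in .env easier to spot than a strptime error deep in the load traceback.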

EvanZ commented 1 month ago

Thanks @garethbrickman. I have to admit that part of the docs confused me a bit when I read it before. When I launch the Dagster UI via dagster dev, for example, is that "run time"? Or is "run time" when an asset is materialized? For example, if I run dagster dev, materialize some assets, and then decide to change environment variables, is it the case that EnvVar will not see that change? And if I want changes in env variables to be propagated to asset materializations, should I use os.getenv?

garethbrickman commented 1 month ago

I think "run time" is when an asset is materialized or a job is run. It's also referred to as "launch time", as in launching a job with a configuration.

As for changing the contents of the .env file: any time the file is modified, the workspace must be reloaded so that the Dagster webserver/UI picks up the changes.
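
The difference between a value captured at load time and one read at run time can be illustrated with a stdlib-only sketch. This is an in-process analogy, not Dagster's actual reload mechanism; LOAD_TIME_VALUE and run_time_value are hypothetical names.

```python
import os

os.environ["START_DATE"] = "2024-03-21"

# Captured once, when the module is imported ("load time").
LOAD_TIME_VALUE = os.environ["START_DATE"]


def run_time_value() -> str:
    # Looked up each time the function is called ("run time").
    return os.environ["START_DATE"]


# Simulate the environment changing after the module has been loaded.
os.environ["START_DATE"] = "2024-04-01"

print(LOAD_TIME_VALUE)   # 2024-03-21
print(run_time_value())  # 2024-04-01
```

A module-level value stays frozen until the module is imported again, which matches the need to reload the code location after editing .env.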