DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License

Hamilton tracker gives an error with datetime columns in polars dataframes #1127

Open elyase opened 1 week ago

elyase commented 1 week ago

Reproduction

from datetime import datetime

import polars as pl
from hamilton import driver
from hamilton_sdk import adapters

import __main__ as dag

def df() -> pl.Series:
    return pl.Series(
        "timestamp",
        [
            datetime(2021, 1, 1),
            datetime(2021, 1, 2),
            datetime(2021, 1, 3),
        ],
    )

tracker = adapters.HamiltonTracker(
    project_id=1,
    username="elyase",
    dag_name="polars",
)

dr = driver.Builder().with_modules(dag).with_adapters(tracker).build()
result = dr.execute(["df"])

Stack Traces

  File "/Users/yaser/Documents/shitcoins/.venv/lib/python3.12/site-packages/hamilton/node.py", line 249, in __call__
    return self.callable(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/elyase/test/.venv/lib/python3.12/site-packages/hamilton_sdk/tracking/polars_col_stats.py", line 52, in std
    return col.std()
           ^^^^^^^^^
  File "/Users/elyase/test/.venv/lib/python3.12/site-packages/polars/series/series.py", line 2049, in std
    return self._s.std(ddof)
           ^^^^^^^^^^^^^^^^^
polars.exceptions.InvalidOperationError: `std` operation not supported for dtype `datetime[μs]`

skrawcz commented 1 week ago

Thanks @elyase! We might not have kept up with all the polars changes.

skrawcz commented 1 week ago

Do you have an example dataframe this breaks on?

elyase commented 6 days ago

Hi @skrawcz, thanks for your reply. I edited the ticket with a reproduction.