Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

Issue with write_deltalake #2444

Open ghost opened 3 months ago

ghost commented 3 months ago

Describe the bug

I am trying to use the write_deltalake function with a where clause that compares Timestamp values.

To Reproduce

Steps to reproduce the behavior:

import datetime

import daft
from daft import col

df = (
    daft.read_deltalake("abfss://raw@xxxxx.dfs.core.windows.net/yy")
    .where(col("ReceivedDate") == datetime.date(2024, 4, 16))
    .where(
        col("Timestamp")
        > datetime.datetime(2024, 4, 16, 12, 1, 1, tzinfo=datetime.timezone.utc)
    )
    .where(
        col("Timestamp")
        < datetime.datetime(2024, 4, 16, 12, 31, 1, tzinfo=datetime.timezone.utc)
    )
)

# print(df.count(col("Hash")).collect())  # This one works

df.write_deltalake("us-ddk", mode="overwrite")  # This one throws an error

Expected behavior

No errors.

Errors


daft.exceptions.DaftTypeError: Cannot perform comparison on types: Timestamp(Nanoseconds, None), Timestamp(Microseconds, Some("UTC"))
Details:
DaftError::TypeError could not determine supertype of Timestamp(Nanoseconds, None) and Timestamp(Microseconds, Some("UTC"))
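The error mirrors Python's own datetime semantics: a naive timestamp (no timezone, like `Timestamp(Nanoseconds, None)`) cannot be ordered against an aware one (like `Timestamp(Microseconds, Some("UTC"))`), because there is no unambiguous common type. A minimal plain-Python sketch of the same mismatch (an analogy only, not Daft code):

```python
import datetime

# Naive timestamp: no tzinfo, analogous to Timestamp(Nanoseconds, None).
naive = datetime.datetime(2024, 4, 16, 12, 0, 0)

# Aware timestamp: carries UTC, analogous to Timestamp(Microseconds, Some("UTC")).
aware = datetime.datetime(2024, 4, 16, 12, 0, 0, tzinfo=datetime.timezone.utc)

try:
    naive < aware
except TypeError as e:
    # Python refuses the comparison for the same reason Daft does:
    # naive and aware timestamps have no well-defined supertype.
    print(f"TypeError: {e}")
```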
samster25 commented 3 months ago

Hi @vibh3s, I think the issue here is that col("Timestamp") may not have a timezone associated with it, and therefore we cannot compare it to a timestamp that does have one.

Can you try replacing the predicate with

    .where(
        col("Timestamp")
        > datetime.datetime(2024, 4, 16, 12, 1, 1)
    )
ghost commented 3 months ago

[screenshot attached]

It fails only when writing to deltalake.

jaychia commented 1 week ago

@raunakab could you take a look and help us triage this issue?