delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs
https://delta.io
Apache License 2.0

[BUG] Power BI Connector cannot read int64 Timestamp Columns #3772

Open aersam opened 1 month ago

aersam commented 1 month ago

Bug

Describe the problem

Steps to reproduce

  1. Create a test file (here using Databricks):

```python
spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
spark.sql("create table test_data using delta location 'abfss://MYPATH/repo/test';")

from datetime import datetime, timezone
spark.createDataFrame([{"date": datetime.now(tz=timezone.utc)}]).write.option("mergeSchema", "true").mode("append").save(MYPATH)
```


The resulting delta file: [repo.zip](https://github.com/user-attachments/files/17393476/repo.zip)

2. Read it using fn_ReadDeltaTable in Power BI:

```m
let
    Source = AzureStorage.DataLake("MYPATH", [HierarchicalNavigation = false]),
    DeltaTable = fn_ReadDeltaTable(Source)
in
    DeltaTable
```

Observed results

The timestamp column is empty (screenshot attached to the original issue).

Expected results

The column should display the data correctly, as it does in Databricks (screenshot attached to the original issue).
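For context, an int64 TIMESTAMP_MILLIS value is simply milliseconds since the Unix epoch, so decoding it is a one-liner. A sketch with an assumed example value, not data taken from the attached file:

```python
# Decode an int64 TIMESTAMP_MILLIS value into a timezone-aware datetime.
from datetime import datetime, timezone

millis = 1_700_000_000_000  # hypothetical raw int64 read from Parquet
dt = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
print(dt)  # 2023-11-14 22:13:20+00:00
```

A reader that skips or misinterprets this logical type (for example, treating the raw value as microseconds) would surface the column as empty or wrong, which matches the observed behavior; that is a plausible cause, not a confirmed diagnosis.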

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

I have to admit I don't really understand the Power Query code of the connector :)

aersam commented 1 month ago

FYI: Reading the Parquet file directly in Power BI does work.