delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[BUG] fn_ReadDeltaTable skipping partitions with brackets #2280

Status: Open. Opened by scuffell 11 months ago.

scuffell commented 11 months ago

Bug

Which Delta project/connector is this regarding?

Describe the problem

If a Delta table is partitioned on a field whose values contain round brackets, those partitions are ignored and don't show up in Power BI.

Steps to reproduce

Create a DataFrame with a field whose values contain round brackets, then write it in Delta format, partitioned by that field.

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Two of the partition values contain round brackets
data2 = [
  ("product 1", 10),
  ("product 1 (beta)", 12),
  ("product 3", 14),
  ("product 4", 35),
  ("product 4 (alpha)", 4),
]

# "users" holds integers, so it is declared as IntegerType
# (StringType would fail schema verification for these values)
schema = StructType([
    StructField("product", StringType(), True),
    StructField("users", IntegerType(), True),
])

df = spark.createDataFrame(data=data2, schema=schema)
df.display()  # Databricks notebook display

destination_path = "wasbs://test@xxx.blob.core.windows.net/pbi_connector_test"
(
  df.write
    .format("delta")
    .partitionBy("product")
    .save(destination_path)
)

Output: [screenshot not included]

Storage: [screenshot not included]
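For reference, the file paths a reader receives come from the `add` actions in the table's `_delta_log` commit files, which store one JSON action per line. Per the Delta protocol, `path` is a URI (RFC 2396) that must be decoded to get the data file path, while `partitionValues` holds the raw partition values. The minimal sketch below parses a hypothetical, trimmed commit excerpt; the helper name `list_added_files` and the sample paths are illustrative assumptions, and a real log's exact encoding may differ.

```python
import json
from urllib.parse import unquote


def list_added_files(commit_text: str):
    """Yield (decoded path, partitionValues) for every "add" action in a
    Delta commit file, which stores one JSON action per line."""
    for line in commit_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        action = json.loads(line)
        add = action.get("add")
        if add is not None:
            # "path" is URI-encoded; decode it to get the actual file path
            yield unquote(add["path"]), add.get("partitionValues", {})


# Hypothetical excerpt of _delta_log/00000000000000000000.json,
# trimmed to the fields used here (real add actions carry more)
sample = "\n".join([
    json.dumps({"add": {"path": "product=product%201%20(beta)/part-00000.snappy.parquet",
                        "partitionValues": {"product": "product 1 (beta)"}}}),
    json.dumps({"add": {"path": "product=product%203/part-00001.snappy.parquet",
                        "partitionValues": {"product": "product 3"}}}),
])

for path, values in list_added_files(sample):
    print(path, values)
```

Comparing the encoded `path` with the URL that the storage listing returns for the same file shows which characters a connector has to translate between the two forms.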

Observed results

When reading the table with fn_ReadDeltaTable in Power BI, the results omit the partitions whose "product" value contains round brackets:

Query: [screenshot not included]

Results: [screenshot not included]

Expected results

Expected to see all values of the "product" field, e.g.:

[screenshot not included]

Further details

This seems to be related to the URL encoding/decoding of paths. Adding a couple of Text.Replace calls on line 311 fixes the problem, but I wonder whether other special characters are affected as well.

Original line 311:

#"Added Full_Path" = Table.AddColumn(#"Files with Stats", "Full_Path", each Text.Replace(DeltaTablePath & Text.Replace([file_name], "=", "%3D"), "/", Delimiter), Text.Type),

With round brackets also encoded:

#"Added Full_Path" = Table.AddColumn(#"Files with Stats", "Full_Path", each Text.Replace(DeltaTablePath & Text.Replace(Text.Replace(Text.Replace([file_name], "=", "%3D"), "(", "%28"), ")", "%29"), "/", Delimiter), Text.Type),
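The workaround above enumerates the characters to encode one by one. A more general approach, sketched here in Python rather than M, is to decode the path from the Delta log and re-encode it in full, keeping `/` as the separator. This assumes the storage URLs percent-encode every character outside the unreserved set; the function names are hypothetical, and this is not necessarily how the connector fix implements it.

```python
from urllib.parse import quote, unquote


def encode_adhoc(log_path: str) -> str:
    """Mirror the line-311 workaround: replace only "=", "(" and ")"."""
    return log_path.replace("=", "%3D").replace("(", "%28").replace(")", "%29")


def encode_all(log_path: str) -> str:
    """Decode the URI-encoded path from the Delta log, then percent-encode
    every character except unreserved ones and the "/" separator."""
    return quote(unquote(log_path), safe="/")


# Hypothetical log path for the "product 1 (beta)" partition
path = "product=product%201%20(beta)/part-00000.snappy.parquet"
print(encode_adhoc(path))  # product%3Dproduct%201%20%28beta%29/part-00000.snappy.parquet
print(encode_all(path))    # product%3Dproduct%201%20%28beta%29/part-00000.snappy.parquet

# A character the workaround does not list: only encode_all changes "!"
other = "product=product%205!/part-00001.snappy.parquet"
print(encode_adhoc(other))  # product%3Dproduct%205!/part-00001.snappy.parquet
print(encode_all(other))    # product%3Dproduct%205%21/part-00001.snappy.parquet
```

For the bracketed partition both functions agree. The `!` case only matters if the storage side encodes that character too, which is an assumption here; the point of the general version is that it does not depend on listing such characters in advance.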

Environment information

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

allisonport-db commented 10 months ago

@gbrueckl Do you have any input on this?

gbrueckl commented 10 months ago

Thanks for reporting, @scuffell. I just created a PR that fixes this issue: https://github.com/delta-io/delta/pull/2289. It handles all special characters that can get encoded/decoded.