delta-io / delta-sharing

An open protocol for secure data sharing
https://delta.io/sharing
Apache License 2.0

Load profile.json exception #455

Open cic1988 opened 9 months ago

cic1988 commented 9 months ago

Hello experts,

I followed the protocol example to build the reference server. The server generates a presigned URL when the table/query endpoint is called.

Assume that my table_url is profile.json#share.schema.table.
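
For reference, a table URL of this form is the profile file path, a `#`, and then the share, schema and table names joined by dots. The sketch below only illustrates that layout. The helper names `make_table_url` and `split_table_url` are hypothetical and not part of `delta_sharing`, and the code assumes that none of the three names contains a dot.

```python
def make_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Compose '<profile-path>#<share>.<schema>.<table>' (hypothetical helper)."""
    return f"{profile_path}#{share}.{schema}.{table}"


def split_table_url(table_url: str) -> tuple:
    """Split a table URL back into its four parts (hypothetical helper).

    Assumes the share, schema and table names contain no dots.
    """
    # Split on the last '#', so a '#' inside the profile path is tolerated.
    profile_path, _, fragment = table_url.rpartition("#")
    share, schema, table = fragment.split(".")
    return profile_path, share, schema, table


table_url = make_table_url("profile.json", "share", "schema", "table")
print(table_url)
print(split_table_url(table_url))
```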

Using df = delta_sharing.load_as_pandas(table_url, limit=3) loads the data fine, but load_as_spark fails.

The code is as follows:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Delta Share Demo") \
    .config('spark.jars', 'packages/hadoop-azure-3.3.6.jar,packages/delta-sharing-spark_2.12-0.6.4.jar') \
    .getOrCreate()

...

import delta_sharing
df = delta_sharing.load_as_spark(table_url)
df.limit(2).select("path").show()

The error is:

java.lang.RuntimeException: delta-sharing:/profile.json%23share.schema.table/123/25169076 is not a Parquet file. Expected magic number at tail, but found [0, 20, 14, 55]

Have you seen this error before?
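
For context on the message: a Parquet file begins and ends with the 4-byte magic number `PAR1`, and the exception reports that the last four bytes read were [0, 20, 14, 55] instead. The sketch below is one way to run the same check on bytes that are already in memory. `parquet_magic_report` is a hypothetical helper, not part of `delta_sharing` or Spark, and the sample payload is made up purely to reproduce the tail bytes from the exception.

```python
PARQUET_MAGIC = b"PAR1"  # magic number at the start and end of a Parquet file


def parquet_magic_report(data: bytes) -> dict:
    """Report whether `data` starts and ends with the Parquet magic number."""
    return {
        "head_ok": data[:4] == PARQUET_MAGIC,
        "tail_ok": data[-4:] == PARQUET_MAGIC,
        # Same representation as in the Java exception message.
        "tail_bytes": list(data[-4:]),
    }


# Made-up payload ending in the tail bytes from the exception above.
bad_tail = bytes([0, 20, 14, 55])
print(parquet_magic_report(PARQUET_MAGIC + b"\x00" * 8 + bad_tail))

# A payload with the expected magic number at both ends.
print(parquet_magic_report(PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC))
```

Running the same check on the bytes downloaded from one of the presigned URLs could help show whether the stored object itself lacks the `PAR1` footer or whether the bytes differ only on the Spark read path.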

linzhou-db commented 8 months ago

@cic1988 Sorry, I haven't seen this before. Is it still happening? Do you have the full stack trace?