delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Write to Microsoft OneLake failed. #1764

Closed: RobinLin666 closed this issue 1 year ago

RobinLin666 commented 1 year ago

Environment

Delta-rs version: python-0.12.0

Binding: python-0.12.0

Environment:


Bug

What happened: this simple example fails with an OSError when writing to OneLake.

import pandas as pd
from deltalake import write_deltalake

# aadToken: an Azure AD bearer token, obtained beforehand
df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
write_deltalake("abfss://xxx@onelake.dfs.fabric.microsoft.com/test.Lakehouse/Tables/sample_table2", df,
                storage_options={"bearer_token": aadToken, "use_fabric_endpoint": "true"})

error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[48], line 2
      1 df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
----> 2 write_deltalake("abfss://xxx@onelake.dfs.fabric.microsoft.com/test.Lakehouse/Tables/sample_table2", df,
      3  storage_options={"bearer_token": aadToken, "use_fabric_endpoint": "true"})

File /nfs4/pyenv-515f53e0-5628-453e-a741-0c6f116d93b7/lib/python3.10/site-packages/deltalake/writer.py:153, in write_deltalake(table_or_uri, data, schema, partition_by, filesystem, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, overwrite_schema, storage_options, partition_filters, large_dtypes)
    150     else:
    151         data, schema = delta_arrow_schema_from_pandas(data)
--> 153 table, table_uri = try_get_table_and_table_uri(table_or_uri, storage_options)
    155 # We need to write against the latest table version
    156 if table:

File /nfs4/pyenv-515f53e0-5628-453e-a741-0c6f116d93b7/lib/python3.10/site-packages/deltalake/writer.py:417, in try_get_table_and_table_uri(table_or_uri, storage_options)
    414     raise ValueError("table_or_uri must be a str, Path or DeltaTable")
    416 if isinstance(table_or_uri, (str, Path)):
--> 417     table = try_get_deltatable(table_or_uri, storage_options)
    418     table_uri = str(table_or_uri)
    419 else:

File /nfs4/pyenv-515f53e0-5628-453e-a741-0c6f116d93b7/lib/python3.10/site-packages/deltalake/writer.py:430, in try_get_deltatable(table_uri, storage_options)
    426 def try_get_deltatable(
    427     table_uri: Union[str, Path], storage_options: Optional[Dict[str, str]]
    428 ) -> Optional[DeltaTable]:
    429     try:
--> 430         return DeltaTable(table_uri, storage_options=storage_options)
    431     except TableNotFoundError:
    432         return None

File /nfs4/pyenv-515f53e0-5628-453e-a741-0c6f116d93b7/lib/python3.10/site-packages/deltalake/table.py:250, in DeltaTable.__init__(self, table_uri, version, storage_options, without_files, log_buffer_size)
    231 """
    232 Create the Delta Table from a path with an optional version.
    233 Multiple StorageBackends are currently supported: AWS S3, Azure Data Lake Storage Gen2, Google Cloud Storage (GCS) and local URI.
   (...)
    247 
    248 """
    249 self._storage_options = storage_options
--> 250 self._table = RawDeltaTable(
    251     str(table_uri),
    252     version=version,
    253     storage_options=storage_options,
    254     without_files=without_files,
    255     log_buffer_size=log_buffer_size,
    256 )
    257 self._metadata = Metadata(self._table)

OSError: Encountered object with invalid path: Error parsing Path "test.Lakehouse/Tables/sample_table2/_delta_log/_commit_ed2503ff-f28f-40c2-9a41-5be43ede8930.json.tmp#1": Encountered illegal character sequence "#" whilst parsing path segment "_commit_ed2503ff-f28f-40c2-9a41-5be43ede8930.json.tmp#1"

What you expected to happen:

How to reproduce it:

More details: Related to https://github.com/delta-io/delta-rs/issues/1418#issuecomment-1769840660
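
For readers unfamiliar with the error above: the message says the path parser refuses a "#" inside a path segment, so a leftover file such as `_commit_....json.tmp#1` in `_delta_log` is enough to make opening the table fail. The toy sketch below only mimics that behaviour for illustration. It is not the delta-rs or object_store implementation; the function names and the set of rejected sequences are assumptions made for this example.

```python
# Toy illustration only: NOT the actual delta-rs / object_store code.
# Assumption: we reject just the one sequence named in the error message.
ILLEGAL_SEQUENCES = ("#",)


def check_segment(segment: str) -> None:
    """Raise ValueError if a single path segment contains an illegal sequence."""
    for seq in ILLEGAL_SEQUENCES:
        if seq in segment:
            raise ValueError(
                f'Encountered illegal character sequence "{seq}" '
                f'whilst parsing path segment "{segment}"'
            )


def check_path(path: str) -> None:
    """Validate every '/'-separated segment of an object path."""
    for segment in path.split("/"):
        check_segment(segment)


if __name__ == "__main__":
    bad = (
        "test.Lakehouse/Tables/sample_table2/_delta_log/"
        "_commit_ed2503ff-f28f-40c2-9a41-5be43ede8930.json.tmp#1"
    )
    try:
        check_path(bad)
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Under this reading, the failure is triggered while listing existing objects in `_delta_log`, before any new data is written.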

djouallah commented 1 year ago

@RobinLin666 how did you get aadToken ?

RobinLin666 commented 1 year ago

> @RobinLin666 how did you get aadToken ?

I use TridentTokenLibrary:

from trident_token_library_wrapper import PyTridentTokenLibrary
token = PyTridentTokenLibrary.get_access_token("storage")

djouallah commented 1 year ago

@RobinLin666 thank you so much it worked for me !!!!!

RobinLin666 commented 1 year ago

Thank you @djouallah , it works somehow!
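
For anyone landing here later, the two snippets from this thread can be combined roughly as follows. This is a minimal sketch, not a confirmed fix: the helper names (`onelake_table_uri`, `onelake_storage_options`, `write_sample`) are hypothetical, the workspace, lakehouse and table names are the placeholders from the original report, and `trident_token_library_wrapper` is assumed to be available only inside a Fabric notebook.

```python
from typing import Dict


def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """Build an abfss:// URI in the same shape as the one used in this issue."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )


def onelake_storage_options(token: str) -> Dict[str, str]:
    """Storage options as passed to write_deltalake in the original report."""
    return {"bearer_token": token, "use_fabric_endpoint": "true"}


def write_sample(token: str) -> None:
    """Write the two-row sample frame from the report to OneLake."""
    # Imported lazily so the helpers above can be used without these packages.
    import pandas as pd
    from deltalake import write_deltalake

    df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
    write_deltalake(
        onelake_table_uri("xxx", "test", "sample_table2"),
        df,
        storage_options=onelake_storage_options(token),
    )


# Inside a Fabric notebook (assumption: the wrapper module is importable there):
#   from trident_token_library_wrapper import PyTridentTokenLibrary
#   write_sample(PyTridentTokenLibrary.get_access_token("storage"))
```

Keeping the URI and option builders separate from the write call makes it easy to print and inspect what is being sent before touching the table.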