delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

dbfs paths not supported #1376

Closed MrPowers closed 1 month ago

MrPowers commented 1 year ago

Environment

Delta-rs version: 0.9.0

Binding: Python

Environment:


Bug

What happened: Tried to instantiate a DeltaTable from a DBFS path, like this: deltalake.DeltaTable("dbfs:/some-thing/some_dir")

What you expected to happen: I expected this to work. This works: spark.read.format("delta").load("dbfs:/some-thing/some_dir").show()

How to reproduce it: Create a Delta table in Databricks with a DBFS path and then try to instantiate a deltalake.DeltaTable. Should be relatively easy to reproduce.

More details: N/A

rtyler commented 1 year ago

@MrPowers to the best of my knowledge there is no REST API for DBFS, nor any open "file system provider" for what DBFS actually is. Does Databricks make third-party interoperability with DBFS possible?

MrPowers commented 1 year ago

@rtyler - yea, I'm not sure. Perhaps I have to figure out another way to get the path to the data.
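
One possible way to get another path is sketched below. It assumes the code runs on a Databricks cluster, where DBFS is also exposed through a local FUSE mount at `/dbfs`. The helper name `dbfs_uri_to_fuse_path` is hypothetical and not part of deltalake, and nothing in this thread confirms that delta-rs can read a table through that mount.

```python
from urllib.parse import urlparse


def dbfs_uri_to_fuse_path(uri: str) -> str:
    """Map a dbfs:/ URI to its /dbfs FUSE-mount path (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "dbfs":
        raise ValueError(f"not a dbfs URI: {uri!r}")
    # "dbfs:/a/b" parses with an empty netloc and path "/a/b";
    # "dbfs://a/b" would put "a" in netloc, so rejoin both parts.
    relative = (parsed.netloc + parsed.path).lstrip("/")
    return "/dbfs/" + relative


local_path = dbfs_uri_to_fuse_path("dbfs:/some-thing/some_dir")
print(local_path)
```

The resulting local path could then be tried with `deltalake.DeltaTable(local_path)` in place of the `dbfs:/` URI.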

Lundez commented 1 year ago

I have the same issue when using a mounted ADLS2 in an Azure ML Studio job. I want to write a table, and it fails when writing the log. The Parquet file is written correctly.

This is ADLS2 with Hierarchical Storage.

ion-elgreco commented 11 months ago

I also ran into this issue with AML; writing to mounted storage is not supported.

My workaround for now is to skip the mount and write to the ADLS2 container directly.
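
A minimal sketch of writing to the container directly, for illustration only. The account, container, and path values are placeholders, the helper names `adls2_table_uri` and `write_direct` are hypothetical, and the storage option keys are assumed to follow the Azure configuration names used by the underlying object store. Authentication in a real AML job may differ (for example a service principal or managed identity rather than an account key).

```python
def adls2_table_uri(account: str, container: str, path: str) -> str:
    """Build an abfss:// URI for a table in an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def write_direct(data, account: str, container: str, path: str, account_key: str) -> None:
    """Write `data` as a Delta table straight to the container, bypassing the mount."""
    # Imported lazily so the URI helper above works without deltalake installed.
    from deltalake import write_deltalake

    # Assumption: these option keys match the Azure configuration names.
    storage_options = {
        "azure_storage_account_name": account,
        "azure_storage_account_key": account_key,
    }
    write_deltalake(
        adls2_table_uri(account, container, path),
        data,  # e.g. a pandas DataFrame or pyarrow Table
        mode="append",
        storage_options=storage_options,
    )


# Placeholder names, not taken from the thread.
uri = adls2_table_uri("myaccount", "mycontainer", "/tables/events")
print(uri)
```

Both the Parquet files and the `_delta_log` entries then go through the object store client rather than the mounted filesystem.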

Lundez commented 11 months ago

I solved it the same way, but that means my jobs aren't as clear (output is not job output but a hidden API call) 😅

Thanks for responding!

ion-elgreco commented 1 month ago

This should now work for mounted storage with the change in https://github.com/delta-io/delta-rs/pull/1868