GlareDB / glaredb

GlareDB: An analytics DBMS for distributed data
https://glaredb.com
MIT License

add support for Fabric OneLake storage #1809

Open djouallah opened 1 year ago

djouallah commented 1 year ago

I'm trying this code:

import glaredb
import pandas as pd
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
    }
)

con = glaredb.connect("/lakehouse/default/Files")
con.sql("CREATE OR REPLACE TABLE xxx AS SELECT * FROM df")
con.close()

I think you need the latest version of arrow-rs to make it work: https://github.com/apache/arrow-rs/pull/4573

scsmithr commented 1 year ago

Did some digging on this. It's likely we'll support abfs://... paths before the lakehouse file API (/lakehouse/...). There are some challenges around unimplemented file system operations with blobfuse.
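To make the two addressing styles concrete, here is a minimal sketch that tells an abfs(s):// URL apart from a mounted filesystem path such as /lakehouse/.... The helper name `storage_kind` is hypothetical and only for illustration; it is not part of GlareDB.

```python
from urllib.parse import urlparse


def storage_kind(location: str) -> str:
    """Classify a location string by its URL scheme (hypothetical helper)."""
    scheme = urlparse(location).scheme.lower()
    if scheme in ("abfs", "abfss"):
        # Azure Blob File System URL, the style OneLake exposes remotely.
        return "abfs"
    if scheme in ("", "file"):
        # No scheme: a plain path, e.g. the mounted lakehouse file API.
        return "filesystem"
    return "other"
```

The first style goes through an object-store client, while the second relies on whatever operations the mounted file system actually implements.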


Notes for impl:

jordandakota commented 1 year ago

As a vote of confidence, a OneLake destination in GlareDB would make me choose this over Fabric any day. Power BI is great, and the concept of OneLake to empower Power BI is great. Fabric, not so much.

djouallah commented 1 year ago

That's fine, you don't need to like the other Fabric engines. OneLake is neutral and works with any engine as long as it understands Delta tables.

jordandakota commented 1 year ago

Exactly. I'm currently working with Databricks and have Unity Catalog in OneLake. The only remaining issue is how Unity writes a table name vs. how OneLake prefers to see it.

djouallah commented 1 year ago

Any update on this? I presume it should be easy now, as it is supported by delta-rs.

scsmithr commented 1 year ago

> Any update on this? I presume it should be easy now, as it is supported by delta-rs.

We've made some changes to how we plumb stuff through to delta-rs, but I have not tested if this all works yet with Fabric (either via abfs://... or through the filesystem api). We'll be checking on this over the next couple of days, and I'll follow up with an update.

jordandakota commented 1 year ago

Sounds great. Looking forward to it.

djouallah commented 9 months ago

Any update? I see that you are now using the latest version of arrow-rs. Basically, we need something like this:

# Assumes `from deltalake import write_deltalake`; aadToken holds a valid bearer token.
write_deltalake(
    "abfss://Delta_Table@onelake.dfs.fabric.microsoft.com/Delta_Table.Lakehouse/Tables/fruit",
    df, storage_options={"bearer_token": aadToken, "use_fabric_endpoint": "true"})
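For anyone wanting to try the delta-rs route in the meantime, here is a rough sketch that wraps the call above. The helper names (`onelake_table_uri`, `fabric_storage_options`, `write_table`) are hypothetical, and the URI layout is inferred only from the single example path in this comment, so treat it as an assumption rather than a documented format. Obtaining the token is left out.

```python
def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """Build an abfss:// table URI (layout assumed from the example above)."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )


def fabric_storage_options(token: str) -> dict:
    """Storage options as used in the example: bearer token plus Fabric endpoint."""
    return {"bearer_token": token, "use_fabric_endpoint": "true"}


def write_table(df, workspace: str, lakehouse: str, table: str, token: str) -> None:
    """Write a DataFrame as a Delta table; needs the deltalake package and network access."""
    from deltalake import write_deltalake  # imported lazily so the helpers work without it

    write_deltalake(
        onelake_table_uri(workspace, lakehouse, table),
        df,
        storage_options=fabric_storage_options(token),
    )
```

Calling `write_table(df, "Delta_Table", "Delta_Table", "fruit", aadToken)` should be equivalent to the snippet above.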