Open djouallah opened 2 months ago
@auxten how do you get a schema when using this?

```python
df = sess.sql(sql, "ArrowStream")
write_deltalake(f"/lakehouse/default/Tables/T{total_files}/chdb", df,
                mode="append", partition_by=["year"],
                storage_options=storage_options)
```
I understand that what you’re trying to do is retrieve the output schema and then stream the data into Delta Lake.
I added chDB to my ETL benchmarks; feel free to have a look and tell me if I am doing something terribly wrong: https://github.com/djouallah/Fabric_Notebooks_Demo/blob/main/ETL/Light_ETL_Python_Notebook.ipynb
First, congratulations on the progress you have made: chDB is substantially better than it was just 6 months ago. I am trying to read a folder of CSV files and export it to Delta Lake. Currently I am using `df = sess.sql(sql, "ArrowTable")` to transfer the data to the deltalake Python library, but I am getting OOM errors. It would be nice if you could add support for Arrow RecordBatch so the transfer is done in smaller batches.
thanks