Open dhirschfeld opened 2 months ago
i.e. allow passing through `fh` rather than creating it internally by opening a file from the filesystem:
https://github.com/databricks/databricks-sql-python/blob/d31063ca918167412153a368c13a99055bf89c02/src/databricks/sql/client.py#L656-L668
Hi @dhirschfeld! This indeed sounds like an interesting feature, thank you for sharing it! I have to talk with the rest of the team first. Databricks SQL `GET` and `PUT` commands should have a local file path specified, but I don't know if we ever considered using streams instead of real files. If we agree that there are no risks with this approach, we would have to implement it across all drivers eventually.
Some added context: @dhirschfeld's idea is exactly how the e2e tests for this feature behave (since we ran them in GitHub Actions, where we don't have a real file system to write to). Should be a straightforward modification.
Writing large amounts of data to disk, only for `databricks-sql-connector` to then read it back in from disk, is incredibly inefficient. It would be much more efficient to be able to provide a file-like object to use instead of a filepath. That way a user could write the data to an in-memory `io.BytesIO` object instead of writing the data to disk.
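To make the request concrete, here is a rough sketch of the pattern being asked for. It is not the connector's actual code: `open_binary` and `put_payload` are hypothetical names, and `send` is an assumed callable standing in for whatever performs the upload. The idea is only that a path gets opened internally, while an existing file-like object is passed straight through.

```python
import io
import os
from contextlib import nullcontext


def open_binary(source):
    """Hypothetical helper, not part of databricks-sql-connector.

    Returns a context manager yielding a readable binary handle.
    Paths are opened here and closed on exit; file-like objects are
    passed through unchanged and left open for the caller.
    """
    if isinstance(source, (str, bytes, os.PathLike)):
        return open(source, "rb")
    return nullcontext(source)


def put_payload(source, send):
    """Stand-in for the upload step; `send` is an assumed callable taking a handle."""
    with open_binary(source) as fh:
        return send(fh)


# In-memory data that is never written to disk.
buffer = io.BytesIO(b"id,value\n1,a\n2,b\n")
bytes_sent = put_payload(buffer, send=lambda fh: len(fh.read()))
```

Because the file-like branch uses `nullcontext`, the caller keeps ownership of the buffer and can rewind or reuse it afterwards; only handles opened from a path are closed by the helper.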