databricks / databricks-sql-go

Golang database/sql driver for Databricks SQL.
Apache License 2.0

staging put: stream file uploads instead of loading all to memory #197

Closed mdibaiee closed 7 months ago

mdibaiee commented 8 months ago

os.ReadFile reads the entire file into a byte slice in memory, which can put memory pressure on users. An os.File is itself a byte reader, so we can pass the file directly to http.NewRequest, letting the client read the file in chunks and upload it as a stream instead of holding the whole file in memory.

mdibaiee commented 7 months ago

@yunbodeng-db @andrefurlan-db @rcypher-databricks @jadewang-db any chance of a review on this? This is currently blocking us from using Databricks efficiently.

mdibaiee commented 7 months ago

@andrefurlan-db thanks! I can't merge the pull request myself, and I think the lint and check jobs are failing for unrelated reasons (the lint errors are in other files and seem to stem from unrecognised imports). Can the pull request be merged?

yunbodeng-db commented 7 months ago

> @andrefurlan-db thanks! I can't merge the pull-request myself, and I think the lint and check jobs are failing for unrelated reasons (the lint errors are for other files and seem to complain because the imports are not recognised). can the pull-request be merged?

A team member will assist you shortly. Thanks for your patience.

candiduslynx commented 7 months ago

@kravets-levko @yunbodeng-db @mdibaiee I get a 501 Not Implemented response with this change; I really think this should be reverted and properly tested against the backend.

My assumption is that the file streaming itself is fine, but the backend doesn't accept request bodies of unknown length, hence the failure.