databricks / cli

Databricks CLI

[Fix] Do not buffer files in memory when downloading #1599

Closed renaudhartert-db closed 2 months ago

renaudhartert-db commented 2 months ago

Changes

This PR fixes a performance bug that caused downloaded files (e.g. with databricks fs cp dbfs:/Volumes/.../somefile .) to be buffered entirely in memory before being written to disk.

Results from profiling the download of a ~100MB file:

Before:

Type: alloc_space
Showing nodes accounting for 374.02MB, 98.50% of 379.74MB total

After:

Type: alloc_space
Showing nodes accounting for 3748.67kB, 100% of 3748.67kB total

Note that this fix is temporary. A longer-term solution would be to use the API provided by the Go SDK rather than making an HTTP request directly from the CLI.

Fixes #1575

Tests

Verified that the CLI downloads the file correctly while collecting the profiles above.