This PR fixes a performance bug that caused downloaded files (e.g. with databricks fs cp dbfs:/Volumes/.../somefile .) to be buffered entirely in memory before being written to disk.
Results from profiling the download of a ~100MB file:
Before:
Type: alloc_space
Showing nodes accounting for 374.02MB, 98.50% of 379.74MB total
After:
Type: alloc_space
Showing nodes accounting for 3748.67kB, 100% of 3748.67kB total
Note that this fix is temporary. A longer-term solution would be to use the API provided by the Go SDK rather than making an HTTP request directly from the CLI.
fix #1575
Tests
Verified that the CLI downloads the file correctly while running the profiling above.