I am trying to optimize a query and noticed that the HTTP stats in EXPLAIN ANALYZE output seem to be off. I query a single 10.79 GiB Parquet file, but the HTTP stats report reading 32.7 GiB. I suspect that http_state_policy.cpp may be over-counting total_bytes_received, in particular by adding the Content-Length header values of HEAD requests.
SET azure_transport_option_type = curl;
SET azure_http_stats = True;
SET threads = 1;
SET azure_read_transfer_concurrency = 1;
SET azure_read_transfer_chunk_size = 1024 * 1024;
SET azure_read_buffer_size = 1024 * 1024;
EXPLAIN ANALYZE SELECT col1 FROM 'az://<snip>.blob.core.windows.net/<snip>.parquet' LIMIT 1;
In the Azure SDK logs I see 3 HEAD requests with content-length : 11583653237 and 349 GET requests with content-length : 1048576. So the data actually read should be around 0.34 GiB (349 × 1 MiB), not 32.7 GiB.
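As a quick sanity check, the byte counts from those log lines can be added up. The sketch below only does arithmetic on the numbers quoted above. It shows that counting the three HEAD Content-Length values on top of the GET responses reproduces the reported 32.7 GiB almost exactly.

```python
# Byte counts copied from the Azure SDK log lines quoted above.
HEAD_REQUESTS = 3
HEAD_CONTENT_LENGTH = 11_583_653_237  # Content-Length on each HEAD (size of the whole file)
GET_REQUESTS = 349
GET_CONTENT_LENGTH = 1_048_576        # Content-Length on each 1 MiB ranged GET

GIB = 1024 ** 3

file_gib = HEAD_CONTENT_LENGTH / GIB
head_bytes = HEAD_REQUESTS * HEAD_CONTENT_LENGTH
get_bytes = GET_REQUESTS * GET_CONTENT_LENGTH

print(f"file size:        {file_gib:.2f} GiB")                    # size of the Parquet file
print(f"GET bodies only:  {get_bytes / GIB:.2f} GiB")             # data actually transferred
print(f"HEAD + GET:       {(head_bytes + get_bytes) / GIB:.1f} GiB")  # matches the reported figure
```

Three HEADs alone account for about 32.36 GiB (3 × 10.79 GiB), and the 0.34 GiB of GET bodies makes up the rest.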
If this analysis is correct, I can send a small PR to fix.
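The rule such a fix would apply can be modeled in a few lines. This is a language-neutral sketch, not the code in http_state_policy.cpp. The `Response` record and `bytes_received` function are hypothetical names used only for illustration. The rule rests on the fact that a HEAD response advertises the Content-Length a GET would return but carries no body.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical record of one HTTP exchange as seen by a stats policy."""
    method: str
    content_length: int


def bytes_received(responses: list[Response]) -> int:
    """Sum received bytes, skipping HEAD responses.

    A HEAD response reports the Content-Length of the full resource but has
    no body, so adding it would count the whole file once per HEAD request.
    """
    return sum(r.content_length for r in responses if r.method.upper() != "HEAD")


# The request mix from the Azure SDK logs above.
responses = [Response("HEAD", 11_583_653_237)] * 3 + [Response("GET", 1_048_576)] * 349
print(bytes_received(responses))  # 365953024 bytes, i.e. about 0.34 GiB
```

An alternative design is to count the body bytes actually read from the response stream instead of trusting Content-Length. That would also handle responses that end early, but it depends on where the policy can observe the body.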