Open ion-elgreco opened 3 weeks ago
I think we would need a reproducer to action this; the linked issues don't even clearly implicate object_store.
Please also print the source of the error via Debug print. Usually, it should be caused by connection reset or similar network related errors.
@thomasfrederikhoeck @k-ye can you guys provide additional details please?
@Xuanwo I would love to be of more help but I don't know how to do this in delta-rs (and in turn object_store). It didn't help setting the timeout to 300s.
@ion-elgreco Can you point me in the direction of how I can provide better logs?
Hi, if you can consistently reproduce this issue, please change the following places:
fn object_store_to_py(err: ObjectStoreError) -> PyErr {
    match err {
        ObjectStoreError::NotFound { .. } => PyFileNotFoundError::new_err(err.to_string()),
        ObjectStoreError::Generic { source, .. }
            if source.to_string().contains("AWS_S3_ALLOW_UNSAFE_RENAME") =>
        {
            DeltaProtocolError::new_err(source.to_string())
        }
        _ => PyIOError::new_err(err.to_string()),
    }
}
Don't use err.to_string(); print its debug message instead.
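To illustrate why the Debug output is more useful here, below is a minimal, std-only sketch. StoreError and TimedOut are hypothetical stand-ins (they are not the real object_store or reqwest types), and the Display text is only modelled on the message reported in this issue. The point is that to_string() shows just the top-level message, while the Debug format also prints the wrapped source.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in for a low-level transport failure.
#[derive(Debug)]
struct TimedOut;

impl fmt::Display for TimedOut {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "operation timed out")
    }
}

impl Error for TimedOut {}

// Hypothetical stand-in for a store-level error that wraps its cause.
#[derive(Debug)]
struct StoreError {
    store: &'static str,
    source: Box<dyn Error>,
}

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Display only reports the top-level message, not the cause.
        write!(f, "Generic {} error: error decoding response body", self.store)
    }
}

impl Error for StoreError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

// What err.to_string() would give you.
fn display_message(err: &dyn Error) -> String {
    err.to_string()
}

// What a Debug print would give you, including nested fields.
fn debug_message(err: &dyn Error) -> String {
    format!("{:?}", err)
}

fn main() {
    let err = StoreError {
        store: "MicrosoftAzure",
        source: Box::new(TimedOut),
    };
    println!("Display: {}", display_message(&err));
    println!("Debug:   {}", debug_message(&err));
}
```

In the delta-rs function above, the equivalent change would presumably be to format err with {:?} in the fallback arm, but that is an assumption about how you want to surface it to Python.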
@Xuanwo Ah thanks!! I get the following consistently :
Generic {
    store: "MicrosoftAzure",
    source: reqwest::Error {
        kind: Decode,
        source: reqwest::Error {
            kind: Body,
            source: TimedOut,
        },
    },
}
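The nesting in that output (Decode, then Body, then TimedOut) is an error source chain. As a rough sketch of how to get at the innermost cause programmatically, one can walk std::error::Error::source() until it returns None. The Layer type and the sample_error helper below are hypothetical and only mimic the shape of the output above; they are not the reqwest types.

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical wrapper that mimics one level of a nested error.
#[derive(Debug)]
struct Layer {
    kind: &'static str,
    source: Box<dyn Error>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "error in layer {}", self.kind)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

// Follow source() links until the innermost error is reached.
fn root_cause<'a>(err: &'a (dyn Error + 'static)) -> &'a (dyn Error + 'static) {
    let mut current = err;
    while let Some(next) = current.source() {
        current = next;
    }
    current
}

// Build Decode -> Body -> timed out, similar in shape to the output above.
fn sample_error() -> Layer {
    let timeout = io::Error::new(io::ErrorKind::TimedOut, "timed out");
    let body = Layer { kind: "Body", source: Box::new(timeout) };
    Layer { kind: "Decode", source: Box::new(body) }
}

fn main() {
    let err = sample_error();
    let root = root_cause(&err);
    println!("top level:  {}", err);
    println!("root cause: {}", root);
    // Downcasting lets the caller branch on the concrete cause.
    if let Some(io_err) = root.downcast_ref::<io::Error>() {
        println!("root kind:  {:?}", io_err.kind());
    }
}
```

Read this way, the reported error is a timeout while reading the response body, which then surfaces as a decode error at the top level.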
I also tried bumping the timeout to 600s. I still get _internal.DeltaError: Failed to parse parquet: Parquet error: Z-order failed while scanning data: ArrowError(ExternalError(General("ParquetObjectReader::get_byte_ranges error: Generic MicrosoftAzure error: error decoding response body")), None)
but I never hit the debug print in this case. I am however seeing a lot of
[2024-06-24T21:09:33Z INFO object_store::client::retry] Encountered transport error backing off for 0.1 seconds, retry 1 of 10: error sending request for url (REDACTED)
[2024-06-24T21:13:00Z DEBUG hyper_util::client::legacy::client] client connection error: error shutting down connection
[2024-06-24T21:09:33Z INFO object_store::client::retry] Encountered transport error backing off for 0.1 seconds, retry 1 of 10: error sending request for url (REDACTED)
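For readers unfamiliar with what those "backing off ... retry 1 of 10" lines mean, here is a generic sketch of retrying with exponential backoff. It is not object_store's actual retry code: retry_with_backoff is a hypothetical helper, and the doubling factor is an assumption. Only the 0.1 second first delay and the limit of 10 retries are taken from the log lines above.

```rust
use std::fmt::Debug;
use std::thread;
use std::time::Duration;

// Hypothetical helper: call `op` until it succeeds or `max_retries` retries
// have been used, sleeping between attempts and doubling the delay each time.
fn retry_with_backoff<T, E: Debug>(
    max_retries: usize,
    init: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = init;
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempt < max_retries => {
                attempt += 1;
                eprintln!(
                    "transport error, backing off for {:?}, retry {} of {}: {:?}",
                    delay, attempt, max_retries, e
                );
                thread::sleep(delay);
                delay *= 2; // assumed growth factor, for illustration only
            }
            // Retries exhausted: surface the last error to the caller.
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    let mut calls = 0;
    // Simulated request that fails twice before succeeding.
    let result: Result<&str, &str> = retry_with_backoff(10, Duration::from_millis(100), || {
        calls += 1;
        if calls < 3 { Err("error sending request") } else { Ok("response body") }
    });
    println!("result after {} calls: {:?}", calls, result);
}
```

A retry at this level only helps if the request fails as a whole. A body read that stalls mid-stream, as in the TimedOut chain above, may not be covered by it.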
I suspect there's an issue with the network connection between your environment and Azure.
Could you provide more details about your setup?
azcopy to read/write large file?
@Xuanwo It might be network related, but I have a feeling that it is related to how object_store or delta-rs handles throughput that is lower than within an Azure data center (some connections going stale while waiting for something else).
azcopy bench "https://ACCOUNT.blob.core.windows.net/CONTAINER?SAS" --file-count 20 --size-per-file 10000M
So 20 files of 10 GB each, and here I get a throughput of 145 Mb/s. The benchmark took 1+ hours and ran through with no failures, while the delta-rs call fails within a few minutes.
Describe the bug
We bumped object_store to 0.10 in delta-rs, and we are already seeing a couple of reports of the following error: error decoding response body. It happens on Azure and S3. See https://github.com/delta-io/delta-rs/issues/2595 and https://github.com/delta-io/delta-rs/issues/2592
To Reproduce
Seems to occur when reading tables or doing operations on them.
Expected behavior
No issue decoding the response body.
Additional context
@thomasfrederikhoeck @k-ye