Closed jiaw314 closed 1 month ago
Delta-rs version: 0.17.4
Binding: Python 3.11.6
Environment: MacBook Pro M1
What happened: The load_cdf() method works for nearly all of our delta tables on AWS S3 but it seems to be running into an error on a few:
thread '<unnamed>' panicked at python/src/lib.rs:611:18:
called Result::unwrap() on an Err value: ArrowError(ExternalError(General("ParquetObjectReader::get_byte_ranges error: Generic S3 error: request or response body error: operation timed out")), None)
stack backtrace:
0: 0x3028a50e4 - _BrotliDecoderVersion
1: 0x3028c8e50 - _BrotliDecoderVersion
2: 0x3028a1ee0 - _BrotliDecoderVersion
3: 0x3028a4f18 - _BrotliDecoderVersion
4: 0x3028a66bc - _BrotliDecoderVersion
5: 0x3028a6404 - _BrotliDecoderVersion
6: 0x3028a6af8 - _BrotliDecoderVersion
7: 0x3028a69ec - _BrotliDecoderVersion
8: 0x3028a5568 - _BrotliDecoderVersion
9: 0x3028a6774 - _BrotliDecoderVersion
10: 0x30299fb60 - _BrotliDecoderVersion
11: 0x30299ff14 - _BrotliDecoderVersion
12: 0x3001f9998 - _PyInitinternal
13: 0x30012bc1c - <unknown>
14: 0x3001341f4 - <unknown>
15: 0x300113ce0 - <unknown>
16: 0x30012e7d4 - <unknown>
17: 0x101237f1c - _method_vectorcall_VARARGS_KEYWORDS
18: 0x101303d5c - PyEval_EvalFrameDefault
19: 0x1012f9444 - _PyEval_EvalCode
20: 0x10134ea18 - _run_eval_code_obj
21: 0x10134e97c - _run_mod
22: 0x10134e7bc - _pyrun_file
23: 0x10134e20c - __PyRun_SimpleFileObject
24: 0x10134db9c - __PyRun_AnyFileObject
25: 0x101369f70 - _pymain_run_file_obj
26: 0x1013698b0 - _pymain_run_file
27: 0x101369190 - _Py_RunMain
28: 0x10136a2c8 - _Py_BytesMain
Traceback (most recent call last):
  File "/Users/jiawang/Desktop/Environments/deltars_test/backfill&continuous_batch_pandas_catalog_v2.py", line 127, in <module>
    dt.load_cdf(starting_version=delta_max_version).read_all()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jiawang/Desktop/Environments/deltars_test/lib/python3.11/site-packages/deltalake/table.py", line 694, in load_cdf
    return self._table.load_cdf(
           ^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called Result::unwrap() on an Err value: ArrowError(ExternalError(General("ParquetObjectReader::get_byte_ranges error: Generic S3 error: request or response body error: operation timed out")), None)
What you expected to happen: I expect to get the change data feed for the latest version of the delta table when I call load_cdf().
How to reproduce it: Call load_cdf() on a very large Delta table?
More details:
You can increase the timeout; see https://github.com/delta-io/delta-rs/issues/2537#issuecomment-2129285237
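For anyone landing here later, a rough sketch of what raising the timeout might look like from Python. It assumes the `timeout` and `connect_timeout` keys in `storage_options` are passed through to the underlying object store client and accept duration strings such as `"300s"`. The helper names (`build_storage_options`, `load_latest_cdf`) and the default values are made up for illustration and are not part of deltalake.

```python
def build_storage_options(timeout_s=300, connect_timeout_s=30, extra=None):
    """Build a storage_options dict with longer HTTP client timeouts.

    Assumption: "timeout" and "connect_timeout" are forwarded to the
    object store client and parsed as duration strings like "300s".
    """
    options = {
        "timeout": f"{timeout_s}s",
        "connect_timeout": f"{connect_timeout_s}s",
    }
    if extra:
        # Caller-supplied settings (region, credentials, ...) take precedence.
        options.update(extra)
    return options


def load_latest_cdf(table_uri, starting_version, storage_options):
    """Hypothetical helper: open the table and read its change data feed."""
    # Imported lazily so the options helper above works without deltalake.
    from deltalake import DeltaTable

    dt = DeltaTable(table_uri, storage_options=storage_options)
    return dt.load_cdf(starting_version=starting_version).read_all()


if __name__ == "__main__":
    # Print the options that would be passed to DeltaTable.
    print(build_storage_options(timeout_s=300))
```

Then call something like `load_latest_cdf("s3://<bucket>/<table>", delta_max_version, build_storage_options())`, with the placeholder URI replaced by your own table. If the larger tables still time out, increasing `timeout_s` further is the first thing to try.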