Open vdytyniak-exos opened 1 year ago
We don't use curl directly, only indirectly through aws-cpp-sdk. This is going to be hard to debug without some way to reliably reproduce.
What version of pyarrow are you using? What OS?
pyarrow=10.0.1 os: ubuntu:20.04
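For anyone else reporting the same error, the version and OS details asked for above can be collected in one go. This is only a small sketch using the Python standard library; the function name `environment_report` is made up for illustration and is not part of pyarrow.

```python
import platform
from importlib import metadata


def environment_report(package="pyarrow"):
    """Return the installed version of `package` plus basic platform details."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this interpreter's environment.
        version = "not installed"
    return {
        "package": package,
        "version": version,
        "python": platform.python_version(),
        "os": platform.platform(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that produces the error, since a different environment may have a different pyarrow installed.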
I did some basic research on the error and didn't find much. The only thing I could see that might cause this is if there is an incompatibility between the S3 SDK and the curl versions (e.g. if the S3 SDK was developed / compiled against one version and linked / run with another version).
How are you obtaining pyarrow? Is it from conda, pip, or a build from source? Can you use ldd to check which library versions it is linking against? For example, I use conda so I run this:
(arrow-release-11) pace@pace-desktop:~$ ldd ~/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/libarrow_python.so.1000.1.0
...
libcurl.so.4 => /home/pace/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/../../../././libcurl.so.4 (0x00007f3ffa8ac000)
...
libaws-c-s3.so.0unstable => /home/pace/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/../../.././././libaws-c-s3.so.0unstable (0x00007f3ffa64c000)
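If the ldd output is long, a short script can pick out just the curl and AWS entries. The following is a rough sketch that parses text in the `name => path (address)` shape shown above; the helper name `resolved_libraries` and the default search patterns are assumptions chosen for illustration.

```python
import re

# Matches lines such as "libcurl.so.4 => /some/path/libcurl.so.4 (0x...)"
# as well as "libfoo.so => not found".
LDD_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+=>\s+(?P<path>not found|\S+)"
    r"(?:\s+\(0x[0-9a-fA-F]+\))?\s*$"
)


def resolved_libraries(ldd_output, patterns=("curl", "aws")):
    """Map library name to resolved path for names containing any pattern."""
    found = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match and any(p in match.group("name") for p in patterns):
            found[match.group("name")] = match.group("path")
    return found
```

To feed it real data, capture the output with something like `subprocess.run(["ldd", path_to_library], capture_output=True, text=True).stdout`. An empty result means no dynamically linked curl or AWS libraries were listed for that file.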
I install from pip. I don't see libaws-c-s3.so:
root@fba404d79f64:/dir# ldd /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow_python.so.1000.1.0
linux-vdso.so.1 (0x00007ffe52b60000)
libarrow_dataset.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow_dataset.so.1000 (0x00007f136944b000)
libparquet.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libparquet.so.1000 (0x00007f1368d10000)
libarrow.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow.so.1000 (0x00007f136663d000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1366456000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1366307000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f13662ec000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f13660f8000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f13660ee000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f13660cb000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f13660c5000)
/lib64/ld-linux-x86-64.so.2 (0x00007f13697bd000)
Ah, I think that if you installed from pip, everything is statically linked, which I suppose rules out a version incompatibility.
In that case I'm afraid I'm at a bit of a loss on where to proceed next. If it could be reproduced regularly we might try and build with a debug version of curl and break at the point where that error is being generated to figure out what exactly is invalid.
We're facing the same issue: we see the error "AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 43" show up transiently, and it's pretty hard to reproduce. The only thing that seems to trigger it more frequently is a higher-latency network connection to S3, so our suspicion is that something at a lower layer is not handling higher latency properly (cc @westonpace in case you have any pointers or additional debugging tips).
@vdytyniak-exos did you ever root cause this?
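Since the error is transient, one possible stopgap while the root cause is unknown is to retry the failing call. The sketch below is a generic retry with exponential backoff written against the standard library only. It assumes the failure surfaces in Python as an `OSError` whose message contains `curlCode: 43`; the names `call_with_retry` and `looks_like_curl_43` are hypothetical helpers, not pyarrow APIs. This only masks the problem and does not fix it.

```python
import random
import time


def looks_like_curl_43(exc):
    """Assumption: the transient failure mentions 'curlCode: 43' in its message."""
    return "curlCode: 43" in str(exc)


def call_with_retry(operation, is_transient=looks_like_curl_43,
                    attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying transient OSErrors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as exc:
            # Give up on the last attempt or on errors we do not recognise.
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            # Backoff doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            sleep(delay)
```

Usage would look like `call_with_retry(lambda: read_my_table())`, where `read_my_table` stands in for whatever function performs the S3 read. Logging each retried exception is worth adding, since a record of when the error occurs may help with reproducing it.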
Describe the bug, including details regarding any error messages, version, and platform.
We use pyarrow to read data from S3 and sometimes we get the following error:
We tried to find out why it happens, but it occurs very randomly. Can you help us understand where the problem with libcurl might actually be?
Component(s)
Python