apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

A libcurl function was given a bad argument #35365

Open vdytyniak-exos opened 1 year ago

vdytyniak-exos commented 1 year ago

Describe the bug, including details regarding any error messages, version, and platform.

We use pyarrow to read data from S3 and sometimes we get the following error:

  File "/usr/local/lib/python3.10/dist-packages/{org}/store/storage.py", line 794, in _load_partition
    table = ds.dataset(
  File "pyarrow/_dataset.pyx", line 369, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2818, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: AWS Error NETWORK_CONNECTION during GetObject operation: curlCode: 43, A libcurl function was given a bad argument

We have tried to find out why this happens, but the error occurs at random. Can you help us understand where the libcurl problem might actually come from?
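
Until the cause is found, one possible stopgap is to retry the read when this specific error appears. The sketch below is a workaround under stated assumptions, not a fix: `is_curl_43` and `retry_on_curl_43` are hypothetical helper names, and the code assumes the failure surfaces as an `OSError` whose message contains `curlCode: 43`, as in the traceback above.

```python
import time


def is_curl_43(exc):
    """Return True if exc looks like the transient error from the traceback."""
    return isinstance(exc, OSError) and "curlCode: 43" in str(exc)


def retry_on_curl_43(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on curlCode 43 errors.

    Any other exception, or a curlCode 43 error on the final attempt,
    is re-raised unchanged. All names and defaults here are illustrative.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except OSError as exc:
            if not is_curl_43(exc) or attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

A call site might look like `table = retry_on_curl_43(lambda: dataset.to_table())`, where `dataset` stands for whatever `ds.dataset(...)` returned. Retrying hides the symptom, so it may be worth logging each retried exception to keep the evidence.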

Component(s)

Python

westonpace commented 1 year ago

We don't use curl directly, only indirectly through aws-cpp-sdk. This is going to be hard to debug without some way to reliably reproduce.

What version of pyarrow are you using? What OS?

vdytyniak-exos commented 1 year ago

pyarrow 10.0.1; OS: Ubuntu 20.04

westonpace commented 1 year ago

I did some basic research on the error and didn't find much. The only thing I could see that might cause this is if there is an incompatibility between the S3 SDK and the curl versions (e.g. if the S3 SDK was developed / compiled against one version and linked / run with another version).

How are you obtaining pyarrow? Is it from conda, pip, or a build from source? Can you use ldd to check which library versions it is linking against? For example, I use conda so I run this:

(arrow-release-11) pace@pace-desktop:~$ ldd ~/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/libarrow_python.so.1000.1.0 
...
    libcurl.so.4 => /home/pace/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/../../../././libcurl.so.4 (0x00007f3ffa8ac000)
...
    libaws-c-s3.so.0unstable => /home/pace/miniconda3/envs/arrow-release-10/lib/python3.11/site-packages/pyarrow/../../.././././libaws-c-s3.so.0unstable (0x00007f3ffa64c000)
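
The same check can be scripted so that it is easy to repeat across environments. This is a rough sketch, assuming a Linux system where `ldd` is on the PATH; `parse_ldd`, `find_libs`, and `run_ldd` are hypothetical helper names, not part of pyarrow.

```python
import subprocess


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path.

    Lines without "=>" (for example the dynamic loader) are skipped.
    Entries that did not resolve to a path ("not found") map to None.
    """
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue
        name, _, rest = line.partition("=>")
        parts = rest.split()
        path = parts[0] if parts and parts[0].startswith("/") else None
        resolved[name.strip()] = path
    return resolved


def find_libs(resolved, keywords=("curl", "aws")):
    """Keep only the libraries whose name contains one of the keywords."""
    return {n: p for n, p in resolved.items() if any(k in n for k in keywords)}


def run_ldd(path):
    """Run ldd on a shared library and return its raw text output."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `find_libs(parse_ldd(run_ldd(path)))` on a pyarrow shared library then lists only the curl and AWS entries. An empty result means that no separate curl or AWS library shows up in the `ldd` output for that file.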

vdytyniak-exos commented 1 year ago

I install from pip. I don't see libaws-c-s3.so:

root@fba404d79f64:/dir# ldd /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow_python.so.1000.1.0
    linux-vdso.so.1 (0x00007ffe52b60000)
    libarrow_dataset.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow_dataset.so.1000 (0x00007f136944b000)
    libparquet.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libparquet.so.1000 (0x00007f1368d10000)
    libarrow.so.1000 => /usr/local/lib/python3.10/dist-packages/pyarrow/libarrow.so.1000 (0x00007f136663d000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1366456000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1366307000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f13662ec000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f13660f8000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f13660ee000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f13660cb000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f13660c5000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f13697bd000)

westonpace commented 1 year ago

Ah, I think that if you installed from pip, everything is statically linked, which I suppose rules out a version incompatibility.

In that case I'm afraid I'm at a bit of a loss as to how to proceed. If the error could be reproduced regularly, we could try building with a debug version of curl and breaking at the point where the error is generated, to figure out exactly which argument is invalid.

shomilj commented 11 months ago

We're facing the same issue: we see "AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 43" show up transiently, and it's pretty hard to reproduce. The only thing that seems to trigger it more frequently is a higher-latency network connection to S3, so our suspicion is that something at a lower layer is not handling the higher latency properly. (cc @westonpace, in case you have any pointers or additional debugging tips.)
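
Since the error has now been seen during both GetObject and HeadObject, it may help to tally failures by operation and curl code when collecting evidence. The following is a minimal sketch that assumes the message format quoted in this thread; `parse_aws_curl_error` and `tally_errors` are hypothetical names, and other Arrow or AWS SDK versions may word the message differently.

```python
import re
from collections import Counter

# Pattern based only on the two messages quoted in this thread.
_AWS_ERROR = re.compile(
    r"AWS Error (?P<kind>\w+) during (?P<operation>\w+) operation: "
    r"curlCode: (?P<curl_code>\d+)"
)


def parse_aws_curl_error(message):
    """Return (error kind, S3 operation, curl code), or None if no match."""
    match = _AWS_ERROR.search(message)
    if match is None:
        return None
    return match["kind"], match["operation"], int(match["curl_code"])


def tally_errors(messages):
    """Count parsed errors, ignoring messages that do not match."""
    parsed = (parse_aws_curl_error(m) for m in messages)
    return Counter(p for p in parsed if p is not None)
```

Feeding the text of each caught `OSError` into `tally_errors` gives a count per operation, which could then be compared against measured request latency.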

@vdytyniak-exos did you ever root cause this?