apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Python] AWS SDK 1.11 support in pyarrow wheels? #42154

Open johnkerl opened 2 weeks ago

johnkerl commented 2 weeks ago

Describe the bug, including details regarding any error messages, version, and platform.

With regard to https://github.com/apache/arrow/issues/40262, is there a plan to update pyarrow's AWS SDK dependency from 1.10 to 1.11? We believe from https://github.com/apache/arrow/blob/fe4d04f081e55ca2de7b1b67b10ad7dca96cfd9e/cpp/thirdparty/versions.txt#L54 that pyarrow is currently using 1.10:

ARROW_AWSSDK_BUILD_VERSION=1.10.55

It appears that a mitigation for https://github.com/apache/arrow/issues/40262 is in AWS SDK 1.11: https://github.com/aws/aws-sdk-cpp/pull/2710

(There's significant backstory on https://github.com/single-cell-data/TileDB-SOMA/pull/2692 and on https://github.com/TileDB-Inc/tiledbsoma-feedstock/pull/171, if backstory is desired. A repro is here: https://github.com/single-cell-data/TileDB-SOMA/pull/2692#issuecomment-2153450984.)

cc @pitrou

Component(s)

Python

johnkerl commented 2 weeks ago

Also cc @ihnorton @ivirshup @h-vetinari

h-vetinari commented 2 weeks ago

Conda-forge has been building against aws 1.11 for a long time already, and this also got synched back to the conda tests in arrow itself (which have bitrotted in the meantime, but there are efforts to revive them): https://github.com/apache/arrow/blob/801de2fbcf5bcbce0c019ed4b35ff3fc863b141b/dev/tasks/conda-recipes/.ci_support/linux_64_cuda_compiler_version11.2.yaml#L3-L4

In any case, we run the full test suite on the python side (not the C++ side yet, c.f. https://github.com/apache/arrow/issues/35587), in every feedstock build, and it passes on osx. So I don't see the immediate incompatibility, which I assume is restricted to some corner cases.

You should provide a stacktrace (or ideally, a reproducer) of what fails.

PS. In the past, another issue kept arrow stuck on aws 1.8 for a long time (which might help for context): https://github.com/aws/aws-sdk-cpp/issues/1809

h-vetinari commented 2 weeks ago

Looks like https://github.com/TileDB-Inc/TileDB-Py/issues/1990 is relevant, but again, you should really provide an example where arrow crashes or does something wrong, not another downstream project. The fact that the import order seems to matter is already grounds for suspecting that there's something else going on here.

ihnorton commented 2 weeks ago

The specific question here is if/when arrow wheels will update to AWS SDK 1.11. The reason for asking is to understand whether the mitigation for the issue described below will be available "soon", or whether we need to work around it (rename symbols, further patch the AWS SDK, etc.).

For more background on the issue:

Summarizing: AWS has released a mitigation for the abort, implemented here: https://github.com/aws/aws-sdk-cpp/pull/2710. The mitigation is available in AWS SDK 1.11. TileDB wheels have updated to AWS SDK 1.11, but AFAICT all packages need to be updated for the mitigation to work.

This issue will likely impact any other library that bundles the AWS SDK in a wheel and is loaded at the same time as pyarrow.
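
(For concreteness, here is a minimal, hypothetical sketch of the failure mode described above: two wheels, each carrying its own copy of the AWS SDK for C++, loaded into one process and then both used to make S3 requests. The bucket/key names, region, and import order are placeholders; this is not the exact reproducer linked in the original issue.)

```python
# Hypothetical sketch of the reported failure mode (placeholders throughout):
# load TileDB-Py and pyarrow (each shipping its own AWS SDK for C++) in one
# process, then issue S3 requests through both. On affected platform/wheel
# combinations this is reported to abort inside the AWS SDK.
import tiledb        # TileDB-Py wheel bundles its own AWS SDK
import pyarrow.fs    # pyarrow wheel links its own AWS SDK (via vcpkg)

# S3 request through pyarrow's copy of the SDK
s3 = pyarrow.fs.S3FileSystem(region="us-east-1")      # placeholder region
print(s3.get_file_info("some-bucket/some-key"))       # placeholder path

# S3 request through TileDB's copy of the SDK
vfs = tiledb.VFS()
print(vfs.ls("s3://some-bucket/"))                    # placeholder URI
```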

kou commented 2 weeks ago

PyArrow wheels don't use the bundled AWS SDK for C++; they use vcpkg's:

https://github.com/ursacomputing/crossbow/actions/runs/9544500563/job/26303310398#step:7:559

-- Found AWS SDK for C++, Version: 1.11.201, Install Root:/opt/vcpkg/installed/amd64-linux-static-release, Platform Prefix:, Platform Dependent Libraries: pthread;crypto;ssl;z;curl
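
(As a rough way to double-check which AWS SDK version a given pyarrow wheel actually links, one option is to search the bundled Arrow C++ library for the SDK's embedded version string. The sketch below is an assumption-laden aside, not part of pyarrow's API: it assumes the usual Linux wheel layout and that the SDK's "aws-sdk-cpp/<version>" user-agent string is present in the binary.)

```python
# Rough, hypothetical check: scan the Arrow C++ shared library shipped inside
# the pyarrow wheel for the AWS SDK's embedded "aws-sdk-cpp/<version>" string.
# Library names differ per platform (*.so on Linux, *.dylib on macOS,
# arrow*.dll on Windows); adjust the glob accordingly.
import pathlib
import re

import pyarrow

pkg_dir = pathlib.Path(pyarrow.__file__).parent
for lib in pkg_dir.glob("libarrow*.so*"):
    hits = set(re.findall(rb"aws-sdk-cpp/[0-9][0-9.]*", lib.read_bytes()))
    for hit in sorted(hits):
        print(lib.name, hit.decode())
```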

jorisvandenbossche commented 1 week ago

The VCPKG version is currently pinned at 2023.11.20:

https://github.com/apache/arrow/blob/eec6f17c8879b469dc3370dad4a7f68f11705a6b/.env#L92

(last updated in https://github.com/apache/arrow/pull/39622)

This could certainly use another update to a more recent vcpkg state (EDIT: this is currently being done in https://github.com/apache/arrow/pull/42171), but even so, that release (as @kou also showed from the logs) already included AWS SDK 1.11 (https://github.com/microsoft/vcpkg/releases/tag/2023.11.20, which updated it from 1.11.169#2 to 1.11.201).

jorisvandenbossche commented 1 week ago

FWIW, this also means that the latest pyarrow wheels for 16.0.0 should actually already include AWS SDK 1.11

When pyarrow and tiledb-py are installed from PyPI, and imported in the same process, making S3 requests (which happens via the AWS SDK) causes an abort on some platforms.

@ihnorton the crashes you see, is that with the latest pyarrow release from PyPI?

ihnorton commented 1 week ago

> @ihnorton the crashes you see, is that with the latest pyarrow release from PyPI?

Yes:

pyarrow                   16.1.0                   pypi_0    pypi
python                    3.12.4          h30c5eda_0_cpython    conda-forge
readline                  8.2                  h92ec313_1    conda-forge
setuptools                70.0.0             pyhd8ed1ab_0    conda-forge
tiledb                    0.30.0                   pypi_0    pypi
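
(As an aside, pyarrow can also report its own build details from Python, which can help pin down exactly which wheel is in play. The snippet below only uses pyarrow's public show_versions() and cpp_build_info helpers; note that these report the Arrow versions and build metadata, not the AWS SDK version the wheel was compiled against.)

```python
# Print pyarrow / Arrow C++ build details for a bug report.
import pyarrow

pyarrow.show_versions()
print(pyarrow.cpp_build_info)
```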

ihnorton commented 1 week ago

> This could certainly use another update to a more recent vcpkg state [...] but even so, that release [...] already included AWS SDK 1.11 (https://github.com/microsoft/vcpkg/releases/tag/2023.11.20, which updated it from 1.11.169#2 to 1.11.201)

@jorisvandenbossche thanks for the explanation. It looks like the commit I referenced didn't actually make it into the SDK until 1.11.179: https://github.com/aws/aws-sdk-cpp/commit/1f49f91a97cdc6556c0441010662a3647a3e1480

> (EDIT: this is currently being done in https://github.com/apache/arrow/pull/42171), but even so, that release (as @kou also showed from the logs) [...]

Thanks for the pointer! We'll sit tight and try this again after wheels are released with that update. Much appreciated.

jorisvandenbossche commented 1 week ago

> It looks like the commit I referenced didn't actually make it into the SDK until 1.11.179: aws/aws-sdk-cpp@1f49f91

That should still mean this is included in the pyarrow 16.0.0 wheels, AFAIK (because it should have used 1.11.201)