apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++] Avoid double initialisation / double finalize of Aws SDK #40262

Open TechnophobicLampshade opened 5 months ago

TechnophobicLampshade commented 5 months ago

Describe the usage question you have. Please include as many useful details as possible.

I have pre-existing code that uses the C++ AWS SDK and calls Aws::InitAPI and Aws::ShutdownAPI itself. Now I am using Arrow with S3, and it seems to require a call to Arrow's arrow::fs::InitializeS3() before it will let me use S3FileSystem. I would prefer to handle SDK initialisation myself, but I don't see a way to tell Arrow "I've already initialised the SDK, don't do it yourself". I would also prefer not to add Arrow awareness to all of my existing code.

Is there a known workaround for this please?
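To make the double initialise / double finalize hazard concrete, here is a small self-contained model. `FakeSdk` and `RunScenario` are hypothetical names, not AWS SDK or Arrow APIs, and the model assumes (as with the older SDK version the reporter was using) that Init/Shutdown are not idempotent. The application owns the SDK lifetime, while the library in this model unconditionally runs its own init and shutdown as well.

```cpp
// Hypothetical model, not the AWS SDK or Arrow API: FakeSdk stands in for
// process-wide SDK state whose Init/Shutdown are assumed NOT to be idempotent.
struct FakeSdk {
  int init_calls = 0;      // times Init() actually ran
  int shutdown_calls = 0;  // times Shutdown() actually ran
  bool alive = false;      // whether the SDK state is currently usable

  void Init() { ++init_calls; alive = true; }
  void Shutdown() { ++shutdown_calls; alive = false; }
};

// The application calls Init/Shutdown itself (like Aws::InitAPI/ShutdownAPI),
// and the library does the same in its own initialise/finalise step.
// Returns whether the SDK was still usable right after the library finalised.
inline bool RunScenario(FakeSdk& sdk) {
  sdk.Init();                          // application initialises the SDK
  sdk.Init();                          // library initialises it a second time
  sdk.Shutdown();                      // library finalises...
  const bool still_alive = sdk.alive;  // ...while the application still needs it
  sdk.Shutdown();                      // application shuts down a second time
  return still_alive;
}
```

In this model the SDK is initialised twice, torn down while the application still expects it to be usable, and then shut down a second time.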

Component(s)

C++

pitrou commented 5 months ago

> Is there a known workaround for this please?

Currently, no, but that's a reasonable feature request. Perhaps it could be addressed by adding APIs such as:

void MarkS3Initialized();
void MarkS3Finalized();

What do you think?
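One way such marker calls could be honoured is to track who owns the SDK lifetime. The sketch below is hypothetical and is not Arrow's implementation: `S3LifecycleSketch` and its methods are illustrative names, and the `sdk_init`/`sdk_shutdown` callbacks stand in for Aws::InitAPI / Aws::ShutdownAPI so the logic can be exercised without the SDK.

```cpp
#include <functional>
#include <mutex>
#include <stdexcept>
#include <utility>

// Hypothetical sketch only (not Arrow's code): how a library could honour
// MarkS3Initialized()/MarkS3Finalized()-style calls. sdk_init/sdk_shutdown
// stand in for Aws::InitAPI / Aws::ShutdownAPI.
class S3LifecycleSketch {
 public:
  S3LifecycleSketch(std::function<void()> sdk_init,
                    std::function<void()> sdk_shutdown)
      : sdk_init_(std::move(sdk_init)), sdk_shutdown_(std::move(sdk_shutdown)) {}

  // Like the proposed MarkS3Initialized(): the application already ran the
  // SDK's init, so the library must neither init nor shut it down.
  void MarkInitialized() {
    std::lock_guard<std::mutex> lock(mu_);
    if (state_ == State::kUninitialized) state_ = State::kExternal;
  }

  // Like the proposed MarkS3Finalized(): the application owns shutdown; the
  // library just stops treating the SDK as usable.
  void MarkFinalized() {
    std::lock_guard<std::mutex> lock(mu_);
    state_ = State::kFinalized;
  }

  // Analogue of InitializeS3(): runs the SDK init only if nobody else has.
  void Initialize() {
    std::lock_guard<std::mutex> lock(mu_);
    if (state_ == State::kFinalized) throw std::logic_error("S3 already finalized");
    if (state_ == State::kUninitialized) {
      sdk_init_();
      state_ = State::kOwned;
    }
  }

  // Analogue of FinalizeS3(): shuts the SDK down only if this library started it.
  void Finalize() {
    std::lock_guard<std::mutex> lock(mu_);
    if (state_ == State::kOwned) sdk_shutdown_();
    state_ = State::kFinalized;
  }

 private:
  enum class State { kUninitialized, kOwned, kExternal, kFinalized };
  std::mutex mu_;
  State state_ = State::kUninitialized;
  std::function<void()> sdk_init_;
  std::function<void()> sdk_shutdown_;
};
```

The design choice here is that shutdown follows ownership: whichever party actually ran the SDK init is the one that shuts it down, so marking the SDK as externally initialised turns the library's init and finalize into no-ops for the SDK itself.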

TechnophobicLampshade commented 5 months ago

Thanks @pitrou. That API makes sense to me.

TechnophobicLampshade commented 5 months ago

It seems that more recent versions of the AWS SDK than the one I was using contain code that ignores duplicate Init/Shutdown calls: https://github.com/aws/aws-sdk-cpp/blob/5929e202e9b1cd84d6234aa01b64f56fd2208350/src/aws-cpp-sdk-core/source/Aws.cpp#L188

Perhaps this avoids the need for this altogether.
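The linked SDK change appears to make duplicate calls harmless by counting them. Below is a simplified model of that reference-counting idea; `CountedInit` is a hypothetical name and this is not the SDK's actual code. It shows why counting would cover the double init / double finalize case, and also the model's main limitation: any options passed to an ignored `Init` call are silently dropped, so whichever caller initialises first decides the configuration.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>

// Simplified, hypothetical stand-in for a reference-counted init guard:
// only the first Init does real work and only the last matching Shutdown
// tears down. Illustrates the idea; not the AWS SDK's implementation.
class CountedInit {
 public:
  CountedInit(std::function<void()> do_init, std::function<void()> do_shutdown)
      : do_init_(std::move(do_init)), do_shutdown_(std::move(do_shutdown)) {}

  void Init() {
    std::lock_guard<std::mutex> lock(mu_);
    if (++count_ == 1) do_init_();  // later calls only bump the counter
  }

  void Shutdown() {
    std::lock_guard<std::mutex> lock(mu_);
    if (count_ == 0) return;            // shutdown without init: ignored
    if (--count_ == 0) do_shutdown_();  // last matching call tears down
  }

  std::size_t count() const {
    std::lock_guard<std::mutex> lock(mu_);
    return count_;
  }

 private:
  mutable std::mutex mu_;
  std::size_t count_ = 0;
  std::function<void()> do_init_;
  std::function<void()> do_shutdown_;
};
```

With this guard, the application-then-library sequence from the original question results in exactly one real init and one real shutdown, and the real shutdown happens only at the application's final call.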

pitrou commented 5 months ago

> Perhaps this avoids the need for this altogether.

Can you take a look and see whether it's good enough for you?

johnkerl commented 1 month ago

We have tested the 16.0 and 17.0-rc wheels, built with the referenced AWS SDK patch (see #42154), and we believe this issue still exists.