apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0
14.57k stars 3.54k forks source link

[C++] Bundled external project for openssl #26149

Open asfimport opened 4 years ago

asfimport commented 4 years ago

aws-sdk-cpp requires it and it seems easy enough to build. Arguably you should manage openssl as a system package but that's not always feasible for all of our packages, and the manylinux wheel scripts already build and bundle openssl.

Reporter: Neal Richardson / @nealrichardson

Note: This issue was originally created as ARROW-10138. Please see the migration documentation for further details.

asfimport commented 4 years ago

Antoine Pitrou / @pitrou: That sounds frankly like a very bad idea for two reasons: 1) OpenSSL is security-critical and we are not equipped to ship upgrades properly currently 2) OpenSSL can be used by the application and/or different parts of Arrow (e.g. through gRPC, as a dynamic library), and then we can get symbol clashes or ABI clashes between different versions of OpenSSL

So my vote is strongly -1 on this unless the two points above are adressed. And I expect point 1) to be non-trivial to solve given our current community structure and versioning strategy.

cc @kou

asfimport commented 4 years ago

Neal Richardson / @nealrichardson: All reasonable concerns @pitrou, but as you yourself point out, we already do build and bundle openssl in the Python wheels. I'm not proposing that we add bundled openssl to any binary packages we currently ship--I'm proposing that we add a build_openssl macro to our cmake so that if (1) you are building the C++ library with some configuration that requires openssl, (2) you don't have openssl on your system, and (3) you have ARROW_DEPENDENCY_SOURCE=AUTO, the build falls back to the bundled openssl build and doesn't fail.

As long as that is the purpose, IIUC I don't think your concern #2 is an issue: we won't have symbol clashes because we're only building openssl if openssl isn't found on the system. And if we aren't adding bundled openssl to any of our packages, we don't have upgrades to ship (concern #1). Moreover, if the externalproject always installs latest openssl, not a pinned version, then even a local Arrow build from source with bundled openssl is upgraded by just rebuilding Arrow.

asfimport commented 4 years ago

Antoine Pitrou / @pitrou: My concern is that we don't want to encourage this. Naive users who are not aware of the issues may be hurt quite hard. Forcing people to build it externally themselves is a good way to enforce this.

asfimport commented 4 years ago

Kouhei Sutou / @kou: @nealrichardson What platforms do you expect to use this feature? R on CentOS?

I think that we should recommend that users install OpenSSL by package manager instead of building OpenSSL by this feature. If there is no package manager (what platform?), users will use this feature.

I also think that auto OpenSSL build by ARROW_DEPENDENCY_SOURCE=AUTO is dangerous as @pitrou said. How about reporting how to install OpenSSL by package manager such as dnf install -y openssl-devel on CentOS 8 instead of building OpenSSL automatically with ARROW_DEPENDENCY_SOURCE=AUTO?

asfimport commented 4 years ago

Wes McKinney / @wesm: It probably makes sense to handle bundled OpenSSL with different default options than the other dependencies – e.g. if it gets built by the Arrow build system it needs to be something that users really want it to do rather than something that was automatically triggered because openssl wasn't resolved on the system

asfimport commented 4 years ago

Neal Richardson / @nealrichardson: It's ok if we don't want dependency source AUTO to build openssl (though that's not really AUTO anymore) but I would at least like the ability to direct cmake to build it bundled. ARROW_DEPENDENCY_SOURCE=BUNDLED or OPENSSL_SOURCE=BUNDLED should be sufficient for that, right?

asfimport commented 4 years ago

Antoine Pitrou / @pitrou: Why is it necessary exactly? How many use cases are there that would benefit from it?

asfimport commented 4 years ago

Neal Richardson / @nealrichardson: I'm working to enable S3 support in the R package in the 2.0 release. It's essential that it work on Linux. I am literally fighting with this problem right now, where I'm doing a bundled build anyway, have tried to install openssl on the system to work around this, and cmake is still failing to find it: https://github.com/apache/arrow/pull/8304#issuecomment-701675803. I'm already bundling every other dependency, so why must this be an exception?

We've worked hard to make our build system "just work" for non-Arrow developers without having to fuss around with package managers, and this adds an easily avoidable hurdle.

We're already doing this in the pyarrow wheel build scripts, just outside of cmake, so I'm struggling to see why that's ok but this isn't.

asfimport commented 4 years ago

Antoine Pitrou / @pitrou: Are you linking for a static library version of OpenSSL? The package provides dynamic libraries: https://centos.pkgs.org/7/centos-x86_64/openssl-devel-1.0.2k-19.el7.x86_64.rpm.html

asfimport commented 4 years ago

Antoine Pitrou / @pitrou: See also: https://centos.pkgs.org/7/centos-x86_64/openssl-static-1.0.2k-19.el7.x86_64.rpm.html

asfimport commented 4 years ago

Neal Richardson / @nealrichardson: Good questions--exactly the kinds of questions I want to avoid having to ask our R users by offering a bundled solution.

asfimport commented 4 years ago

Uwe Korn / @xhochy: An interesting view on bundling / not bundling OpemSSL is to look at what happened with the psycopg2 Python package. They did stop building wheels as they contain OpenSSL and advised people to build themselves. As a workaround, they provided the psycopg2-binary package as a renaming.on PyPI to still provide wheels. Nowadays some packages, most prominently Airflow https://github.com/apache/airflow/blob/c74b3ac79a276d8e9fb2b7668762f027c12d0e30/setup.py#L361, depend on exactly this package as user frustration was so large with the one without wheels.

asfimport commented 4 years ago

Neal Richardson / @nealrichardson: I'll also note that the centos 7 system package version, 1.0.2k, is far from latest, and in fact reached end of life at the end of 2019: https://en.wikipedia.org/wiki/OpenSSL

So bundling latest openssl sounds like a more secure approach to me.

asfimport commented 4 years ago

Antoine Pitrou / @pitrou: If CentOS 7 is still maintained, then you can probably trust the packagers to backport desired security patches. If CentOS 7 is not maintained anymore, then this is a valid concern. That said, what's being opposed here is mainly to expose it in the CMake process.

asfimport commented 3 years ago

Antoine Pitrou / @pitrou: Is there still a need for this?

asfimport commented 3 years ago

Neal Richardson / @nealrichardson: I still think it would be nice but it's not a priority.