apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0
14.67k stars 3.56k forks source link

[C++] Vendor googletest #43422

Open bkietz opened 4 months ago

bkietz commented 4 months ago

Describe the enhancement requested

Provisioning googletest/GTest for c++ testing has been the source of many CI failures such as https://github.com/apache/arrow/issues/43400 . Vendoring the whole library would give us greater control over what version is used and I don't think it would impact us much.

Since the version of GTest which we vendor will change seldom, libgtest.a should mostly be ccache'd so we won't be rebuilding it in every CI job.

GTest has an acceptable license for vendoring (3 clause BSD), documents vendoring as a simple option for projects which can do so, and we don't install anything which is linked to it anyway. I don't think it's a priority for us to support a wide set of libgtest.a versions in the same way that we try to support as many zlibs as possible

Component(s)

C++

bkietz commented 4 months ago

@kou @pitrou

kou commented 4 months ago

I think that we can fix this problem without vendoring GoogleTest. How about ensuring using bundled GoogleTest's header files like GH-43465?

bkietz commented 4 months ago

I think that we can fix this problem without vendoring GoogleTest.

Definitely.

The intent of this issue was more to ask the question of whether we want to keep fixing these problems. Is there a compelling reason to continue supporting a user-configurable version of GTest?

kou commented 4 months ago

The intent of this issue was more to ask the question of whether we want to keep fixing these problems.

Hmm. I think that GH-43465 is the real fix of the mixing system GoogleTest and bundled GoogleTest problem. So we don't need to fix these problems anymore.

Is there a compelling reason to continue supporting a user-configurable version of GTest?

I don't object this proposal strongly but here are some reasons:

  1. libarrow_testing.so depends on GoogleTest. If we always depend on our vendored GoogleTest, users can't mix system GoogleTest and libarrow_testing.so.
  2. In general, packaging systems such as .deb and .rpm don't like vendoring.
  3. I want to use system libraries as much as possible to avoid building them entirely. (ccache can reduce build time as you mentioned.)
bkietz commented 4 months ago

libarrow_testing.so depends on GoogleTest. If we always depend on our vendored GoogleTest, users can't mix system GoogleTest and libarrow_testing.so

That's a good point; I had forgotten arrow_testing and it invalidates "we don't install anything which is linked to it anyway" in the issue description. arrow_testing is even documented as installed. I'm not sure who uses it outside the arrow repository (maybe nobody does and it is only installed because add_arrow_lib doesn't have a DONT_INSTALL option), but in order to vendor GTest we'd need to deprecate installing that target first.

kou commented 4 months ago

Random Arrow Array generators in arrow_testing will be useful. If we can move the feature to compute or dataset (or something), we may be able to deprecated arrow_testing.