apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/
Apache License 2.0

ORC-1669: [C++] Upgrade libhdfs #1864

Open · ffacs opened this pull request 3 months ago

ffacs commented 3 months ago

What changes were proposed in this pull request?

Upgrade libhdfs.

Why are the changes needed?

For https://github.com/apache/orc/pull/1857.

How was this patch tested?

Unit tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

wgtmac commented 3 months ago
CMake Error at /usr/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:165 (message):
  Could NOT find Boost (missing: Boost_INCLUDE_DIR date_time) (Required is at
  least version "1.72.0")

It seems that the latest version of libhdfspp depends on Boost, which we don't want to add to our dependencies.

The ORC C++ library is usually integrated into larger systems that have their own built-in filesystem implementations and simply adapt them to orc::InputStream. Since libhdfspp was introduced 7 years ago and has not been actively maintained, I'd propose removing support for it. @dongjoon-hyun @stiga-huang WDYT?
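A minimal sketch of that integration pattern, assuming a simplified local stand-in for `orc::InputStream` with its four core virtual methods (`getLength`, `getNaturalReadSize`, `read`, `getName`); check `orc/OrcFile.hh` for the exact declaration in your ORC version. `HostFile` and `HostInputStream` are hypothetical names for a host system's file handle and its adapter, and the file here is backed by memory only so the example runs on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for orc::InputStream. Assumption: it mirrors the core
// pure-virtual methods in orc/OrcFile.hh; the real class may declare more.
class InputStream {
 public:
  virtual ~InputStream() = default;
  virtual uint64_t getLength() const = 0;
  virtual uint64_t getNaturalReadSize() const = 0;
  virtual void read(void* buf, uint64_t length, uint64_t offset) = 0;
  virtual const std::string& getName() const = 0;
};

// Hypothetical host-system file: a real engine would wrap its own
// HDFS/S3/local filesystem client here. This one is backed by memory.
struct HostFile {
  std::string path;
  std::vector<char> bytes;

  // Positional read: copies up to `length` bytes starting at `offset`
  // and returns how many bytes were actually copied.
  uint64_t pread(char* out, uint64_t length, uint64_t offset) const {
    if (offset >= bytes.size()) return 0;
    const uint64_t n = std::min<uint64_t>(length, bytes.size() - offset);
    std::memcpy(out, bytes.data() + offset, n);
    return n;
  }
};

// Adapter: exposes the host file through the InputStream interface.
class HostInputStream : public InputStream {
 public:
  explicit HostInputStream(std::shared_ptr<const HostFile> file)
      : file_(std::move(file)) {
    assert(file_ != nullptr);
  }

  uint64_t getLength() const override { return file_->bytes.size(); }

  // Assumed preferred I/O size; a real adapter would report its filesystem's.
  uint64_t getNaturalReadSize() const override { return 128 * 1024; }

  // This sketch requires exactly `length` bytes and treats a short read as an error.
  void read(void* buf, uint64_t length, uint64_t offset) override {
    const uint64_t got = file_->pread(static_cast<char*>(buf), length, offset);
    if (got != length) {
      throw std::runtime_error("short read from " + file_->path);
    }
  }

  const std::string& getName() const override { return file_->path; }

 private:
  std::shared_ptr<const HostFile> file_;
};
```

In a real integration, an adapter like this would be wrapped in a `std::unique_ptr` and passed to `orc::createReader`, so ORC itself never needs a bundled HDFS client.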

dongjoon-hyun commented 3 months ago

That's too bad. The Apache ORC community cannot remove it in ORC 2.x because ORC 2.0.0 has already been released and we follow the Semantic Versioning policy. The best we can do is deprecate it in Apache ORC 2.0.1.

wgtmac commented 3 months ago

That sounds good. Let me mark the relevant code as deprecated first.
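As an illustration of the deprecation step, here is a minimal sketch using the standard C++14 `[[deprecated]]` attribute. `openHdfsStream` and `HdfsStream` are hypothetical stand-ins, not ORC's actual libhdfspp entry points. The attribute keeps the symbol available, which preserves the 2.x API under Semantic Versioning, while making every call site emit a compiler warning.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a libhdfspp-backed stream; not ORC's real type.
struct HdfsStream {
  std::string path;
};

// [[deprecated]] (C++14) keeps the function available, so existing callers
// still compile and run, but every call site now triggers a compiler
// warning that carries this message.
[[deprecated("libhdfspp support is deprecated; adapt your own filesystem to orc::InputStream instead")]]
inline std::unique_ptr<HdfsStream> openHdfsStream(const std::string& path) {
  return std::make_unique<HdfsStream>(HdfsStream{path});
}
```

Callers keep working unchanged; the warning gives them time to migrate before any removal in a later major release.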