Closed wgtmac closed 2 months ago
cc @dongjoon-hyun @stiga-huang @ffacs
Also cc @omalley in case there is any blocker to this.
cc @williamhyun , @pavibhai , @deshanxiao , too.
Does it mean that we cannot upgrade libprotobuf like https://github.com/apache/orc/pull/1857 in 2.x? Or at least we can choose appropriate protobuf version according to the BUILD_LIBHDFSPP
option. @dongjoon-hyun
+1 for choosing the protobuf version based on the BUILD_LIBHDFSPP
option. It might have less complaints.
I'm not sure what you mean by that. May I ask what is broken in libprotobuf
PR, #1857, specifically , @wgtmac ?
Does it mean that we cannot upgrade libprotobuf like #1857 in 2.x? Or at least we can choose appropriate protobuf version according to the
BUILD_LIBHDFSPP
option.
Apache ORC 2 has two-layer protections.
One is orc-format
which exists independently from this repo. apache/orc
repository and PRs cannot break any ORC file format structure because orc_proto.proto
doesn't belong to this repo. It exists in apache/orc-format
without any change.
The other is that ProtocolBuffer
. ProtocolBuffer is a layer to provide a language and platform neutral binary compatibility, isn't it? #1857 only changes from 3.5.1 to 3.21.12. The major version is the same. Do you mean they break anything there?
FYI, Semantic Versioning
is defined here. You can do whatever you want as long as you follow the policy. In this PR, my concern is the part of a backward compatible manner
of this policy. In general, Removal
is not an option for backward compatibility in general. That's the reason why I told you that we can do only Deprecatation. Deprecation doesn't mean Removal
.
Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes MINOR version when you add functionality in a backward compatible manner PATCH version when you make backward compatible bug fixes
https://github.com/apache/orc/pull/1857 only changes from 3.5.1 to 3.21.12. The major version is the same. Do you mean they break anything there?
Yes, libhdfspp does not link with 3.21.12. For example:
[ 84%] Linking CXX executable orc-test
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o): in function `hadoop::hdfs::GetFsStatusRequestProto::~GetFsStatusRequestProto()':
/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:9833: undefined reference to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()'
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o): in function `hadoop::hdfs::DisallowSnapshotResponseProto::~DisallowSnapshotResponseProto()':
/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:23777: undefined reference to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()'
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o): in function `hadoop::hdfs::AllowSnapshotResponseProto::~AllowSnapshotResponseProto()':
/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:23487: undefined reference to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()'
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o): in function `hadoop::hdfs::RenameSnapshotResponseProto::~RenameSnapshotResponseProto()':
/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:23197: undefined reference to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()'
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o): in function `hadoop::hdfs::DeleteSnapshotResponseProto::~DeleteSnapshotResponseProto()':
/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:24090: undefined reference to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()'
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o):/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:4055: more undefined references to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()' follow
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(ClientNamenodeProtocol.pb.cc.o):
Although we don't want to touch that file, I guess we are able to update the source code like we did in the following PR. WDTY?
Right, I feel very reluctant to touch that file. I'll look into it later.
Thanks @dongjoon-hyun!
What changes were proposed in this pull request?
Mark readHdfsFile as deprecated.
Why are the changes needed?
Reading ORC on HDFS was introduced in https://github.com/apache/orc/pull/134 without any test. It has not been updated for 7 years and updating libhdfspp will result in extra dependency like boost library. Staying at an old version of libhdfspp will also prohibit us from updating other libraries like protobuf.
How was this patch tested?
It does not need test.
Was this patch authored or co-authored using generative AI tooling?
No.