ContinuumIO / libhdfs3-downstream

A native C/C++ HDFS client (downstream fork from apache-hawq)
Apache License 2.0

Google Protobuf error in Python connection #15

Open srams2018 opened 6 years ago

srams2018 commented 6 years ago

Hi, could someone help me with the error below? I have built libhdfs3 with both protobuf 2.3 and 3.5, but I get the same error either way.

```
Python 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import hdfs3
>>> conf = {"hadoop.security.authentication": "kerberos"}
>>> hdfs = hdfs3.HDFileSystem("server1.xyz.com", port=8020, pars=conf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data1/apps/anaconda/4.3.1/2/lib/python2.7/site-packages/hdfs3-0.3.0-py2.7.egg/hdfs3/core.py", line 76, in __init__
    self.connect()
  File "/data1/apps/anaconda/4.3.1/2/lib/python2.7/site-packages/hdfs3-0.3.0-py2.7.egg/hdfs3/core.py", line 94, in connect
    get_lib()
  File "/data1/apps/anaconda/4.3.1/2/lib/python2.7/site-packages/hdfs3-0.3.0-py2.7.egg/hdfs3/core.py", line 657, in get_lib
    from .lib import _lib as l
  File "/data1/apps/anaconda/4.3.1/2/lib/python2.7/site-packages/hdfs3-0.3.0-py2.7.egg/hdfs3/lib.py", line 15, in <module>
    _lib = ct.cdll.LoadLibrary(name)
  File "/data1/apps/anaconda/4.3.1/2/lib/python2.7/ctypes/__init__.py", line 440, in LoadLibrary
    return self._dlltype(name)
  File "/data1/apps/anaconda/4.3.1/2/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /data1/apps/anaconda/4.3.1/libhdfs3-downstream-2.3.0/libhdfs3/dist/lib/libhdfs3.so: undefined symbol: _ZN6google8protobuf2io17CodedOutputStream36default_serialization_deterministic_E
```
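For what it's worth, the missing symbol demangles to `google::protobuf::io::CodedOutputStream::default_serialization_deterministic_`, which to my knowledge only exists in protobuf 3.x, so the usual cause is compiling against protobuf 3 headers while the dynamic linker resolves an older libprotobuf at load time. Getting the identical symbol error from a build against protobuf 2.3 would suggest that build also picked up protobuf 3 headers or stale object files. A quick diagnostic sketch, using the path from the traceback above and assuming `c++filt` and `ldd` are on the PATH:

```python
import subprocess

SO = "/data1/apps/anaconda/4.3.1/libhdfs3-downstream-2.3.0/libhdfs3/dist/lib/libhdfs3.so"
SYM = "_ZN6google8protobuf2io17CodedOutputStream36default_serialization_deterministic_E"

# Demangle the missing symbol to see which protobuf API it belongs to.
# Prints: google::protobuf::io::CodedOutputStream::default_serialization_deterministic_
print(subprocess.check_output(["c++filt", SYM]).decode().strip())

# Which libprotobuf does the dynamic linker actually resolve for this .so?
# If it prints a protobuf 2.x library while the symbol above is protobuf 3.x,
# that mismatch is what produces the OSError at import time.
for line in subprocess.check_output(["ldd", SO]).decode().splitlines():
    if "protobuf" in line:
        print(line.strip())
```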

martindurant commented 6 years ago

You would be best off getting your required versions from here; although when you build your own, obviously you are out in the wild! The specific error you are getting is not one that I have seen.

srams2018 commented 6 years ago

Thanks for the input; I will check the versions now.

srams2018 commented 6 years ago

After rebuilding with all the versions from the meta.yaml file, I am getting the error below. I am also seeing weird characters after "User" in the connection error.

```
$ python
Python 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import hdfs3
>>> conf = {"hadoop.security.authentication": "kerberos"}
>>> hdfs = hdfs3.HDFileSystem("server1.xyz.net", port=8020, pars=conf)
2018-05-03 13:59:39.024993, p29865, th140179328337664, ERROR Failed to invoke RPC call "getFsStats" on server "server1.xyz.net:8020":
RpcChannel.cpp: 483: HdfsRpcException: Failed to invoke RPC call "getFsStats" on server "server1.xyz.net:8020"
    @ Hdfs::Internal::RpcChannelImpl::invokeInternal(std::shared_ptr)
    @ Hdfs::Internal::RpcChannelImpl::invoke(Hdfs::Internal::RpcCall const&)
    @ Hdfs::Internal::NamenodeImpl::invoke(Hdfs::Internal::RpcCall const&)
    @ Hdfs::Internal::NamenodeImpl::getFsStats()
    @ Hdfs::Internal::NamenodeProxy::getFsStats()
    @ Hdfs::Internal::FileSystemImpl::getFsStats()
    @ Hdfs::Internal::FileSystemImpl::connect()
    @ Hdfs::FileSystem::connect(char const*, char const*, char const*)
    @ hdfsBuilderConnect
    @ ffi_call_unix64
    @ ffi_call
    @ _ctypes_callproc
    @ PyCFuncPtr_call
    @ PyObject_Call
    @ PyEval_EvalFrameEx
    @ PyEval_EvalCodeEx
    @ PyEval_EvalFrameEx
    @ PyEval_EvalCodeEx
    @ function_call
    @ PyObject_Call
    @ instancemethod_call
    @ PyObject_Call
    @ slot_tp_init
    @ type_call
    @ PyObject_Call
    @ PyEval_EvalFrameEx
    @ PyEval_EvalCodeEx
    @ PyEval_EvalCode
    @ PyRun_InteractiveOneFlags
    @ PyRun_InteractiveLoopFlags
    @ PyRun_AnyFileExFlags
    @ Py_Main
    @ __libc_start_main
    @ Unknown
Caused by
RpcChannel.cpp: 931: HdfsRpcServerException: org.apache.hadoop.security.authorize.AuthorizationException: User ��H�$t�H�D�]L�%Dr2 (auth:PROXY) via a_ndccdh@xyz.COM (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.hdfs.protocol.ClientProtocol: denied by configured ACL
    @ Hdfs::Internal::RpcChannelImpl::readOneResponse(bool)
    @ Hdfs::Internal::RpcChannelImpl::invokeInternal(std::shared_ptr)
    @ Hdfs::Internal::RpcChannelImpl::invoke(Hdfs::Internal::RpcCall const&)
    @ Hdfs::Internal::NamenodeImpl::invoke(Hdfs::Internal::RpcCall const&)
    @ Hdfs::Internal::NamenodeImpl::getFsStats()
    @ Hdfs::Internal::NamenodeProxy::getFsStats()
    @ Hdfs::Internal::FileSystemImpl::getFsStats()
    @ Hdfs::Internal::FileSystemImpl::connect()
    @ Hdfs::FileSystem::connect(char const*, char const*, char const*)
    @ hdfsBuilderConnect
    @ ffi_call_unix64
    @ ffi_call
    @ _ctypes_callproc
    @ PyCFuncPtr_call
    @ PyObject_Call
    @ PyEval_EvalFrameEx
    @ PyEval_EvalCodeEx
    @ PyEval_EvalFrameEx
    @ PyEval_EvalCodeEx
    @ function_call
    @ PyObject_Call
    @ instancemethod_call
    @ PyObject_Call
    @ slot_tp_init
    @ type_call
    @ PyObject_Call
    @ PyEval_EvalFrameEx
    @ PyEval_EvalCodeEx
    @ PyEval_EvalCode
    @ PyRun_InteractiveOneFlags
    @ PyRun_InteractiveLoopFlags
    @ PyRun_AnyFileExFlags
    @ Py_Main
    @ __libc_start_main
    @ Unknown

2018-05-03 13:59:39.025253, p29865, th140179328337664, INFO Retry idempotent RPC call "getFsStats" on server "server1.xyz.net:8020"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data1/apps/anaconda/4.3.1/2/lib/python2.7/site-packages/hdfs3/core.py", line 76, in __init__
    self.connect()
  File "/data1/apps/anaconda/4.3.1/2/lib/python2.7/site-packages/hdfs3/core.py", line 140, in connect
    msg = ensure_string(_lib.hdfsGetLastError()).split('\n')[0]
  File "/data1/apps/anaconda/4.3.1/2/lib/python2.7/site-packages/hdfs3/utils.py", line 129, in ensure_string
    return s.decode()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 211: ordinal not in range(128)
```
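Note that the UnicodeDecodeError at the end is a secondary failure: the error string coming back from libhdfs3 contains the same garbage bytes as the mangled user name, and `ensure_string` decodes it with the default ASCII codec. A throwaway diagnostic patch, assuming `hdfs3.core` imports `ensure_string` by name (as the traceback suggests), just so the real HdfsRpcServerException text becomes visible:

```python
import hdfs3.core
import hdfs3.utils

def tolerant_ensure_string(s):
    # Same job as hdfs3.utils.ensure_string, but replaces undecodable bytes
    # instead of raising, so the underlying RPC error message still surfaces.
    if isinstance(s, bytes):
        return s.decode("utf-8", "replace")
    return s

# Patch both namespaces, since core.py appears to import the name directly.
hdfs3.utils.ensure_string = tolerant_ensure_string
hdfs3.core.ensure_string = tolerant_ensure_string
```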

ghost commented 6 years ago

I tested and got the same error. Regardless, this should not be using impersonation or a proxy user; I am not sure why it is trying to.
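If the client really is picking up a garbage effective user (as the bytes after "User" suggest), one thing worth trying is passing the user explicitly so libhdfs3 does not have to auto-detect it. A sketch, assuming hdfs3's `user` and `ticket_cache` keyword arguments; the principal name is taken from the error above, and the ticket-cache path is a guess:

```python
import hdfs3

conf = {"hadoop.security.authentication": "kerberos"}

# Pass the short name of the Kerberos principal explicitly instead of letting
# libhdfs3 guess it; "a_ndccdh" comes from the AuthorizationException above.
hdfs = hdfs3.HDFileSystem(
    "server1.xyz.net",
    port=8020,
    pars=conf,
    user="a_ndccdh",
    ticket_cache="/tmp/krb5cc_1000",  # assumed path -- verify with `klist`
)
```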

martindurant commented 6 years ago

I agree, but I don't know why you end up in a different security setting than you should. There may well be a bug in how the user is picked in the C++ layer; these things are hard to track down. You may also consider using Arrow's hdfs module; if you are accessing the data from a machine within the cluster, it tends to have fewer security issues.
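For completeness, a minimal sketch of the Arrow route, assuming a pyarrow build that ships the `pyarrow.hdfs.connect` wrapper (it drives the JNI-based libhdfs, so HADOOP_HOME and a Hadoop CLASSPATH must be available on the client machine, which is typically true on a cluster node). Host, port, and ticket-cache values are carried over from this thread and the sketch above:

```python
import pyarrow as pa

# Connects via the Java libhdfs driver rather than libhdfs3, which sidesteps
# the C++ user-detection code path discussed above.
fs = pa.hdfs.connect(
    host="server1.xyz.net",
    port=8020,
    kerb_ticket="/tmp/krb5cc_1000",  # assumed path -- verify with `klist`
)
print(fs.ls("/"))
```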