camsas / firmament

The Firmament cluster scheduling platform
Apache License 2.0

HdfsNetworkConnectException: Connect to "localhost:8020" failed #46

Closed by cxxly 8 years ago

cxxly commented 8 years ago

I deployed Firmament following the "Getting started" tutorial. When I start Firmament, I get the errors below. I'm not familiar with HDFS, so forgive me if I missed a step.

cxxly@ubuntu:~/firmament$ sudo build/src/coordinator --listen_uri tcp:133.133.134.130:9001 --task_lib_dir=$(pwd)/build/src/
2016-06-26 07:06:06.825858, p7895, th139744479869056, ERROR Failed to setup RPC connection to "localhost:8020" caused by:
TcpSocket.cpp: 293: HdfsNetworkConnectException: Connect to "localhost:8020" failed: (errno: 111) Connection refused
    @   Hdfs::Internal::TcpSocketImpl::connect(addrinfo*, char const*, char const*, int)
    @   Hdfs::Internal::TcpSocketImpl::connect(char const*, char const*, int)
    @   Hdfs::Internal::RpcChannelImpl::connect()
    @   Hdfs::Internal::RpcChannelImpl::invokeInternal(std::shared_ptr<Hdfs::Internal::RpcRemoteCall>)
    @   Hdfs::Internal::RpcChannelImpl::invoke(Hdfs::Internal::RpcCall const&)
    @   Hdfs::Internal::NamenodeImpl::invoke(Hdfs::Internal::RpcCall const&)
    @   Hdfs::Internal::NamenodeImpl::getFsStats()
    @   Hdfs::Internal::NamenodeProxy::getFsStats()
    @   Hdfs::Internal::FileSystemImpl::getFsStats()
    @   Hdfs::Internal::FileSystemImpl::connect()
    @   Hdfs::FileSystem::connect(char const*, char const*, char const*)
    @   hdfsBuilderConnect
    @   firmament::store::HdfsDataLocalityManager::HdfsDataLocalityManager(firmament::TraceGenerator*)
    @   firmament::Coordinator::Coordinator()
    @   main
    @   Unknown
    @   Unknown

ms705 commented 8 years ago

Hi @cxxly,

Ah, this happens because Firmament cannot find the HDFS name node (which it uses for data-local scheduling). By default, it assumes that the name node is at localhost:8020, and this can be customized via command line flags. In your case, however, you probably want to turn the HDFS integration off.

To work around the immediate issue, you can rebuild and pass -DHDFS_ENABLE=off to the cmake invocation. In the medium term, we'll push a patch that (i) disables HDFS integration by default, and (ii) fails gracefully when the NameNode cannot be reached.
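If you do want to keep HDFS integration enabled, it can help to first check whether anything is listening at the NameNode address at all. The following is a minimal sketch of such a check using plain POSIX sockets. `ProbeResult` and `ProbeTcpEndpoint` are names made up for this illustration; they are not part of Firmament or of the HDFS client library, and the sketch assumes a Linux or other POSIX system.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Outcome of one reachability check (hypothetical helper type).
struct ProbeResult {
  bool reachable;     // true if some resolved address accepted the connection
  std::string error;  // description of the last failure, empty on success
};

// Tries to open, and immediately close, a TCP connection to host:port.
ProbeResult ProbeTcpEndpoint(const std::string& host, int port) {
  assert(port > 0 && port <= 65535);

  addrinfo hints{};
  hints.ai_family = AF_UNSPEC;      // accept IPv4 and IPv6 addresses
  hints.ai_socktype = SOCK_STREAM;  // TCP

  addrinfo* addrs = nullptr;
  const std::string port_str = std::to_string(port);
  const int rc = getaddrinfo(host.c_str(), port_str.c_str(), &hints, &addrs);
  if (rc != 0) {
    return {false, gai_strerror(rc)};  // name resolution failed
  }

  ProbeResult result{false, "no usable address"};
  for (addrinfo* ai = addrs; ai != nullptr; ai = ai->ai_next) {
    const int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0) {
      result.error = std::strerror(errno);
      continue;
    }
    if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
      close(fd);
      result = {true, ""};
      break;
    }
    // Record the reason before close() can overwrite errno.
    result.error = std::strerror(errno);
    close(fd);
  }
  freeaddrinfo(addrs);
  return result;
}
```

Calling `ProbeTcpEndpoint("localhost", 8020)` on a machine with no NameNode running should report an unreachable endpoint, which on Linux typically corresponds to the `errno: 111` (`ECONNREFUSED`) seen in the log above. Note that the sketch uses a blocking `connect()` with no timeout, so a probe against a host that silently drops packets can take a long time to return.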

ICGog commented 8 years ago

I've pushed a patch 282024 that fixes your issue.

ms705 commented 8 years ago

I think we should also change the HdfsDataLocalityManager to print an error and fail gracefully if the HDFS NameNode cannot be reached, so that one can compile with HDFS support enabled.

I'll take a look.
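The general shape of "print an error and fail gracefully" could look something like the sketch below. This is not the actual patch and not Firmament's `HdfsDataLocalityManager`; `LocalityManagerSketch`, `Connector`, and `PreferredHosts` are hypothetical names, and the connection attempt is injected as a callback so that the example stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for a data-locality manager; not Firmament's class.
class LocalityManagerSketch {
 public:
  // Returns true if the NameNode at host:port could be reached.
  using Connector = std::function<bool(const std::string& host, int port)>;

  LocalityManagerSketch(const std::string& host, int port,
                        const Connector& connect) {
    assert(static_cast<bool>(connect));  // a connector must be supplied
    available_ = connect(host, port);
    if (!available_) {
      // Log and carry on instead of letting the failure abort the process.
      std::cerr << "ERROR: HDFS NameNode at " << host << ":" << port
                << " is unreachable; data-local scheduling is disabled\n";
    }
  }

  bool available() const { return available_; }

  // Returns hosts holding the file's blocks, or an empty list if HDFS is down.
  std::vector<std::string> PreferredHosts(const std::string& path) const {
    if (!available_) return {};
    // Placeholder result: a real implementation would query the NameNode
    // for block locations here.
    return {"host-for:" + path};
  }

 private:
  bool available_ = false;
};
```

With this shape, callers see an empty preference list when the NameNode is down and can fall back to scheduling without locality information, so a build with HDFS support enabled still starts on a machine that has no HDFS running.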

ms705 commented 8 years ago

Patch now under review in 282133.

cxxly commented 8 years ago

Looks good to me :+1: