Mellanox / R4H

RDMA for HDFS

fail to run R4H for HDFS 2.7.4 #2

Closed daixiang0 closed 3 years ago

daixiang0 commented 6 years ago

I followed the docs to install R4H, but it failed:

2018-10-19 15:37:56,927 INFO com.mellanox.r4h.DataXceiverServer: Creating DataXceiverServer - uri=rdma://test-storage-0.localdomain:50010
2018-10-19 15:37:56,932 ERROR LogFromNative: ../common/xio_server.c:332:xio_bind() connection listen failed

2018-10-19 15:37:56,932 ERROR LogFromNative: Bridge.cc:323:Java_org_accelio_jxio_impl_Bridge_startServerPortalNative() failure on new ServerPortal(ctx=0x7f7c112f4c10, url=rdma://test-storage-0.localdomain:50010)

There is no R4H update for HDFS 2.7.4. Does it work with 2.7 or above?

daixiang0 commented 6 years ago

Updated log:

2018-10-29 20:43:06,370 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 45271
2018-10-29 20:43:06,370 INFO org.mortbay.log: jetty-6.1.26
2018-10-29 20:43:06,560 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:45271
2018-10-29 20:43:06,670 INFO org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Listening HTTP traffic on /0.0.0.0:50075
2018-10-29 20:43:06,823 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hdfs
2018-10-29 20:43:06,823 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = supergroup
2018-10-29 20:43:06,864 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 1000
2018-10-29 20:43:06,881 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2018-10-29 20:43:07,007 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020
2018-10-29 20:43:07,018 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
2018-10-29 20:43:07,037 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
2018-10-29 20:43:07,048 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to /10.0.30.111:8020 starting to offer service
2018-10-29 20:43:07,055 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-10-29 20:43:07,055 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2018-10-29 20:43:07,141 INFO com.mellanox.r4h.DataXceiverServer: Creating DataXceiverServer - uri=rdma://test-storage-0.localdomain:50010
2018-10-29 20:43:07,146 ERROR LogFromNative: ../common/xio_server.c:332:xio_bind() connection listen failed

2018-10-29 20:43:07,147 ERROR LogFromNative: Bridge.cc:323:Java_org_accelio_jxio_impl_Bridge_startServerPortalNative() failure on new ServerPortal(ctx=0x7fbfe5357110, url=rdma://test-storage-0.localdomain:50010)

2018-10-29 20:43:07,147 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: ServicePlugin com.mellanox.r4h.R4HDatanodePlugin@6bb75258 is not started yet could not be started
java.lang.NullPointerException
        at org.accelio.jxio.ServerPortal.<init>(ServerPortal.java:140)
        at com.mellanox.r4h.DataXceiverServer.<init>(DataXceiverServer.java:162)
        at com.mellanox.r4h.R4HDatanodePlugin.start(R4HDatanodePlugin.java:66)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startPlugins(DataNode.java:782)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:2247)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2342)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2522)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2546)
2018-10-29 20:43:07,375 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 4 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=4, dataDirs=4)
2018-10-29 20:43:07,414 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /hdfs/hdfs0/dn/in_use.lock acquired by nodename 1262315@test-storage-0.localdomain

daixiang0 commented 6 years ago

@elad-via @ilansmith any update?

liranoz12 commented 6 years ago

@daixiang0,

This repository is deprecated. Sorry for the inconvenience.

Thanks, Liran

daixiang0 commented 6 years ago

@liranoz12 thanks for your reply. Do you know where I can get the newest version of R4H? Or how can I use RDMA with Hadoop above 2.7?

liranoz12 commented 6 years ago

@daixiang0, there is no other R4H version. This project is deprecated. As far as I remember, R4H supports hdp230 (HDP 2.3.0), which is based on Hadoop 2.7: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_HDP_RelNotes/content/ch_relnotes_v230.html

Thanks, Liran

daixiang0 commented 6 years ago

@liranoz12 Thanks very much. It seems R4H does not support versions above 2.7.2. I will look for another way to use RDMA with Hadoop.

XianlaiShen commented 5 years ago

@daixiang0, did you find another way to use RDMA with Hadoop? I want to use RDMA with hadoop-2.7.4. Thanks!

daixiang0 commented 5 years ago

@XianlaiShen use IPoIB (IP over InfiniBand) mode
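
For anyone taking the IP over InfiniBand (IPoIB) route: with IPoIB, Hadoop keeps using ordinary TCP sockets and the traffic simply goes over the IP address assigned to the InfiniBand interface. So one thing worth checking is that the DataNode hostname resolves to an address that a local interface actually owns. Below is a minimal sketch of such a check using only the JDK. The class name `LocalAddressCheck` and its methods are hypothetical, not part of R4H or Hadoop, and this does not diagnose the `xio_bind()` failure above; it only reports which local interface, if any, owns the resolved address.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical helper, not part of R4H or Hadoop.
public class LocalAddressCheck {

    /** Returns the name of the local interface that owns addr, or null if none does. */
    public static String owningInterface(InetAddress addr) {
        try {
            NetworkInterface nif = NetworkInterface.getByInetAddress(addr);
            return nif == null ? null : nif.getName();
        } catch (SocketException e) {
            // Enumerating interfaces failed; surface it as an unchecked error.
            throw new UncheckedIOException(e);
        }
    }

    /** Resolves host and returns the owning local interface, or null if unresolvable or not local. */
    public static String owningInterface(String host) {
        try {
            return owningInterface(InetAddress.getByName(host));
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Pass the DataNode hostname as the first argument; defaults to localhost.
        String host = args.length > 0 ? args[0] : "localhost";
        String nif = owningInterface(host);
        System.out.println(host + " -> " + (nif == null ? "no local interface" : nif));
    }
}
```

Run it on the DataNode host with the hostname from the log (for example `java LocalAddressCheck test-storage-0.localdomain`). If it prints an Ethernet interface rather than the InfiniBand one (often named `ib0`, though that name is an assumption about your setup), the hostname is not mapped to the IPoIB address.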