apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0
5.45k stars · 2.43k forks

[SUPPORT] Docker Demo: Failed to Connect to namenode #5280

Closed · arunb2w closed this issue 2 years ago

arunb2w commented 2 years ago

Facing the below error while running the Hudi Docker demo:

22/04/09 08:17:07 WARN ipc.Client: Failed to connect to server: namenode/172.24.0.6:8020: try once and fail.
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788)
    at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550)
    at org.apache.hadoop.ipc.Client.call(Client.java:1381)
    at org.apache.hadoop.ipc.Client.call(Client.java:1345)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:796)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
    at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1649)
    at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1440)
    at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1437)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1437)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
    at org.apache.hadoop.fs.Globber.doGlob(Globber.java:269)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:148)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1686)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:245)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:228)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:103)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:175)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:317)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:380)
mkdir: Call From adhoc-1/172.24.0.11 to namenode:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
22/04/09 08:17:30 WARN ipc.Client: Failed to connect to server: namenode/172.24.0.6:8020: try once and fail.
java.net.ConnectException: Connection refused
    ... (identical stack trace to the one above) ...
mkdir: Call From adhoc-1/172.24.0.11 to namenode:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
copyFromLocal: `/var/demo/.': No such file or directory: `hdfs://namenode:8020/var/demo'
Copying spark default config and setting up configs

docker ps output

╰─ docker ps       
CONTAINER ID   IMAGE                                                              COMMAND                  CREATED       STATUS                 PORTS                                                                                                                                                                                           NAMES
19e87256b38e   apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4:latest    "entrypoint.sh /bin/…"   2 hours ago   Up 2 hours             0-1024/tcp, 4040/tcp, 5000-5100/tcp, 7000-10100/tcp, 50000-50200/tcp, 58042/tcp, 58088/tcp, 58188/tcp                                                                                           adhoc-2
4c93e072cc5d   apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4:latest    "entrypoint.sh /bin/…"   2 hours ago   Up 2 hours             0-1024/tcp, 5000-5100/tcp, 7000-10100/tcp, 50000-50200/tcp, 58042/tcp, 58088/tcp, 58188/tcp, 0.0.0.0:4040->4040/tcp                                                                             adhoc-1
5fc03594a511   apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4:latest   "entrypoint.sh /bin/…"   2 hours ago   Up 2 hours             0-1024/tcp, 4040/tcp, 5000-5100/tcp, 7000-8080/tcp, 8082-10100/tcp, 50000-50200/tcp, 58042/tcp, 58088/tcp, 58188/tcp, 0.0.0.0:8081->8081/tcp                                                    spark-worker-1
bde1234aa587   apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.4.4:latest   "entrypoint.sh /bin/…"   2 hours ago   Up 2 hours             0-1024/tcp, 4040/tcp, 5000-5100/tcp, 6066/tcp, 7000-7076/tcp, 0.0.0.0:7077->7077/tcp, 7078-8079/tcp, 8081-10100/tcp, 50000-50200/tcp, 58042/tcp, 58088/tcp, 58188/tcp, 0.0.0.0:8080->8080/tcp   sparkmaster
e2c6132395f9   apachehudi/hudi-hadoop_2.8.4-trinoworker_368:latest                "./scripts/trino.sh …"   2 hours ago   Up 2 hours             0-1024/tcp, 4040/tcp, 5000-5100/tcp, 7000-8091/tcp, 8093-10100/tcp, 50000-50200/tcp, 58042/tcp, 58088/tcp, 58188/tcp, 0.0.0.0:8092->8092/tcp                                                    trino-worker-1
b9f15382e25e   apachehudi/hudi-hadoop_2.8.4-trinocoordinator_368:latest           "./scripts/trino.sh …"   2 hours ago   Up 2 hours             0-1024/tcp, 4040/tcp, 5000-5100/tcp, 7000-8090/tcp, 8092-10100/tcp, 50000-50200/tcp, 58042/tcp, 58088/tcp, 58188/tcp, 0.0.0.0:8091->8091/tcp                                                    trino-coordinator-1
5ce4159c3a8e   apachehudi/hudi-hadoop_2.8.4-datanode:latest                       "/bin/bash /entrypoi…"   2 hours ago   Up 2 hours (healthy)   0-1024/tcp, 4040/tcp, 5000-5100/tcp, 7000-10100/tcp, 50000-50009/tcp, 0.0.0.0:50010->50010/tcp, 50011-50074/tcp, 50076-50200/tcp, 58042/tcp, 58088/tcp, 58188/tcp, 0.0.0.0:50075->50075/tcp     datanode1
f5236676e754   apachehudi/hudi-hadoop_2.8.4-history:latest                        "/bin/bash /entrypoi…"   2 hours ago   Up 2 hours (healthy)   0-1024/tcp, 4040/tcp, 5000-5100/tcp, 7000-8187/tcp, 8189-10100/tcp, 50000-50200/tcp, 58042/tcp, 58088/tcp, 58188/tcp, 0.0.0.0:58188->8188/tcp                                                   historyserver
d27e2087636c   apachehudi/hudi-hadoop_2.8.4-namenode:latest                       "/bin/bash /entrypoi…"   2 hours ago   Up 2 hours (healthy)   0-1024/tcp, 4040/tcp, 5000-5100/tcp, 7000-8019/tcp, 8021-10100/tcp, 0.0.0.0:8020->8020/tcp, 50000-50069/tcp, 50071-50200/tcp, 58042/tcp, 58088/tcp, 58188/tcp, 0.0.0.0:50070->50070/tcp         namenode
8ec736d28ba0   bitnami/kafka:2.0.0                                                "/app-entrypoint.sh …"   2 hours ago   Up 2 hours             0.0.0.0:9092->9092/tcp                                                                                                                                                                          kafkabroker
2b18623cb771   graphiteapp/graphite-statsd                                        "/entrypoint"            2 hours ago   Up 2 hours             0.0.0.0:80->80/tcp, 2013-2014/tcp, 2023-2024/tcp, 8080/tcp, 0.0.0.0:2003-2004->2003-2004/tcp, 0.0.0.0:8126->8126/tcp, 8125/tcp, 8125/udp                                                        graphite
5975817234ba   bitnami/zookeeper:3.4.12-r68  
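For anyone hitting the same symptom: the log above shows the client containers cannot reach the namenode's RPC port (8020), so a useful first check is whether the namenode process actually finished starting inside its container. A minimal sketch, assuming the default container names from the demo's docker compose setup:

```sh
# check the namenode container logs for startup errors
docker logs namenode 2>&1 | tail -n 50

# confirm the namenode is up and out of safe mode
docker exec -it namenode hdfs dfsadmin -safemode get

# probe HDFS from the client side (adhoc-1), the same call the demo's mkdir makes
docker exec -it adhoc-1 hdfs dfs -ls hdfs://namenode:8020/
```

If the last command also fails with "Connection refused", the namenode process itself is not listening, regardless of what `docker ps` reports.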
arunb2w commented 2 years ago

I see from ticket https://github.com/apache/hudi/issues/1483 that running local instances of Hadoop may cause this error. However, I don't have any local instances running on my machine. Please help me figure out how to proceed.

codope commented 2 years ago

@arunb2w Can you please share your Docker resource settings?

  1. How many CPUs were allocated?
  2. How much memory was allocated?
  3. How much disk image capacity remains?

I have come across this error once when I was running out of allocated disk image space. Can you please do a clean Docker setup? I am able to run the demo successfully. For reference, my current Docker resources (the CLI commands after the list show how to check these):

  1. 6 CPUs
  2. 7.5GB memory (1.5GB swap space)
  3. 59.6 GB disk image size, out of which 22GB is still available.
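If it helps, the same numbers can be pulled from the CLI instead of the Docker Desktop UI; a small sketch (format field names may vary slightly across Docker versions):

```sh
# CPUs and memory (in bytes) allocated to the Docker VM
docker info --format 'CPUs: {{.NCPU}}  Memory: {{.MemTotal}}'

# disk used by images, containers and volumes
docker system df
```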
arunb2w commented 2 years ago

@codope Attaching my docker resources for your reference

Screenshot 2022-04-27 at 12 26 24 PM
arunb2w commented 2 years ago

I have also tried with the below config, but the same error persists. I also uninstalled and reinstalled Docker on my Mac before running this. I am running the Docker demo on an Apple M1 chip; are there any known issues related to that?

Screenshot 2022-04-27 at 2 35 49 PM
yihua commented 2 years ago

@arunb2w The docker demo is not fully supported on the Apple M1 chip yet, which uses the arm64 architecture. That breaks some functionality of Hadoop and Hive in the demo. We have a Jira ticket to track it: HUDI-2786
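A quick way to confirm which architecture the pulled demo images were built for (a sketch, using one image name from the `docker ps` output above):

```sh
docker image inspect --format '{{.Os}}/{{.Architecture}}' \
  apachehudi/hudi-hadoop_2.8.4-namenode:latest
# prints e.g. linux/amd64; amd64 images on an M1 run under emulation,
# which is where the Hadoop/Hive native-library problems surface
```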

arunb2w commented 2 years ago

Okay, thanks. Can you please suggest any alternatives for running Hudi locally for development purposes?

yihua commented 2 years ago

@arunb2w If you don't need Hadoop or Hive specifically, you can compile the Hudi jars and use Spark to write Hudi tables to the local file system. Spark runs on any platform that runs Java 8. You can download a Spark release and directly use spark-submit, spark-shell, or spark-sql for writing and reading Hudi tables, following our Spark Guide. A sketch of this workflow is shown below.
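For readers looking for the concrete steps: here is a minimal sketch of the workflow described above, writing a Hudi table to the local filesystem from spark-shell. The bundle coordinates and versions are illustrative (Spark 2.4 / Scala 2.11 shown); pick the ones matching your Spark build per the Spark Guide.

```sh
spark-shell \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.10.1,org.apache.spark:spark-avro_2.11:2.4.4 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```

```scala
// inside spark-shell (spark.implicits are pre-imported in the shell):
// write a tiny Hudi table to a plain local path; no namenode involved
val df = Seq(
  ("id-1", "rider-A", 27.70, "sf", 1649490000L),
  ("id-2", "rider-B", 33.90, "ny", 1649493600L)
).toDF("uuid", "rider", "fare", "city", "ts")

df.write.format("hudi").
  option("hoodie.datasource.write.recordkey.field", "uuid").
  option("hoodie.datasource.write.partitionpath.field", "city").
  option("hoodie.datasource.write.precombine.field", "ts").
  option("hoodie.table.name", "hudi_trips_local").
  mode("overwrite").
  save("file:///tmp/hudi_trips_local")

// read it back (very old releases may need a glob path per the quickstart)
spark.read.format("hudi").load("file:///tmp/hudi_trips_local").show()
```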

xushiyan commented 2 years ago

@yihua the issue can be solved by using Hadoop and Hive images built for arm64, correct? We should aim to fix this with arm64 docker images. I'll tag the ticket for 0.12.

xushiyan commented 2 years ago

tracking the work and fix in https://issues.apache.org/jira/browse/HUDI-3601

yihua commented 2 years ago

@xushiyan We've built the images for arm64. I recall that some native libraries from Hadoop or Hive are not compatible with the arm64 architecture, so errors are thrown at runtime. We depend on those OSS projects fully supporting arm64 before we can make the docker demo fully work on arm64.
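For completeness, one thing Apple Silicon users sometimes try is forcing the amd64 images to run under emulation; a hedged sketch:

```sh
# force x86_64 images under emulation on Apple Silicon; slow, and the
# native-library failures described above can still occur at runtime
export DOCKER_DEFAULT_PLATFORM=linux/amd64
# or per container:
docker run --platform linux/amd64 apachehudi/hudi-hadoop_2.8.4-namenode:latest
```

As noted above, emulation does not remove the native-library incompatibilities, so this is not a full fix.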

codope commented 2 years ago

This is again a duplicate of #4985. The work here is blocked on arm64 support in the dependent OSS projects. We are tracking the issue closely in HUDI-3601 and expect to make it work in Hudi 0.13.0.