apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] [HUDI-3601] The current Docker demo only works on x86 systems; it cannot run successfully on arm64 #4985

Closed. Aalron closed this issue 2 years ago.

Aalron commented 2 years ago

Describe the problem you faced

The Docker demo currently only works on x86 systems; it fails on arm64.

To Reproduce

Steps to reproduce the behavior:

  1. Clone the 0.10.1 release.
  2. Follow the Docker demo steps; they do not succeed on arm64.

Expected behavior

The Docker demo runs successfully on an arm64 system.

yanghua commented 2 years ago

@nbalajee @n3nash Can you help support this?

xushiyan commented 2 years ago

@codope and I are working on this.

codope commented 2 years ago

@Aalron @xushiyan I have built images for arm64 and pushed them to our Docker Hub. In the Docker setup, before running the setup_demo.sh script, please apply this patch to <HUDI_REPO>/docker/compose/docker-compose_hadoop284_hive233_spark244.yml: https://gist.github.com/codope/3dd986de5e54f0650dd74b6032e4456c

Please note that this is still experimental. I have not fully tested the Docker demo on an arm64 machine.
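
(A minimal sketch of applying it, assuming the gist contents are saved locally as a unified diff under the hypothetical name arm64-compose.patch:)

cd <HUDI_REPO>
# apply the compose-file diff saved from the gist above
git apply arm64-compose.patch
# then run the demo setup as usual
cd docker && ./setup_demo.sh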

xushiyan commented 2 years ago

As discussed with @codope, we now use a different tag for arm64, and the required Docker Hub images were updated with the linux-arm64-0.10.1 tag:

docker buildx build base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
docker buildx build datanode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-datanode:linux-arm64-0.10.1 --push
docker buildx build historyserver --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-history:linux-arm64-0.10.1 --push
docker buildx build hive_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
docker buildx build namenode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-namenode:linux-arm64-0.10.1 --push
docker buildx build prestobase --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:linux-arm64-0.10.1 --push
docker buildx build spark_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkbase_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkadhoc --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkmaster --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkworker --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4:linux-arm64-0.10.1 --push
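
(A note on the commands above: they assume a buildx builder able to target linux/arm64 is already configured. If not, a one-time setup along these lines should do it:)

# create a builder instance that supports cross-platform builds and bootstrap it
docker buildx create --name arm64builder --use
docker buildx inspect --bootstrap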

@Aalron I uploaded this patch, to be used with 0.10.1, to get Docker running on arm64:
https://gist.github.com/xushiyan/cec16585e884cf0693250631a1d10ec2
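
(To verify that a pulled image is actually the arm64 variant, Docker can report its architecture; it should print arm64 for the images tagged linux-arm64-0.10.1:)

docker image inspect --format '{{.Architecture}}' apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1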

Aalron commented 2 years ago

@xushiyan @codope I have now encountered a new problem. When executing setup_demo.sh, I checked the Docker console and found that menorah84/hive-metastore-postgresql:2.3.0 reports an error at startup. The specific error is:

Error: Database is uninitialized and superuser password is not specified. You must specify POSTGRES_PASSWORD to a non-empty value for the superuser. For example, "-e POSTGRES_PASSWORD=password" on "docker run". You may also use "POSTGRES_HOST_AUTH_METHOD=trust" to allow all connections without a password. This is not recommended. See PostgreSQL documentation about "trust": https://www.postgresql.org/docs/current/auth-trust.html

This error then prevents the services that depend on it from starting. Please help; how can this be solved?

xushiyan commented 2 years ago

I haven't tried it myself, but would this work? @Aalron

  hive-metastore-postgresql:
    image: menorah84/hive-metastore-postgresql:2.3.0
    platform: linux/arm64
    environment:
      - POSTGRES_HOST_AUTH_METHOD=trust
    volumes:
      - hive-metastore-postgresql:/var/lib/postgresql
    hostname: hive-metastore-postgresql
    container_name: hive-metastore-postgresql
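
(Per the Postgres error message quoted above, setting a non-empty superuser password instead of trust auth should also unblock startup; hudi below is just a placeholder value:)

    environment:
      - POSTGRES_PASSWORD=hudi   # any non-empty value; placeholder for the demo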

Aalron commented 2 years ago

@xushiyan Following @codope's method, the Docker images have all been pulled successfully, but when setup_demo.sh is executed to start the containers, the new problem described above appears.

xushiyan commented 2 years ago

Understood @Aalron. Here I'm suggesting putting this env var in the docker compose file, as shown in my comment above (see the environment section). Does this work?

Aalron commented 2 years ago

@xushiyan Thanks. Following your method, the problem now appears to be solved; I no longer get the error locally. I will see tonight whether I can run the whole project.

Aalron commented 2 years ago

@xushiyan @codope I found that the kafka service in <HUDI_REPO>/docker/compose/docker-compose_hadoop284_hive233_spark244.yml needs an additional environment setting:

  kafka:
    image: 'wurstmeister/kafka:2.12-2.0.1'
    platform: linux/arm64
    hostname: kafkabroker
    container_name: kafkabroker
    ports:
      - '9092:9092'
    environment:
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
      - KAFKA_ADVERTISED_HOST_NAME=kafkabroker
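
(KAFKA_ADVERTISED_HOST_NAME makes the broker advertise an address that other containers on the compose network can resolve; without it, clients get back an address they cannot reach. A quick smoke test once the container is up:)

# the broker logs a "started (kafka.server.KafkaServer)" line once it is healthy
docker logs kafkabroker 2>&1 | grep -i started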

After that, I found four exceptions.

First, from the historyserver (apachehudi/hudi-hadoop_2.8.4-history:latest) image:

22/03/10 10:01:38 FATAL applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /tmp/libleveldbjni-64-1-2530759744317816554.8: /tmp/libleveldbjni-64-1-2530759744317816554.8: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a AARCH64-bit platform)]
at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:227)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:115)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:180)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:190)

Second, from the presto-coordinator-1 (apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:latest) image:

Presto requires amd64 or ppc64le on Linux (found aarch64)

Third, from the presto-worker-1 (apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:latest) image:

Presto requires amd64 or ppc64le on Linux (found aarch64)

Fourth, from the spark-worker-1 (apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4:latest) image:

22/03/10 10:12:06 WARN worker.Worker: Failed to connect to master sparkmaster:7077
org.apache.spark.SparkException: Exception thrown in awaitResult: 
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:253)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to sparkmaster/172.18.0.10:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: sparkmaster/172.18.0.10:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
Caused by: java.net.ConnectException: Connection refused
... 11 more
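
(For the first exception: the leveldbjni 1.8 artifact bundles only x86-64 native libraries, so the JNI load fails on aarch64 even inside an arm64 container. One way to confirm this, assuming the standard leveldbjni-all jar layout:)

# lists the bundled natives: linux32/linux64/osx/windows entries, but no aarch64 in 1.8
unzip -l leveldbjni-all-1.8.jar | grep native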

xushiyan commented 2 years ago

@Aalron As I mentioned in my earlier comment, we updated the tag for the arm64 images; you should be using that tag instead of latest. I also linked a patch to illustrate the diff. Can you apply the patch in your local setup? The latest images were reverted to the previous amd64 platform, which won't work on your machine.
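
(The core of the linked patch is pointing the demo images at the arm64 tag. A rough, untested equivalent of that part is a tag swap like the one below, though the gist may contain further changes such as platform entries, so applying it directly is safer:)

# GNU sed syntax; on macOS use sed -i ''. Swaps every :latest image tag for the arm64 one.
sed -i 's/:latest/:linux-arm64-0.10.1/g' docker/compose/docker-compose_hadoop284_hive233_spark244.yml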

Aalron commented 2 years ago

@xushiyan Following your method, three issues currently remain. First, with [apachehudi/hudi-hadoop_2.8.4-history:linux-arm64-0.10.1]:

22/03/11 02:43:05 INFO applicationhistoryservice.ApplicationHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT]
22/03/11 02:43:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/03/11 02:43:05 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
22/03/11 02:43:05 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
22/03/11 02:43:05 INFO impl.MetricsSystemImpl: ApplicationHistoryServer metrics system started
22/03/11 02:43:05 FATAL applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /tmp/libleveldbjni-64-1-2100680455800525123.8: /tmp/libleveldbjni-64-1-2100680455800525123.8: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a AARCH64-bit platform)]
at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:227)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:115)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:180)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:190)

Second, with [apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:linux-arm64-0.10.1]:

Presto requires amd64 or ppc64le on Linux (found aarch64)

Third, with [apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:linux-arm64-0.10.1]:

Presto requires amd64 or ppc64le on Linux (found aarch64)
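
(Presto 0.217 predates aarch64 support, so no arm64-native image can fix this directly. One possible stopgap, untested here, is forcing amd64 emulation for just the Presto services in the compose file; Docker Desktop on Apple Silicon can run such images under qemu, albeit slowly:)

  presto-coordinator-1:
    platform: linux/amd64   # run the x86-64 image under emulation as a workaround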

xushiyan commented 2 years ago

Tracking the work and fix in https://issues.apache.org/jira/browse/HUDI-3601.

pratyakshsharma commented 2 years ago

Going over the discussion here, it looks like support for the M1 chip is not yet available for Hudi's Docker demo. I applied this patch (https://gist.github.com/xushiyan/cec16585e884cf0693250631a1d10ec2) and ran setup_demo.sh. I got the error mentioned in this Jira: https://issues.apache.org/jira/browse/HUDI-2786.

Please suggest if I am missing anything @xushiyan cc @codope

Mike-Roberts-2112 commented 2 years ago

I agree with the comment above. @xushiyan and @codope, this is a much-needed feature.

xushiyan commented 2 years ago

@Mike-Roberts-2112 @Aalron Understood. From the last investigation, it looks like some dependent services need to support arm64 first. We'll prioritize this accordingly.

codope commented 2 years ago

Going to close this issue. We are tracking the support in HUDI-2786. ETA: Hudi version 0.13.0 (expected to be released in early November).