@nbalajee @n3nash Can you help support this?
@codope and I are working on this.
@Aalron @xushiyan I have built images for arm64 and pushed them to our Docker Hub. In the docker setup, before running the setup_demo.sh script, please apply this patch to the <HUDI_REPO>/docker/compose/docker-compose_hadoop284_hive233_spark244.yml file: https://gist.github.com/codope/3dd986de5e54f0650dd74b6032e4456c
Please note that this is still experimental. I have not fully tested the docker demo on an arm64 machine.
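For anyone unsure how to apply a gist patch locally, a minimal sketch, assuming the gist contents were saved as arm64-demo.patch (a placeholder filename):

cd <HUDI_REPO>
git apply --check arm64-demo.patch   # dry run: verify the patch applies cleanly
git apply arm64-demo.patch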
As discussed with @codope, we use a different tag now for arm64, and the required Docker Hub images were updated with the linux-arm64-0.10.1 tag:
docker buildx build base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
docker buildx build datanode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-datanode:linux-arm64-0.10.1 --push
docker buildx build historyserver --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-history:linux-arm64-0.10.1 --push
docker buildx build hive_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
docker buildx build namenode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-namenode:linux-arm64-0.10.1 --push
docker buildx build prestobase --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:linux-arm64-0.10.1 --push
docker buildx build spark_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkbase_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkadhoc --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkmaster --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.4.4:linux-arm64-0.10.1 --push
docker buildx build sparkworker --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4:linux-arm64-0.10.1 --push
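As an aside, cross-building these images generally requires a buildx builder with arm64 emulation available; a minimal setup sketch (the builder name is arbitrary):

# register QEMU binfmt handlers so an amd64 host can emulate arm64 during builds
docker run --privileged --rm tonistiigi/binfmt --install arm64
# create and select a dedicated buildx builder
docker buildx create --name hudi-arm64 --use
docker buildx inspect --bootstrap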
@Aalron
Uploaded this patch to be used with 0.10.1 to get docker running on arm64
https://gist.github.com/xushiyan/cec16585e884cf0693250631a1d10ec2
@xushiyan @codope I have now encountered a new problem. When executing the setup_demo.sh command, I checked the docker console and found that "menorah84/hive-metastore-postgresql:2.3.0" fails at startup with the following error:
"Error: Database is uninitialized and superuser password is not specified. You must specify POSTGRES_PASSWORD to a non-empty value for the superuser. For example, "-e POSTGRES_PASSWORD=password" on "docker run". You may also use "POSTGRES_HOST_AUTH_METHOD=trust" to allow all connections without a password. This is not recommended. See PostgreSQL documentation about "trust": https://www.postgresql.org/docs/current/auth-trust.html"
This error then causes the other services that depend on it to fail to start. Please help: how can I solve this problem?
haven't tried myself but would this work? @Aalron
hive-metastore-postgresql:
  image: menorah84/hive-metastore-postgresql:2.3.0
  platform: linux/arm64
  environment:
    - POSTGRES_HOST_AUTH_METHOD=trust
  volumes:
    - hive-metastore-postgresql:/var/lib/postgresql
  hostname: hive-metastore-postgresql
  container_name: hive-metastore-postgresql
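Alternatively, per the error message itself, setting a non-empty superuser password should also satisfy the check; a sketch (the password value here is an arbitrary placeholder):

  environment:
    - POSTGRES_PASSWORD=hudi   # any non-empty value satisfies the postgres startup check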
@xushiyan Following @codope's method, the docker images have now been pulled successfully, but when setup_demo.sh is executed to start the containers, the new problem described above appears.
Understood @Aalron. Here I'm suggesting putting this env var in the docker compose file, as shown above. See "environment". Does this work?
@xushiyan Thanks. With your method the problem appears to be solved; I no longer get the error locally. I will see if I can run the whole project tonight.
@xushiyan @codope
I found that the kafka service in <HUDI_REPO>/docker/compose/docker-compose_hadoop284_hive233_spark244.yml needs an additional environment setting:
kafka:
  image: 'wurstmeister/kafka:2.12-2.0.1'
  platform: linux/arm64
  hostname: kafkabroker
  container_name: kafkabroker
  ports:
    - '9092:9092'
  environment:
    - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
    - ALLOW_PLAINTEXT_LISTENER=yes
    - KAFKA_ADVERTISED_HOST_NAME=kafkabroker
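For reference, a quick way to confirm the broker came up (the script path is an assumption about the wurstmeister/kafka image layout):

docker logs kafkabroker --tail 20
# list topics via the broker's bundled scripts; /opt/kafka/bin is assumed
docker exec kafkabroker /opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper:2181 --list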
After that, I found four exceptions.
First, from the historyserver (apachehudi/hudi-hadoop_2.8.4-history:latest) image:
22/03/10 10:01:38 FATAL applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /tmp/libleveldbjni-64-1-2530759744317816554.8: /tmp/libleveldbjni-64-1-2530759744317816554.8: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a AARCH64-bit platform)]
at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:227)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:115)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:180)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:190)
Second, from the presto-coordinator-1 (apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:latest) image:
Presto requires amd64 or ppc64le on Linux (found aarch64)
Third, from the presto-worker-1 (apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:latest) image:
Presto requires amd64 or ppc64le on Linux (found aarch64)
Fourth, from the spark-worker-1 (apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4:latest) image:
22/03/10 10:12:06 WARN worker.Worker: Failed to connect to master sparkmaster:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:253)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to sparkmaster/172.18.0.10:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: sparkmaster/172.18.0.10:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
Caused by: java.net.ConnectException: Connection refused
... 11 more
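A quick triage sketch for this fourth exception: "Connection refused" on sparkmaster:7077 usually means the master container itself failed or is still starting, so it is worth checking its state and logs first (container name taken from the thread above):

docker ps -a --filter name=sparkmaster   # is the master container actually running?
docker logs sparkmaster --tail 50        # look for a startup failure before blaming the worker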
@Aalron As I mentioned above, we updated the tag for the arm64 images; you should be using this tag instead of latest. I also linked a patch to illustrate the diff. Can you apply the patch in your local setup? The "latest" images were reverted to the previous amd64 platform, which won't work on your machine.
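To illustrate the kind of change the patch makes (image names and the tag come from the build commands above; service names are assumed to match the compose file, and the full diff is in the linked gist):

  historyserver:
    image: apachehudi/hudi-hadoop_2.8.4-history:linux-arm64-0.10.1
  namenode:
    image: apachehudi/hudi-hadoop_2.8.4-namenode:linux-arm64-0.10.1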
@xushiyan Following your method, three issues remain. First, from [apachehudi/hudi-hadoop_2.8.4-history:linux-arm64-0.10.1]:
22/03/11 02:43:05 INFO applicationhistoryservice.ApplicationHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT]
22/03/11 02:43:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/03/11 02:43:05 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
22/03/11 02:43:05 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
22/03/11 02:43:05 INFO impl.MetricsSystemImpl: ApplicationHistoryServer metrics system started
22/03/11 02:43:05 FATAL applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /tmp/libleveldbjni-64-1-2100680455800525123.8: /tmp/libleveldbjni-64-1-2100680455800525123.8: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a AARCH64-bit platform)]
at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:227)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:115)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:180)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:190)
Second, from [apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:linux-arm64-0.10.1] (presto-coordinator):
Presto requires amd64 or ppc64le on Linux (found aarch64)
Third, from [apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:linux-arm64-0.10.1] (presto-worker):
Presto requires amd64 or ppc64le on Linux (found aarch64)
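Since Presto 0.217 explicitly rejects aarch64, one stopgap (untested, service names assumed to match the compose file) is to run only the presto services under amd64 emulation while everything else stays native:

  presto-coordinator-1:
    platform: linux/amd64   # force amd64 so the container runs under qemu emulation; expect it to be slow
  presto-worker-1:
    platform: linux/amd64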
Tracking the work and fix in https://issues.apache.org/jira/browse/HUDI-3601.
Going over the discussion here, it looks like support for the M1 chip is not yet available for Hudi's docker demo. I applied this patch - https://gist.github.com/xushiyan/cec16585e884cf0693250631a1d10ec2 - and ran setup_demo.sh. I got the error mentioned in this jira - https://issues.apache.org/jira/browse/HUDI-2786.
Please suggest if I am missing anything @xushiyan cc @codope
I agree with the comment above. @xushiyan and @codope this is a very needed feature.
@Mike-Roberts-2112 @Aalron Understood. From the last investigation, it looks like some dependent services need to support arm64 first. We'll prioritize this accordingly.
Going to close this issue. We are tracking the support in HUDI-2786. ETA: Hudi version 0.13.0 (expected to release in early November).