apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Feature][flink-shaded-hadoop3] flink-shaded-hadoop3 support #1993

Open zyd915 opened 2 years ago

zyd915 commented 2 years ago

Search before asking

Description

When I run the official example in Flink-on-YARN mode:

./bin/start-seatunnel-flink.sh -m yarn-cluster -ynm seatunnel -c ./config/flink.batch.conf.template

it fails with the following error:

2022-06-06 20:37:21,994 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Cluster specification: ClusterSpecification{masterMemoryMB=1600, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:64)
    at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:65)
    at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:172)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
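
An IllegalAccessError like the one in the stack trace above can occur when a class and its package-private superinterface end up loaded from different jars or class loaders, for example a Hadoop 2.x class bundled in a job jar alongside Hadoop 3.x classes on the cluster classpath. That is an assumption about the cause, not something the trace itself confirms. A minimal sketch for checking which jars bundle the classes named in the trace; the helper name `jars_containing` is hypothetical:

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, roots):
    """Return the sorted paths of jars under `roots` that bundle `class_name`.

    `class_name` is a fully qualified JVM class name; nested classes keep
    their `$`, e.g. "pkg.Outer$Inner".
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for root in roots:
        for jar in Path(root).rglob("*.jar"):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        hits.append(str(jar))
            except (zipfile.BadZipFile, OSError):
                continue  # skip unreadable or corrupt archives
    return sorted(hits)
```

Calling it with `"org.apache.hadoop.hdfs.web.HftpFileSystem"` and `"org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator"` against the SeaTunnel `lib` directory, the Flink `lib` directory, and the Hadoop `share/hadoop` directory (the install paths that appear in the logs in this issue) would show whether the two classes come from different jars.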

After searching widely without finding a fix, I changed the flink-shaded-hadoop dependency version in the official source myself, which resolved the problem completely.

Solution: https://bolder-gasoline-6e0.notion.site/Seatunnel-flink-shaded-hadoop3-dc8568d60486458792881974aa93b635

Re-running the official example:

root@hadoop001:/mnt/e/work/work-tools/seatunnel-2.1.1# ./bin/start-seatunnel-flink.sh -m yarn-cluster -ynm seatunnel -c ./config/flink.batch.conf.template
Export JVM_ARGS: -Dexecution.parallelism=1
Execute SeaTunnel Flink Job: ${FLINK_HOME}/bin/flink run -m yarn-cluster -ynm seatunnel -c org.apache.seatunnel.core.flink.SeatunnelFlink /mnt/e/work/work-tools/seatunnel-2.1.1/lib/seatunnel-core-flink.jar --config ./config/flink.batch.conf.template
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/e/work/work-tools/flink-1.13.6/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/e/work/work-tools/hadoop-3.2.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-06-07 14:59:35,541 WARN org.apache.flink.yarn.configuration.YarnLogConfigUtil [] - The configuration directory ('/mnt/e/work/work-tools/flink-1.13.6/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2022-06-07 14:59:35,593 INFO org.apache.hadoop.yarn.client.RMProxy [] - Connecting to ResourceManager at hadoop001/10.0.2.75:8032
2022-06-07 14:59:35,696 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2022-06-07 14:59:35,823 INFO org.apache.hadoop.conf.Configuration [] - resource-types.xml not found
2022-06-07 14:59:35,823 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils [] - Unable to find 'resource-types.xml'.
2022-06-07 14:59:35,849 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - The configured JobManager memory is 1600 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 448 MB may not be used by Flink.
2022-06-07 14:59:35,849 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2022-06-07 14:59:35,850 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Cluster specification: ClusterSpecification{masterMemoryMB=1600, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2022-06-07 14:59:41,681 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Submitting application master application_1654515017635_0008
2022-06-07 14:59:41,704 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl [] - Submitted application application_1654515017635_0008
2022-06-07 14:59:41,704 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Waiting for the cluster to be allocated
2022-06-07 14:59:41,706 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deploying cluster, current state ACCEPTED
2022-06-07 14:59:48,681 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - YARN application has been deployed successfully.
2022-06-07 14:59:48,681 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Found Web Interface hadoop003:61297 of application 'application_1654515017635_0008'.
Job has been submitted with JobID 589078b4e5e4ec7e3e2f274736e1a232
Program execution finished
Job with JobID 589078b4e5e4ec7e3e2f274736e1a232 has finished.
Job Runtime: 13656 ms
Accumulator Results:

+I[Ricky Huo, 37] +I[Gary, 42] +I[Gary, 87] .......
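
As a side note on the memory lines in the log above: the 2048 MB figures follow from rounding each request up to a multiple of 'yarn.scheduler.minimum-allocation-mb' (1024 MB here). A small sketch of that arithmetic, using only the numbers from the log; the function name `yarn_allocation_mb` is hypothetical, and this mirrors the log message rather than the actual YARN scheduler code:

```python
def yarn_allocation_mb(requested_mb, min_allocation_mb=1024):
    """Round a memory request up to the next multiple of the minimum allocation.

    Simplified model of the rounding described in the Flink log message;
    it is not taken from the YARN scheduler implementation.
    """
    multiples = -(-requested_mb // min_allocation_mb)  # ceiling division
    return multiples * min_allocation_mb


# Values taken from the log: JobManager 1600 MB, TaskManager 1728 MB
for name, requested in [("JobManager", 1600), ("TaskManager", 1728)]:
    allocated = yarn_allocation_mb(requested)
    print(f"{name}: requested {requested} MB, "
          f"allocated {allocated} MB, extra {allocated - requested} MB")
```

This reproduces the 448 MB and 320 MB of extra memory that the log reports for the JobManager and TaskManager containers.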

The official source appears to depend on Hadoop 2.x. I hope Hadoop 3.x can be officially supported.

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

Code of Conduct

runzhi214 commented 2 years ago

I ran into this problem too, so that is the cause. My Hadoop version: Hadoop 3.0.0 - CDH 6.3.2