Adamyuanyuan opened this issue 1 month ago
I tried to solve it, but should our focus be on configuring support for shared storage, or on letting Flink distribute that configuration?
@hailin0 @TyrantLucifer @EricJoy2048 @Hisoka-X @liugddx
> Flink distribute that configuration

I prefer this one.
Support for shared storage; this is also available to other engines.
Spark has the ability to distribute files; Flink doesn't have that, does it?
Use the YARN distribution capability: -Dyarn.ship-files.
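For illustration, a minimal sketch of shipping a job config file this way; the paths here are hypothetical (the main class is the one from the command printed later in this thread):

```shell
# Ship a local config file to the YARN containers alongside the job.
# yarn.ship-files takes a semicolon-separated list of local files/directories.
${FLINK_HOME}/bin/flink run-application --target yarn-application \
  -D "yarn.ship-files=/path/to/seatunnel-job.conf" \
  -c org.apache.seatunnel.core.starter.flink.SeaTunnelFlink \
  /path/to/seatunnel-flink-starter.jar --config /path/to/seatunnel-job.conf
```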
@hanhanzhang Are you interested? I can assign it to you
We already have this problem now; please assign it to me, thanks.
Thanks for your contribution; we can discuss the various approaches here.
I tried to solve this problem in YARN deployment mode, using yarn.ship-files to deliver the configuration file, but a new problem is exposed: the task's connector plugins also need to be delivered to the cluster, via yarn.ship-archives. If there are many plugins, uploading them takes a long time, task startup becomes slow, and the plugin discovery module needs to be modified. In my opinion, application mode needs more changes. I'm thinking about whether introducing a storage module is the best way; what do you think?
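To make the two delivery paths concrete, a hedged sketch, assuming a Flink version that supports yarn.ship-archives and using the directory layout from the command later in this thread (the archive name is hypothetical):

```shell
# Ship the job config as a plain file and the connector plugins as an archive.
# yarn.ship-archives entries are unpacked when localized in the containers;
# with many or large plugins, this upload dominates job startup time.
tar -czf /tmp/connectors.tar.gz -C /usr/seatunnel/2.3.7 connectors
${FLINK_HOME}/bin/flink run-application --target yarn-application \
  -D "yarn.ship-files=/usr/seatunnel/test.conf" \
  -D "yarn.ship-archives=/tmp/connectors.tar.gz" \
  -c org.apache.seatunnel.core.starter.flink.SeaTunnelFlink \
  /usr/seatunnel/2.3.7/starter/seatunnel-flink-15-starter.jar \
  --config /usr/seatunnel/test.conf
```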
> I'm thinking about whether introducing a storage module is the best way

I think we should give users the choice. By default, the jars will be shipped to the cluster on every submission; if needed, users can override the jar location through configuration (like yarn.provided.lib.dirs).
yes, this reduces jar upload time.
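For reference, a minimal sketch of the yarn.provided.lib.dirs approach, assuming an HDFS staging path of our choosing: the jars are uploaded once, and YARN localizes them from HDFS instead of re-uploading per job.

```shell
# One-time: stage the Flink distribution jars on a shared filesystem.
hdfs dfs -mkdir -p /flink/1.15/lib
hdfs dfs -put ${FLINK_HOME}/lib/* /flink/1.15/lib

# Per job: point Flink at the pre-staged directory to skip the upload.
${FLINK_HOME}/bin/flink run-application --target yarn-application \
  -D "yarn.provided.lib.dirs=hdfs:///flink/1.15/lib" \
  -c org.apache.seatunnel.core.starter.flink.SeaTunnelFlink \
  /usr/seatunnel/2.3.7/starter/seatunnel-flink-15-starter.jar \
  --config /usr/seatunnel/test.conf
```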
We can look at capabilities like Spark's --jars to see if Flink can support something similar.
We already do the same thing on Flink in https://github.com/apache/seatunnel/blob/90cd46f50ac2f01e11ac5b7002688ea6c657cc82/seatunnel-core/seatunnel-flink-starter/seatunnel-flink-starter-common/src/main/java/org/apache/seatunnel/core/starter/flink/execution/AbstractFlinkRuntimeEnvironment.java#L235. The problem is that we need to upload it every time.
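For context, per-submission jar attachment can also be expressed through Flink's standard pipeline.jars option. This is only a hedged illustration of the "upload every time" behavior, not necessarily the exact mechanism used in the linked starter code, and the connector jar name is hypothetical:

```shell
# Each submission re-uploads the listed jars to the cluster.
# pipeline.jars takes a semicolon-separated list of jar URLs.
${FLINK_HOME}/bin/flink run-application --target yarn-application \
  -D "pipeline.jars=file:///usr/seatunnel/2.3.7/connectors/connector-fake.jar" \
  -c org.apache.seatunnel.core.starter.flink.SeaTunnelFlink \
  /usr/seatunnel/2.3.7/starter/seatunnel-flink-15-starter.jar \
  --config /usr/seatunnel/test.conf
```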
> I tried to solve this problem in YARN deployment mode, using yarn.ship-files to deliver the configuration file

I think we should face this problem first. Could you create a PR for this? @hanhanzhang
@hanhanzhang Can you provide the Flink shell command that gets printed when you submit now?
```shell
[root@localhost 2.3.7]# ${SEATUNNEL_HOME}/bin/start-seatunnel-flink-15-connector-v2.sh \
  --config /usr/seatunnel/test.conf \
  --deploy-mode run-application --target yarn-application \
  --yarnqueue default --yarnjobManagerMemory 1024 --yarntaskManagerMemory 1024

Execute SeaTunnel Flink Job: ${FLINK_HOME}/bin/flink run-application --target yarn-application \
  -D "yarn.ship-files=/usr/seatunnel/test.conf;/usr/seatunnel/2.3.7/connectors" \
  --yarnqueue default --yarnjobManagerMemory 1024 --yarntaskManagerMemory 1024 \
  -c org.apache.seatunnel.core.starter.flink.SeaTunnelFlink \
  /usr/seatunnel/2.3.7/starter/seatunnel-flink-15-starter.jar \
  --master yarn-application --config /usr/seatunnel/test.conf --name SeaTunnel
```
I'm still trying to work it out: files shipped via yarn.ship-files are added to the task classpath, whereas yarn.provided.lib.dirs should be used for the Flink lib directory. @Hisoka-X
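Putting that distinction together, a hedged sketch of combining the two mechanisms, reusing the hypothetical HDFS staging path from above: the small job config is shipped per job (and lands on the task classpath), while the Flink lib jars are referenced from pre-staged storage.

```shell
# Per job: ship only the lightweight config; reuse pre-staged Flink libs.
${FLINK_HOME}/bin/flink run-application --target yarn-application \
  -D "yarn.ship-files=/usr/seatunnel/test.conf" \
  -D "yarn.provided.lib.dirs=hdfs:///flink/1.15/lib" \
  -c org.apache.seatunnel.core.starter.flink.SeaTunnelFlink \
  /usr/seatunnel/2.3.7/starter/seatunnel-flink-15-starter.jar \
  --config /usr/seatunnel/test.conf
```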
Search before asking
What happened
When using Flink Application mode to submit SeaTunnel, the same error always occurs.
SeaTunnel Version
2.3.6
SeaTunnel Config