apache / incubator-streampark

Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
https://streampark.apache.org/
Apache License 2.0

[Bug] Flink SQL application name can't have Chinese #2153

Closed ziqiang-wang closed 1 year ago

ziqiang-wang commented 1 year ago

Search before asking

What happened

[screenshot: Flink SQL application list with the Chinese application name circled]

The name of the Flink SQL application circled in the figure above contains Chinese characters. After the task was submitted to YARN, the TaskManagers in the Flink cluster could not be initialized, and the following error appears in the JobManager log:

2022-12-13 11:51:32,623 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker container_e16_1670902385727_0004_01_000002 is terminated. Diagnostics: Container container_e16_1670902385727_0004_01_000002 was invalid. Diagnostics: [2022-12-13 11:51:32.263]File does not exist: hdfs://bigdata/user/hdfs/.flink/application_1670902385727_0004/streampark-flinkjob_??????hive-1.15.jar
java.io.FileNotFoundException: File does not exist: hdfs://bigdata/user/hdfs/.flink/application_1670902385727_0004/streampark-flinkjob_??????hive-1.15.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1500)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1493)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1508)
    at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

From the log it can be seen that YARN cannot find the jar package in HDFS, because every Chinese character in its name has been garbled into '?'.
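To illustrate the mangling mechanism: when a JVM whose default charset cannot represent CJK characters encodes a string, each unmappable character is replaced with '?'. A minimal, self-contained Java sketch (the application name used here is hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        // Hypothetical application name containing Chinese characters.
        String appName = "同步hive";

        // Encoding with a charset that cannot represent CJK characters
        // replaces each unmappable character with '?', which is how a
        // name like "streampark-flinkjob_??????hive-1.15.jar" can end
        // up on HDFS.
        byte[] ascii = appName.getBytes(StandardCharsets.US_ASCII);
        String mangled = new String(ascii, StandardCharsets.US_ASCII);
        System.out.println(mangled); // the two Chinese characters become "??"

        // A UTF-8 round trip preserves the name intact.
        String intact = new String(appName.getBytes(StandardCharsets.UTF_8),
                                   StandardCharsets.UTF_8);
        System.out.println(intact);
    }
}
```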

Checking the platform's workspace on the machine, the jar package can be found under the directory for the corresponding application id; its name contains the application name, Chinese characters included. The jar package can also be found at the HDFS path shown in the log. In both places the jar package names display normally.

Suggestion: concatenate the application id into the jar package name instead of the application name, since the id contains no Chinese characters. As it stands, YARN cannot find the jar package in HDFS and therefore cannot initialize the TaskManager. Putting the application id in the jar package name would not make debugging any harder, because the id still identifies the application.
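The suggested fix could look roughly like the sketch below. buildJarName is a hypothetical helper, not StreamPark's actual code; it derives the uploaded jar's name from the numeric application id, which is always ASCII, instead of the user-supplied application name:

```java
public class JarNameBuilder {

    // Hypothetical helper: name the uploaded jar after the numeric
    // application id (always ASCII) rather than the user-supplied
    // application name (which may contain Chinese characters).
    static String buildJarName(long appId, String flinkVersion) {
        return "streampark-flinkjob_" + appId + "-" + flinkVersion + ".jar";
    }

    public static void main(String[] args) {
        // The resulting name survives any charset round trip, and the
        // id still identifies the application for debugging.
        System.out.println(buildJarName(10001L, "1.15"));
        // -> streampark-flinkjob_10001-1.15.jar
    }
}
```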

YARN Environment

CDH-6.3.2 yarn-3.0.0+cdh6.3.2

StreamPark Version

2.0.0

Java Version

1.8.0_202

Flink Version

Flink-1.15.3

Scala Version of Flink

Scala_2.12

Error Exception

(Same JobManager log excerpt and stack trace as quoted above.)


Screenshots

_No response_

Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
ziqiang-wang commented 1 year ago

Update:

This problem is likely caused by an encoding mismatch between the platform and the environment where the job runs.

The fields in the virtual-table creation statements of my Flink SQL are often described in Chinese. After the tasks are scheduled to YARN, the Chinese characters in the JobManager log shown on the Flink UI are garbled. [screenshot omitted]

In addition, all Chinese characters related to the SQL in the YARN task log are garbled. [screenshot omitted]

At present, I do not know how to solve this problem.

ziqiang-wang commented 1 year ago

Add the following configuration to the conf/flink-conf.yaml file:

env.java.opts: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
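Once these JVM options are in place, the effective encoding can be checked from any Java process started with them (a minimal sketch; the printed values depend on the flags actually passed to the JVM):

```java
import java.nio.charset.Charset;

public class CharsetCheck {
    public static void main(String[] args) {
        // All three should report UTF-8 once -Dfile.encoding=UTF-8 and
        // -Dsun.jnu.encoding=UTF-8 take effect for the Flink JVMs.
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset().name());
    }
}
```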