I think this is a good idea
Here is a public repo that implements this: https://github.com/gabrywu/Aop2YarnClient
This will not be able to fetch the applicationId when the SQL is submitted through HiveServer2, because the job is then submitted from the HiveServer2 process rather than from the shell task. Should we consider storing the appId in shared storage? @gabrywu
Do you have any good ideas to resolve it? @xiejiajun
I considered writing the appId to shared storage such as MySQL, but that would introduce extra third-party service configuration (e.g. a JDBC URL), so we still need to think it through carefully.
Yes, which is why the example project just writes it to a local file.
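A local-file hand-off could look roughly like the Java sketch below: the intercepting side appends one record per submitted application, and the scheduler side reads the file after the shell script exits. The one-record-per-line "applicationId<TAB>trackingUrl" format and the class name AppIdFile are hypothetical, not taken from Aop2YarnClient.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

public class AppIdFile {
    /** Appends one "applicationId<TAB>trackingUrl" record (hypothetical format). */
    public static void append(Path file, String appId, String trackingUrl) throws IOException {
        String line = appId + "\t" + trackingUrl + System.lineSeparator();
        Files.writeString(file, line, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Reads all records back as applicationId -> trackingUrl, keeping file order. */
    public static Map<String, String> read(Path file) throws IOException {
        Map<String, String> records = new LinkedHashMap<>();
        if (!Files.exists(file)) {
            return records; // no YARN application was submitted by the task
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) { // skip blank or malformed lines
                records.put(parts[0], parts[1]);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("appids", ".txt");
        append(file, "application_1604300000000_0001",
                "http://rm-host:8088/proxy/application_1604300000000_0001/");
        System.out.println(read(file));
        Files.delete(file);
    }
}
```

Appending keeps multiple submissions from one task (e.g. a Hive query that launches several jobs), at the cost of needing one file per task so concurrent tasks on the same worker do not mix records.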
@caishunfeng
Describe the question

For now, if we execute a YARN job in a SHELL script, we find the application IDs in the logs with the regex 'application_\d+_\d+'. This is ugly and has performance issues. So I suggest registering an aspect when executing the 'yarn jar' command: we can weave a join point into org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication, where we can get the submitted application ID and the tracking URL and write them to a local file.
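For comparison, the current log-scraping approach amounts to something like the Java sketch below: every log line is scanned with the application-ID regex and the distinct matches are kept. The class name and the sample log lines in main are made up for illustration; only the 'application_<clusterTimestamp>_<sequence>' ID shape comes from YARN.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppIdLogScanner {
    // YARN application IDs look like application_<clusterTimestamp>_<sequence>
    private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

    /** Returns the distinct application IDs found in the log lines, in order of first appearance. */
    public static List<String> scan(List<String> logLines) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = APP_ID.matcher(line);
            while (m.find()) {
                ids.add(m.group());
            }
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        // Hypothetical sample output of a shell task
        List<String> log = List.of(
                "INFO impl.YarnClientImpl: Submitted application application_1604300000000_0001",
                "INFO mapreduce.Job: Running job: job_1604300000000_0001",
                "INFO mapreduce.Job: The url to track the job: http://rm-host:8088/proxy/application_1604300000000_0001/");
        System.out.println(scan(log));
    }
}
```

Every line of output has to pass through the matcher, and any unrelated text that happens to contain an application ID is picked up too, which is the fragility the proposal aims to remove.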
What are the current deficiencies and the benefits of improvement
Which version of DolphinScheduler:
Describe alternatives you've considered
Add the following two environment variables to the global envs:
export YARN_CLIENT_OPTS="-javaagent:/pathto/aspectjweaver-1.9.6.jar"
export YARN_USER_CLASSPATH=/pathto/Aop2YarnClient-1.0-SNAPSHOT.jar
Then, when applications are submitted to the YARN cluster, the aspect in Aop2YarnClient-1.0-SNAPSHOT.jar is registered, and we can get the submitted application ID and the tracking URL.

This is an example; I just output the application ID to the console:
![image](https://user-images.githubusercontent.com/8545796/98122060-0c73de00-1eeb-11eb-97d4-85c38c282540.png)
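The idea of intercepting submitApplication and reporting its return value can be illustrated without AspectJ or Hadoop on the classpath. The sketch below swaps in a JDK dynamic proxy (java.lang.reflect.Proxy) around a hypothetical Submitter interface; it is not how Aop2YarnClient works, since YarnClientImpl is a concrete class that a JDK proxy cannot wrap, which is why the proposal uses AspectJ load-time weaving instead. Like the example above, it prints the captured ID to the console.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class SubmitInterceptDemo {
    /** Hypothetical stand-in for the YARN client; not a Hadoop interface. */
    public interface Submitter {
        String submitApplication(String appName);
    }

    /** Wraps a Submitter so each submitApplication call records its result, similar to an around advice. */
    public static Submitter intercept(Submitter target, List<String> captured) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(target, args); // proceed with the real call
            if (method.getName().equals("submitApplication")) {
                captured.add((String) result);
                System.out.println("Submitted application id: " + result);
            }
            return result;
        };
        return (Submitter) Proxy.newProxyInstance(
                Submitter.class.getClassLoader(), new Class<?>[] {Submitter.class}, handler);
    }

    public static void main(String[] args) {
        List<String> captured = new ArrayList<>();
        // Fake submitter that always returns the same hypothetical ID
        Submitter fake = name -> "application_1604300000000_0001";
        Submitter wrapped = intercept(fake, captured);
        wrapped.submitApplication("wordcount");
        System.out.println(captured);
    }
}
```

The caller code does not change; only the object it talks to is wrapped. AspectJ achieves the same effect at class-load time for YarnClientImpl, which is why only the two environment variables are needed.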
Here is the sample code:
![image](https://user-images.githubusercontent.com/8545796/98123035-527d7180-1eec-11eb-9e25-67200d7878b1.png)
The solution is suitable for Hive, Spark, Flink, and other tools that run on a YARN cluster. Tested with 'hive -e "hive sql"', which passed.