Closed: Drewster727 closed this issue 4 years ago.
Are you seeing `PrometheusSink cannot be instantiated` errors in the executor logs? Others have run into this in the past because YARN had not deployed the spark-metrics jar to the executor nodes by the time the metrics system was initialised within the executor. The solution was to copy spark-metrics.jar and all its dependencies to the executor nodes upfront, before the executors are started: https://github.com/banzaicloud/spark-metrics/issues/30#issuecomment-492301988
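For illustration, the conf side of that workaround might look like the sketch below (the directory is a placeholder and assumes the spark-metrics jar and its dependencies have already been copied to that path on every executor node):

```
# spark-defaults.conf (or equivalent --conf flags); /opt/spark-metrics-deps is a placeholder path
# that must already contain spark-metrics and its dependency jars on every node
spark.executor.extraClassPath=/opt/spark-metrics-deps/*
# the driver side is only needed if the jars aren't already supplied via --jars/--packages
spark.driver.extraClassPath=/opt/spark-metrics-deps/*
```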
@stoader I am digging in to see whether this error appears in the executor logs; I haven't seen it yet. I'll report back and close this out if so. Thanks.
@stoader Even when I take YARN out of the equation and just run locally via spark-submit, I still only get driver metrics. I ran through the suggestion from @mitchelldavis here: https://github.com/banzaicloud/spark-metrics/issues/30#issuecomment-492301988 -- i.e. created a pom.xml and then manually specified the jars via the --jars option. I see no errors and no indication of anything failing to load.
```
spark-submit --proxy-user livy
--conf spark.metrics.namespace=drew
--conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777"
--files C:\Spark_Testing\log4j.xml
--jars C:\Spark_Testing\collector-0.12.0.jar,C:\Spark_Testing\metrics-core-3.1.2.jar,C:\Spark_Testing\simpleclient-0.3.0.jar,C:\Spark_Testing\simpleclient_dropwizard-0.3.0.jar,C:\Spark_Testing\simpleclient_pushgateway-0.3.0.jar,C:\Spark_Testing\spark-metrics_2.11-2.3-3.0.1.jar,C:\Spark_Testing\hive-jdbc-1.2.1000.2.6.5.84-2.jar,C:\Spark_Testing\hive-service-1.2.1000.2.6.5.84-2.jar,C:\Spark_Testing\spark-sql-kafka-0-10_2.11-2.3.0.2.6.5.84-2.jar,C:\Spark_Testing\spark-streaming-kafka-0-10_2.11-2.3.0.2.6.5.84-2.jar,C:\Spark_Testing\elasticsearch-spark-20_2.11-6.8.1.jar
--class com.test.Application C:\_code\scala-2.11\application.jar local
```
I also dug through any and all logs on my YARN cluster and could not find any complaints about failing to instantiate the PrometheusSink.
Any thoughts or suggestions?
can you see in your executor logs that the PrometheusSink is being instantiated?
Hi @stoader, sorry for the late reply -- yes, after doing some further review I do see this, but only in the executor logs:
```
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1887)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.banzaicloud.metrics.sink.PrometheusSink
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:235)
at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:198)
at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:194)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:194)
at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:102)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:365)
at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:201)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:228)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
... 4 more
```
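For what it's worth, a quick way to see how many executor containers hit this (the application id below is just a placeholder) is to grep the aggregated YARN logs:

```
# placeholder application id; counts containers that failed to load the sink class
yarn logs -applicationId application_1580000000000_0001 \
  | grep -c 'ClassNotFoundException: org.apache.spark.banzaicloud.metrics.sink.PrometheusSink'
```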
Which is odd, considering the driver has access and I'm providing the repositories and jar packages:
```
application_spark_metrics_conf = "metrics.properties"
application_spark_metrics_namespace = "spark"
application_spark_jars_repositories = "http://repo.hortonworks.com/content/repositories/releases,https://raw.github.com/banzaicloud/spark-metrics/master/maven-repo/releases"
application_spark_jars_packages = "org.apache.spark:spark-streaming-kafka-0-10_2.11:2.3.2,org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.2,org.elasticsearch:elasticsearch-spark-20_2.11:6.8.1,com.banzaicloud:spark-metrics_2.11:2.3-3.0.1,io.prometheus:simpleclient:0.8.1,io.prometheus:simpleclient_dropwizard:0.8.1,io.prometheus:simpleclient_pushgateway:0.8.1,io.dropwizard.metrics:metrics-core:3.1.2"
application_files = ["/hdfs/path/log4j.xml","/hdfs/path/metrics.properties","/hdfs/path/jmxCollector.yml"]
```
All of those vars get sent to Livy with the Spark worker deployment. The files under the /hdfs/path/... paths all exist in HDFS and are accessible.
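For context, the deployment tooling turns those vars into a Livy batch request roughly like the one below (host, application jar path, and the abbreviated package list are placeholders, not a verbatim copy of our request):

```
# illustrative POST to Livy's /batches endpoint; host and paths are placeholders
curl -s -X POST http://livy-host:8998/batches \
  -H 'Content-Type: application/json' \
  -d '{
        "file": "hdfs:///path/to/application.jar",
        "proxyUser": "livy",
        "files": ["/hdfs/path/log4j.xml", "/hdfs/path/metrics.properties", "/hdfs/path/jmxCollector.yml"],
        "conf": {
          "spark.metrics.conf": "metrics.properties",
          "spark.metrics.namespace": "spark",
          "spark.jars.repositories": "https://raw.github.com/banzaicloud/spark-metrics/master/maven-repo/releases",
          "spark.jars.packages": "com.banzaicloud:spark-metrics_2.11:2.3-3.0.1,io.prometheus:simpleclient:0.8.1,io.prometheus:simpleclient_dropwizard:0.8.1,io.prometheus:simpleclient_pushgateway:0.8.1,io.dropwizard.metrics:metrics-core:3.1.2"
        }
      }'
```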
Also -- if the answer is to just copy the jars manually... which ones, and where? Can I put them into the same HDFS path where my metrics.properties/jmxCollector.yml files are?
Thoughts?
Thanks, Drew
Also -- just tested dropping the following jars on each node and in HDFS:
```
hdfs dfs -ls /opt/prometheus/jars/
Found 6 items
-rw-r--r-- 3 hdfs hdfs 112558 2020-01-28 17:17 /opt/prometheus/jars/metrics-core-3.1.2.jar
-rw-r--r-- 3 hdfs hdfs 105245 2020-01-28 17:17 /opt/prometheus/jars/metrics-core-4.1.2.jar
-rw-r--r-- 3 hdfs hdfs 5823 2020-01-28 17:17 /opt/prometheus/jars/simpleclient_common-0.8.1.jar
-rw-r--r-- 3 hdfs hdfs 16319 2020-01-28 17:17 /opt/prometheus/jars/simpleclient_dropwizard-0.8.1.jar
-rw-r--r-- 3 hdfs hdfs 9335 2020-01-28 17:17 /opt/prometheus/jars/simpleclient_pushgateway-0.8.1.jar
-rw-r--r-- 3 hdfs hdfs 135208 2020-01-28 17:17 /opt/prometheus/jars/spark-metrics_2.11-2.3-3.0.1.jar
```
I told my Livy/YARN job to look for these in the files designation.
Am I missing any there?
I seem to get further if I tell YARN/Livy that it can look for jars via extraClassPath pointed at /opt/prometheus/jars.
Now it's throwing some snakeyaml dependency errors. Maybe I need to copy more jars out there?
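If it helps, one way to see the full set of jars the sink actually needs (a sketch; the version and repo URL mirror the ones above, and the .m2 path is just where Maven caches the pom) is to resolve the spark-metrics pom with Maven and print its dependency tree:

```
# pull spark-metrics and its transitive dependencies into the local Maven cache
mvn dependency:get -DgroupId=com.banzaicloud -DartifactId=spark-metrics_2.11 -Dversion=2.3-3.0.1 \
  -DremoteRepositories=https://raw.github.com/banzaicloud/spark-metrics/master/maven-repo/releases
# print everything it pulls in (snakeyaml, the prometheus simpleclient jars, etc. show up here)
mvn dependency:tree -f ~/.m2/repository/com/banzaicloud/spark-metrics_2.11/2.3-3.0.1/spark-metrics_2.11-2.3-3.0.1.pom
```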
At this point I'm not sure I can rely on this... which is too bad :(
> Which is odd, considering the driver has access and I'm providing the repositories and jar packages:
The only reason I can imagine, which others have reported as well, is that executors are slow to download the jars (compared to the driver), so the jars are not there by the time the executor initialises its metrics system (hence the java.lang.ClassNotFoundException: org.apache.spark.banzaicloud.metrics.sink.PrometheusSink). The only solution to that is to copy all the necessary jars upfront to the nodes where executors will run, before the executors are started.
You can download the spark-metrics jar and all its dependencies to a temp directory using the following steps:
```
mvn dependency:get -DgroupId=com.banzaicloud -DartifactId=spark-metrics_2.11 -Dversion=2.3-2.1.0
mkdir temp
mvn dependency:copy-dependencies -f ~/.m2/repository/com/banzaicloud/spark-metrics_2.11/2.3-2.1.0/spark-metrics_2.11-2.3-2.1.0.pom -DoutputDirectory=$(pwd)/temp
```
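From there, a rough sketch of getting the jars onto the executor nodes (hostnames and the target directory are placeholders; adjust to your cluster):

```
# copy the resolved dependencies plus the spark-metrics jar itself to every executor node
for host in node1 node2 node3; do
  ssh "$host" mkdir -p /opt/prometheus/jars
  scp temp/*.jar \
      ~/.m2/repository/com/banzaicloud/spark-metrics_2.11/2.3-2.1.0/spark-metrics_2.11-2.3-2.1.0.jar \
      "$host":/opt/prometheus/jars/
done
```

Then point the executors at that directory, e.g. via spark.executor.extraClassPath as mentioned earlier in the thread.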
@stoader I did finally get it to work. I had to drop these jars on each node in the cluster into a specific directory, then tell Livy/YARN to look there:
```
collector-0.12.0.jar
metrics-core-4.1.2.jar
simpleclient-0.8.1.jar
simpleclient_common-0.8.1.jar
simpleclient_dropwizard-0.8.1.jar
simpleclient_pushgateway-0.8.1.jar
snakeyaml-1.16.jar
spark-metrics_2.11-2.3-3.0.1.jar
```
My issue with this is that I have to manually maintain these jars on each node. Does anyone know how to get YARN to look in an HDFS path for these?
Did you also try dropping these jars into the same path where the standard Spark jars (spark/jars) live? That path is on Spark's class path.
I'm not sure if there is a way to tell YARN to download jars required by Spark executors from HDFS.
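For the first suggestion, that would be roughly the following on every node ($SPARK_HOME and the source directory being placeholders):

```
# run on each node; SPARK_HOME and the source directory are placeholders
cp /opt/prometheus/jars/*.jar "$SPARK_HOME"/jars/
```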
@stoader I did not try dropping the jars into the same path where the standard spark jars live. However, that would still require me to drop the jars on every node in the cluster, so I'm not sure I would gain anything there. Bummer on the HDFS part... that would be super handy, but I understand that's a yarn/spark issue.
I'm closing this for now. Got it working by dropping jars on the nodes as outlined.
If you are running in cluster mode you will get executor metrics from the other nodes where executors might be running... it worked for me.
Describe the bug: Not seeing executor metrics (only driver).
Steps to reproduce the issue: Spark 2.3.0 / Hadoop 2.7
metrics.properties:
jmxCollector.yml
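For reference, a PrometheusSink section along the lines of the spark-metrics README looks roughly like this (the pushgateway address is a placeholder, and property names should be checked against the README for the version in use):

```
# PrometheusSink config per the spark-metrics README; pushgateway address is a placeholder
*.sink.prometheus.class=org.apache.spark.banzaicloud.metrics.sink.PrometheusSink
*.sink.prometheus.pushgateway-address-protocol=http
*.sink.prometheus.pushgateway-address=pushgateway-host:9091
*.sink.prometheus.period=10
*.sink.prometheus.unit=seconds
*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.enable-jmx-collector=true
*.sink.prometheus.jmx-collector-config=jmxCollector.yml
```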
I see metrics flowing in appropriately to the push gateway, but it is only driver metrics, no executor...
I found a post from someone who has a similar set of spark/hadoop versions on a cloudera forum, but no answers. https://community.cloudera.com/t5/Support-Questions/Spark-metrics-sink-doesn-t-expose-executor-s-metrics/td-p/281915
I see the same problem with the graphite metric sink built into spark. It occurs whether we're in yarn cluster mode or local spark-submit mode.
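For comparison, the built-in Graphite sink was enabled with the standard Spark sink properties, roughly as follows (host and port are placeholders):

```
# Spark's built-in GraphiteSink; host/port are placeholders
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite-host
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
```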
Can anyone explain what I'm doing wrong here?
Thanks, Drew