Closed matyix closed 6 years ago
Unit test build for this PR seems stuck for almost 3 days.
@ssuchter @kimoonkim, should the unit test build timeout eventually?
Some test failed due to OOM. Can someone who has admin access to the Jenkins instance kill the build?
@foxish I think we can time out the Jenkins jobs, say, after 8 hours. I'll see if i can make this change to Jenkins jobs.
@liyinan926 Yes, let me find and kill the hanging build.
I just checked. The build was killed already by Jenkins. The build log console says:
...
Build was aborted
Aborted by Jenkins Admin
Hmm, also the unit test had set up timeout already. The config page says:
Abort the build if it's stuck
Time-out strategy Absolute
Timeout minutes: 60
I guess it doesn't always work :-(
@liyinan926 @kimoonkim @foxish Sorry forgot to update, but I killed the test earlier and added a 60 minute timeout to the configuration around noon today.
rerun all tests please
I'm a little confused as to what this property actually does. I see the configuration of spark.metrics.namespace
being set, but I don't see it being used in this pull request itself?
Hello @mccheah - thanks for the feedback and review. The starting base for the PR was an earlier version and the custom metrics namespace in executors was missing back then. I have reverted the changes and updated the PR text as well reflecting the latest commit.
Is the idea here to backport the spark support for custom metric namespaces to this fork? I ask because I'd expect this to be resolved either via the upstreaming process or via our next rebase.
@erikerlandson After rebasing with latest branch-2.2-kubernetes the original PR has reduced the scope to support custom metrics namespace for external shuffle service
only (missing from master).
@matyix Oh I see - I was referring to the next time we rebase against an upstream release (spark-2.2.1 or spark-2.3, etc)
@erikerlandson shall I push this PR upstream as well?
Closing this as it's redundant, this PR https://github.com/apache/spark/pull/19775 fixes this one as well.
What changes were proposed in this pull request?
Support MetricNameSpace in
shuffle
to shuffleservice metrics.Pass
spark.metrics.namespace
configuration to shuffle such as the shuffleservice can publish metrics with custom namespace. In case of shuffleservice theshuffle
prefix is used in the metrics.How was this patch tested?
Executed all unit and integration tests.
Manual testing through deploying a Spark cluster, Prometheus server, Pushgateway and ran SparkPi - and checking for the shuffle metrics in Pushgateway/Grafana.