archivesunleashed / aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
https://aut.docs.archivesunleashed.org/
Apache License 2.0

For extractor (spark-submit) jobs, set the Spark app name to the extractor job name. #458

ruebot closed this issue 4 years ago

ruebot commented 4 years ago

Describe the solution you'd like

Currently, the app name for spark-submit jobs is "Archives Unleashed Toolkit". We should update this to be dynamic, so we can get a better sense of the jobs running in places like the Spark History Server:

[Screenshot from 2020-05-04 09-00-08]

The pattern should probably be "aut" plus the extractor name. For example, "aut - AudioInformationExtractor".
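As a rough illustration of the naming pattern, here is a minimal Scala sketch. The `AppNaming` object, its `appName` method, and the fallback to plain "aut" when no extractor name is given are assumptions made for this example; they are not taken from the aut codebase.

```scala
// Hypothetical helper, not part of aut: builds the proposed Spark app name.
object AppNaming {
  // Prefix proposed in this issue.
  val Prefix = "aut"

  /** Returns "aut - <extractor>", or just "aut" if the name is null or blank
    * (the fallback is an assumption, not something the issue specifies).
    */
  def appName(extractor: String): String = {
    val name = Option(extractor).map(_.trim).getOrElse("")
    if (name.isEmpty) Prefix else s"$Prefix - $name"
  }
}
```

Assuming the spark-submit entry point builds its own `SparkConf`, the result could be passed to `SparkConf.setAppName` before the `SparkContext` is created, for example `new SparkConf().setAppName(AppNaming.appName("AudioInformationExtractor"))`, so that the Spark History Server lists the job as "aut - AudioInformationExtractor".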