banzaicloud / spark-metrics

Spark metrics related custom classes and sinks (e.g. Prometheus)
Apache License 2.0

Metric name pre-processing by the custom Prometheus sink works for only one component (driver/executor/applicationMaster) #64

Closed DruvaRam closed 3 years ago

DruvaRam commented 4 years ago

Hello

Since Spark doesn't support a Prometheus sink by default in versions < 3.0, I wanted to integrate this custom sink with our AWS EMR clusters. I was able to publish metrics to the Prometheus PushGateway successfully, but I am having issues with metric name pre-processing. I followed this link to set up the metric name pre-processing. I can get the metric names pre-processed for the driver, the executors, or the applicationMaster one at a time, but not for all three of them.

This is the regex suggested in the link above, and its name pre-processing works only for the driver. If I swap driver for executor or applicationMaster, it works as expected for that respective component:

*.sink.prometheus.metrics-name-capture-regex=(.+driver_)(.+)
*.sink.prometheus.metrics-name-replacement=$2

Below are a few regexes I tried for metric name pre-processing, but the metric names are still processed for only one of the components instead of all of them (executor/driver/applicationMaster):

  - (.*driver|.*executor|.*application)[0123456789_]+(.+)
  - (\w+)((_\d+)_)(\w*(driver|executor|master|application)\w*_)?
  - (\w+)((_\d+)_)(\w*(driver|executor|applicationMaster)\w*_)?
  - ([a-zA-Z]+)((_[0-9]+)+_)(\w*(executor|master|application)\w*_)?
  - .*(driver|executor|application)[0123456789_]+(.+)

For each of these regexes, I used the name replacement $2, as in the link mentioned above:

*.sink.prometheus.metrics-name-replacement=$2
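
For reference, with the default metrics namespace (the application ID), the raw names Spark pushes look roughly like the following; these examples are illustrative only, and the exact form depends on the environment:

  application_1234567890123_0001_driver_BlockManager_memory_maxMem_MB
  application_1234567890123_0001_2_executor_threadpool_activeTasks

The executor's instance segment is its numeric id rather than a fixed word, which is presumably why a regex anchored on a single component name only cleans up that component.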

Versions of the JARs used -

  - spark-metrics_2.12-2.4-1.0.5.jar
  - metrics-core-3.1.2.jar
  - simpleclient_dropwizard-0.3.0.jar
  - simpleclient_pushgateway-0.3.0.jar
  - simpleclient-0.3.0.jar
  - simpleclient_common-0.3.0.jar

Any help with the metric name pre-processing would be greatly appreciated.

Thanks Druva

stoader commented 4 years ago

Can you show examples of the original driver and executor metrics without name pre-processing enabled that are provided by Spark in your environment?

low-on-mana commented 3 years ago

@stoader I'm facing the same issue. I am running the latest master on Spark 3 on EMR, using

*.sink.prometheus.metrics-name-capture-regex=(.+driver_)(.+)
*.sink.prometheus.metrics-name-replacement=$2
*.sink.prometheus.group-key=ns=stream-ns

and the following in spark-defaults:

spark.metrics.namespace          ${spark.app.name}

Is there any way to get rid of the application name & id from all metric names? They are already in the labels (via job, number, role).

Screenshot of application master group


Screenshot of the driver group (here the replacement has worked)


Screenshot of executor 2 (here, instead of the applicationId, the job name appears in the metric name)

Screenshot of executor 1

low-on-mana commented 3 years ago
Screenshot 2021-03-18 at 12 02 29 PM

If I don't provide any namespace, then the executor metrics come with the applicationId instead of the name, which is easier to replace with a regex. But this creates another issue: now the group is based on the applicationId instead.
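
A possible middle ground, offered only as a sketch and not something verified here, is to pin the namespace to a fixed literal in spark-defaults so the driver and executor names get a constant, application-independent prefix that a single capture regex can strip, while the group key still carries the application identity:

  spark.metrics.namespace    spark

Here spark is just an arbitrary placeholder value.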

stoader commented 3 years ago

@t0il3ts0ap your config is incorrect:

*.sink.prometheus.metrics-name-capture-regex=(.+driver_)(.+)

This regex will replace only metric names that include driver in the name.

In order to have the applicationMaster metrics replaced, you need something like:

*.sink.prometheus.metrics-name-capture-regex=(.+applicationMaster_)(.+)
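
To cover all three components with a single pattern, something along these lines might work; this is only a sketch, since the executor's instance segment is a numeric id and the exact names need to be verified first:

*.sink.prometheus.metrics-name-capture-regex=(.*_)(driver_|applicationMaster_|\d+_)(.+)
*.sink.prometheus.metrics-name-replacement=$3

Note that the replacement becomes $3 because the alternation adds a capture group.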
low-on-mana commented 3 years ago

@stoader As you can see, the executor metrics start with the appName delta_streamer_request_details_1.
Do you know how I can replace those as well?

It would be difficult to provide a dynamic metrics.properties based on the application name.

stoader commented 3 years ago

I'd suggest checking the metric names pushed by the executor first, without any regex, and then creating the appropriate regex for extracting only the needed part of the name.
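
A quick way to do that check locally is to run a candidate pattern against sample names before touching the cluster config. A minimal sketch in Scala, assuming the sink applies a Java-style replaceAll and using hypothetical sample names that should be replaced with the raw names from your environment:

  object RegexCheck extends App {
    // Candidate capture regex and replacement, mirroring the sink configuration keys
    val captureRegex = "(.*_)(driver_|applicationMaster_|\\d+_)(.+)"
    val replacement  = "$3"

    // Hypothetical sample names; substitute the names actually pushed in your environment
    val samples = Seq(
      "myapp_driver_BlockManager_memory_maxMem_MB",
      "myapp_2_executor_threadpool_activeTasks",
      "myapp_applicationMaster_numContainersPendingAllocate"
    )

    // Prints each original name next to what the pre-processing would produce
    samples.foreach(name => println(s"$name -> ${name.replaceAll(captureRegex, replacement)}"))
  }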

stoader commented 3 years ago

I'm closing this issue as it's not a spark-metrics issue but rather a matter of using the appropriate regex in the configuration.

low-on-mana commented 3 years ago

@stoader I used a new regex, but this issue is still valid. It only replaces the driver metrics; it doesn't consider the others.

*.sink.prometheus.metrics-name-capture-regex=(.*(mgck_|application_))([0-9_]*)(.+)
*.sink.prometheus.metrics-name-replacement=$4
Screenshot 2021-03-18 at 2 54 06 PM
beryllw commented 3 years ago

> @stoader I used a new regex, but this issue is still valid. It only replaces the driver metrics; it doesn't consider the others.
>
> *.sink.prometheus.metrics-name-capture-regex=(.*(mgck_|application_))([0-9_]*)(.+)
> *.sink.prometheus.metrics-name-replacement=$4

Yeah, I think you are right. It's so inconvenient.

low-on-mana commented 3 years ago

@Kwafoor It doesn't work for the executor if you load the jar from S3. Instead, first store the jar in the local filesystem (e.g. /usr/lib/spark) on each node of your cluster (in EMR, this can be done via a bootstrap action); then it will work for both the driver and the executor (see the sketch below).
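
A minimal sketch of the corresponding spark-defaults entries, assuming the jars listed earlier in the thread were copied to /usr/lib/spark on every node; adjust the names and paths to whatever you actually deploy:

  spark.driver.extraClassPath    /usr/lib/spark/spark-metrics_2.12-2.4-1.0.5.jar:/usr/lib/spark/metrics-core-3.1.2.jar:/usr/lib/spark/simpleclient-0.3.0.jar:/usr/lib/spark/simpleclient_common-0.3.0.jar:/usr/lib/spark/simpleclient_dropwizard-0.3.0.jar:/usr/lib/spark/simpleclient_pushgateway-0.3.0.jar
  spark.executor.extraClassPath  /usr/lib/spark/spark-metrics_2.12-2.4-1.0.5.jar:/usr/lib/spark/metrics-core-3.1.2.jar:/usr/lib/spark/simpleclient-0.3.0.jar:/usr/lib/spark/simpleclient_common-0.3.0.jar:/usr/lib/spark/simpleclient_dropwizard-0.3.0.jar:/usr/lib/spark/simpleclient_pushgateway-0.3.0.jar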