MarquezProject / marquez

Collect, aggregate, and visualize a data ecosystem's metadata
https://marquezproject.ai
Apache License 2.0

Add endpoint method and path to metrics name. #2850

Closed JDarDagran closed 4 months ago

JDarDagran commented 4 months ago

Problem

Currently, the metrics endpoint gathers information that contains the SQL object name + method name. It would be more informative if additional information about the HTTP endpoints were added as well.

Solution

Introduce labels (DAO name, DAO method, endpoint method, endpoint path).
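
A minimal sketch of one way these labels could be recorded on a single histogram, assuming the SQL timing hook is a jdbi SqlLogger (the Prometheus and jdbi calls are real APIs; the LabeledSqlLogger class, the metric name, and the request-scoped HttpCallContext holder are illustrative, not the actual Marquez implementation):

```java
import java.time.temporal.ChronoUnit;

import org.jdbi.v3.core.extension.ExtensionMethod;
import org.jdbi.v3.core.statement.SqlLogger;
import org.jdbi.v3.core.statement.StatementContext;

import io.prometheus.client.Histogram;

/** Sketch: times DAO calls and labels them with both DAO and HTTP endpoint info. */
public class LabeledSqlLogger implements SqlLogger {

  private static final Histogram SQL_DURATION =
      Histogram.build()
          .name("marquez_sql_duration_seconds")
          .help("Time spent executing SQL, labeled by DAO and HTTP endpoint.")
          .labelNames("object_name", "method_name", "endpoint_method", "endpoint_path")
          .register();

  @Override
  public void logAfterExecution(StatementContext context) {
    ExtensionMethod method = context.getExtensionMethod();
    if (method == null) {
      return; // not a SQL-object (DAO) call
    }
    SQL_DURATION
        .labels(
            method.getType().getName(),   // DAO name, e.g. marquez.db.TagDao
            method.getMethod().getName(), // DAO method, e.g. findAll
            HttpCallContext.method(),     // endpoint method, e.g. GET
            HttpCallContext.path())       // endpoint path, e.g. /api/v1/tags
        .observe(context.getElapsedTime(ChronoUnit.NANOS) / 1e9);
  }

  /** Hypothetical request-scoped holder, e.g. populated by a servlet/Jersey request filter. */
  public static final class HttpCallContext {
    private static final ThreadLocal<String[]> CURRENT =
        ThreadLocal.withInitial(() -> new String[] {"UNKNOWN", "UNKNOWN"});

    public static void set(String method, String path) {
      CURRENT.set(new String[] {method, path});
    }

    public static String method() {
      return CURRENT.get()[0];
    }

    public static String path() {
      return CURRENT.get()[1];
    }
  }
}
```

The logger would be registered once on the Jdbi instance (jdbi.setSqlLogger(new LabeledSqlLogger())), so every DAO call is timed with the same label set.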

Checklist

JDarDagran commented 4 months ago

This breaks backwards compatibility; however, do we even need backwards compatibility in metrics? If so, I can suggest two options:

wslulciuc commented 4 months ago

@JDarDagran I agree that more investment is needed in our metrics, and correlating an API call -> query is nice, but adding the HTTP method feels unrelated to the metric. This metric is for a DB call, not an HTTP call, though the method invocation chain is implicit. We need consistent naming strategies not only for DAO metrics, but for all metrics emitted (e.g., should we add the DAO method name to the API metrics? If we go down this route, metrics would contain seemingly unrelated parts). Personally, I'm not a fan of the auto-generated metric names from Dropwizard.

I would favor first defining a proposal for metrics and their importance, naming strategy (i.e., no longer relying on Dropwizard's auto-generated names), labels, etc. If we were to introduce breaking changes (and I think we have to), we need a precise metric naming strategy and a list of the metrics that are critical for debugging DB perf issues before moving forward.

What about adding the HTTP call as a label? That would at least unblock the PR without introducing a breaking change or a new naming strategy.

JDarDagran commented 4 months ago

Reworked it. Now the v2 endpoint /metrics/v2 contains the following metrics:

marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="0.005",} 0.0
marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="0.01",} 0.0
marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="0.025",} 1.0
...
marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="+Inf",} 1.0
marquez_sql_duration_seconds_count{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",} 1.0
marquez_sql_duration_seconds_sum{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",} 0.014148125
marquez_sql_duration_seconds_bucket{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",le="0.005",} 0.0
marquez_sql_duration_seconds_bucket{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",le="0.01",} 1.0
...
marquez_sql_duration_seconds_bucket{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",le="+Inf",} 1.0
marquez_sql_duration_seconds_count{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",} 1.0
marquez_sql_duration_seconds_sum{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",} 0.005786375
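
For reference, this is the standard Prometheus text exposition format: the _bucket, _count, and _sum series all come from a single labeled histogram. A self-contained sketch with the Prometheus Java client produces output of this shape (the registration and sample values below only mirror the output above; this is not the actual Marquez wiring):

```java
import java.io.StringWriter;

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Histogram;
import io.prometheus.client.exporter.common.TextFormat;

public class MetricsDump {
  public static void main(String[] args) throws Exception {
    CollectorRegistry registry = new CollectorRegistry();

    // One histogram; each unique label combination becomes its own time series.
    Histogram sqlDuration =
        Histogram.build()
            .name("marquez_sql_duration_seconds")
            .help("Time spent executing SQL, labeled by DAO and HTTP endpoint.")
            .labelNames("object_name", "method_name", "endpoint_method", "endpoint_path")
            .register(registry);

    // A single observation, matching the TagDao sample above.
    sqlDuration
        .labels("marquez.db.TagDao", "findAll", "GET", "/api/v1/tags")
        .observe(0.014148125);

    // Render the registry in the text exposition format served by a /metrics endpoint,
    // which yields _bucket/_count/_sum lines like the ones above.
    StringWriter out = new StringWriter();
    TextFormat.write004(out, registry.metricFamilySamples());
    System.out.println(out);
  }
}
```
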
wslulciuc commented 4 months ago

I think the metric (below) for our DB calls is a great start! But what the metric is really trying to do is trace the HTTP call to the DB query (i.e., a span). I do find it confusing (as I mentioned before) that labels like endpoint_method are unrelated to the metric itself. Though I do think this is better represented as a span, I have some suggestions for how to best represent the metric:

Given Example Metric:

marquez_sql_duration_seconds_sum{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",} 0.005786375

Define Metric:

# The time to make the DB call for a given HTTP endpoint.
marquez_db_duration_seconds_by_http_call

Labels:

  • sql_class: the name of the DAO class (replaces object_name)
  • sql_method: the name of the method called by the DAO class (replaces method_name)
  • http_method: the name of the HTTP method for the given call
  • http_path: the name of the HTTP path for the given call
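
As a rough sketch, the renamed metric and labels might be declared with the Prometheus Java client like this (the class and field names are illustrative, not actual Marquez code):

```java
import io.prometheus.client.Histogram;

public final class DbCallMetrics {

  /** The time to make the DB call for a given HTTP endpoint. */
  public static final Histogram DB_DURATION_BY_HTTP_CALL =
      Histogram.build()
          .name("marquez_db_duration_seconds_by_http_call")
          .help("The time to make the DB call for a given HTTP endpoint.")
          .labelNames("sql_class", "sql_method", "http_method", "http_path")
          .register();

  private DbCallMetrics() {}
}
```

An observation matching the example metric above would then look like DB_DURATION_BY_HTTP_CALL.labels("marquez.db.NamespaceDao", "findAll", "GET", "/api/v1/namespaces").observe(0.005786375).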

JDarDagran commented 4 months ago


@wslulciuc I followed your advice and renamed the metric and labels as per your suggestion. If you or any committer could run the CI tests (probably with git-push-fork-to-upstream-branch; I don't know why it doesn't trigger CircleCI by default) and review, I'd really appreciate it.