Closed JDarDagran closed 4 months ago
Name | Link |
---|---|
Latest commit | ce3835df09ceb5614d70620afcc045ab31f09480 |
Latest deploy log | https://app.netlify.com/sites/peppy-sprite-186812/deploys/66a17a26a65cb100080bd928 |
It is breaking backwards compatibility - however, do we need it in metrics? If so, I may suggest 2 options:
@JDarDagran I agree that more investment is needed in our metrics, and correlating an API call ->
query is nice but adding the HTTP method feels unrelated to the metric. This metric is for a DB call, not an HTTP call, though the method invocation chain is implicit. We need consistent naming strategies for not only DAO metrics, but all metrics emitted (ex: should we add the DAO method name to the API metrics? If we go down this route, metrics would contain, so it would seem, unrelated parts). I personally, I'm not a fan of the auto-generated metric names by dropwizard.
I would favor first defining a proposal for metrics and their importance, naming strategy (ie: no longer relying on dropwizard auto-generated names), labels, etc. If we were to introduce breaking changes (and I think we have to), we need a precise metric naming strategy and list of metrics that are critical for debugging DB perf issue before moving forward.
What about adding the HTTP call as a label? This will at least unblock the PR and not introduce a breaking change and new naming strategy.
Reworked it. Now v2 endpoint /metrics/v2
contains following metrics:
marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="0.005",} 0.0
marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="0.01",} 0.0
marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="0.025",} 1.0
...
marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="+Inf",} 1.0
marquez_sql_duration_seconds_count{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",} 1.0
marquez_sql_duration_seconds_sum{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",} 0.014148125
marquez_sql_duration_seconds_bucket{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",le="0.005",} 0.0
marquez_sql_duration_seconds_bucket{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",le="0.01",} 1.0
...
marquez_sql_duration_seconds_bucket{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",le="+Inf",} 1.0
marquez_sql_duration_seconds_count{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",} 1.0
marquez_sql_duration_seconds_sum{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",} 0.005786375
I think the metric (below) for our DB calls is a great start! But, what the metric is really trying to do is trace the HTTP call to the DB query (i.e. span). I do find it confusing (as I mentioned before) as these labels are unrelated to the metric itself endpoint_method
, etc. Though I do think this is better represented as a span, I have some suggestions that will best represent the metric:
Given Example Metric:
marquez_sql_duration_seconds_sum{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",} 0.005786375
Define Metric:
# The time to make the DB call for a given HTTP endpoint.
marquez_db_duration_seconds_by_http_call
Labels:
sql_class
: the name of the DAO class (replaces object_name
)sql_method
: the name of the method called by the DAO class (replaces method_name
)http_method
: the name of the HTTP method for the given callhttp_path
: the name of the HTTP path for the given callI think the metric (below) for our DB calls is a great start! But, what the metric is really trying to do is trace the HTTP call to the DB query (i.e. span). I do find it confusing (as I mentioned before) as these labels are unrelated to the metric itself
endpoint_method
, etc. Though I do think this is better represented as a span, I have some suggestions that will best represent the metric:Given Example Metric:
marquez_sql_duration_seconds_sum{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",} 0.005786375
Define Metric:
# The time to make the DB call for a given HTTP endpoint. marquez_db_duration_seconds_by_http_call
Labels:
sql_class
: the name of the DAO class (replacesobject_name
)sql_method
: the name of the method called by the DAO class (replacesmethod_name
)http_method
: the name of the HTTP method for the given callhttp_path
: the name of the HTTP path for the given call
@wslulciuc I did follow your advice and renamed metric and labels as per your suggestion. If you / any commiter could run CI tests (probably with git-push-fork-to-upstream-branch
- I don't know why it doesn't trigger circleCI by default) and review, I'd really appreciate that.
Problem
Currently, in metrics endpoint there is information gathered that contains SQL Object name + method name. It might be more informative if additional information about endpoints would be added as well.
Solution
Introduce labels (DAO name, DAO method, endpoint method, endpoint path).
Checklist
CHANGELOG.md
(Depending on the change, this may not be necessary)..sql
database schema migration according to Flyway's naming convention (if relevant)