tatiana opened this issue 1 month ago.
As of Marquez 0.47.0, the `/metrics` endpoint already exposes per-query information, with each metric named after the Java method that issued the query (e.g., `marquez/db/JobDao.java:findAll` is named `marquez_db_JobDao_findAll`).
Example of the information made available by this endpoint:
Please confirm whether we need further details.
At the moment, we can see the data of interest per Java method, but we would like this feature to also show it from the perspective of the HTTP endpoint that triggered the query (e.g., `POST api/v1/lineage`).
@mobuchowski's suggestion: add the HTTP verb and the endpoint path to the metric name, e.g., `marquez_api_post_v1Lineage_db_JobDao`.
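The current naming (`marquez/db/JobDao.java:findAll` becomes `marquez_db_JobDao_findAll`) and the suggested naming (`marquez_api_post_v1Lineage_db_JobDao`) can both be expressed as plain string transformations. The sketch below is only an illustration under assumptions: `MetricNames`, `daoMetricName`, and `endpointDaoMetricName` are hypothetical names, not Marquez code, and the camelCasing of path segments is inferred from the single `v1Lineage` example.

```java
import java.util.Locale;

// Hypothetical helper (not part of Marquez): builds metric names from DAO and endpoint identifiers.
final class MetricNames {
  private MetricNames() {}

  // Replace every character that is not a letter, digit, or underscore with '_'.
  private static String sanitize(String s) {
    return s.replaceAll("[^A-Za-z0-9_]", "_");
  }

  // Current style: "marquez/db/JobDao.java:findAll" -> "marquez_db_JobDao_findAll".
  static String daoMetricName(String daoMethod) {
    return sanitize(daoMethod.replace(".java", ""));
  }

  // Suggested style: ("POST", "api/v1/lineage", "marquez/db/JobDao.java")
  //   -> "marquez_api_post_v1Lineage_db_JobDao".
  static String endpointDaoMetricName(String verb, String path, String daoFile) {
    // Drop a leading "/" and "api/" so only the versioned path remains, e.g. "v1/lineage".
    String relative = path.replaceFirst("^/", "").replaceFirst("^api/", "");
    // camelCase the path, treating any run of non-alphanumerics ('/', '-', '{', '}') as a separator.
    StringBuilder camel = new StringBuilder();
    for (String part : relative.split("[^A-Za-z0-9]+")) {
      if (part.isEmpty()) {
        continue;
      }
      if (camel.length() == 0) {
        camel.append(part);
      } else {
        camel.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
      }
    }
    String dao = sanitize(daoFile.replaceFirst("^marquez/", "").replace(".java", ""));
    return "marquez_api_" + verb.toLowerCase(Locale.ROOT) + "_" + camel + "_" + dao;
  }

  public static void main(String[] args) {
    // marquez_db_JobDao_findAll
    System.out.println(daoMetricName("marquez/db/JobDao.java:findAll"));
    // marquez_api_post_v1Lineage_db_JobDao
    System.out.println(endpointDaoMetricName("POST", "api/v1/lineage", "marquez/db/JobDao.java"));
    // marquez_api_get_v1NamespacesNamespace_db_NamespaceDao
    System.out.println(
        endpointDaoMetricName("GET", "api/v1/namespaces/{namespace}", "marquez/db/NamespaceDao.java"));
  }
}
```

The suggested example stops at the DAO class (`db_JobDao`). Appending the method name would keep one series per query method, at the cost of more series per endpoint.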
We need to investigate whether and how this could be accomplished, and whether there are better ways to do it.
Context
Since the 0.7.0 release (#1906), Marquez supports pushing metrics to Prometheus.
This task proposes extending the current capability to give visibility into Marquez's SQL queries. Some of the questions we'd like answered:
By identifying potential bottlenecks in Marquez queries and the database, this extension could facilitate the provisioning of adequate resources. This, in turn, could lead to improved performance and efficiency of the database and Marquez itself.
Implementation
If possible, we could expose the frequency (count) and duration (gauge) of every query Marquez runs. This could possibly be done close to `jdbi`: https://metrics.dropwizard.io/4.2.0/manual/jdbi.html
If this is not possible, we could add the instrumentation to specific write and read endpoints, covering at least the SQL queries triggered by the following endpoints:
`api/v1/lineage` (*)
`api/v1/namespaces/{namespace}` (*)
`api/v1/namespaces`
`api/v1/namespaces/{namespace}/jobs/{job}`
`api/v1/namespaces/{namespace}/datasets`
`api/v1/column-lineage`
Endpoints marked with (*) are the most critical.
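Whichever option is chosen, each executed query would be recorded with a count and a duration and, to cover the per-endpoint request, attributed to the HTTP endpoint that triggered it. The following is a dependency-free sketch of that bookkeeping under assumptions: `QueryMetrics`, `setEndpoint`, `timed`, and the "endpoint | method" key format are hypothetical. In Marquez the endpoint would presumably be captured by a JAX-RS request filter, the timing taken in a Jdbi hook such as `SqlLogger`, and the values exported through the existing Prometheus integration; none of that wiring is shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch: per-(endpoint, DAO method) query count and last duration.
final class QueryMetrics {
  private QueryMetrics() {}

  // Endpoint served on this thread, e.g. "POST api/v1/lineage".
  // Queries run outside any request are attributed to "none".
  private static final ThreadLocal<String> CURRENT_ENDPOINT = ThreadLocal.withInitial(() -> "none");

  static final class Stats {
    final AtomicLong count = new AtomicLong();             // frequency (counter)
    final AtomicLong lastDurationNanos = new AtomicLong(); // duration (gauge: last observed value)
  }

  private static final Map<String, Stats> STATS = new ConcurrentHashMap<>();

  static void setEndpoint(String endpoint) {
    CURRENT_ENDPOINT.set(endpoint);
  }

  static void clearEndpoint() {
    CURRENT_ENDPOINT.remove();
  }

  // Runs the query and records its count and duration under the current endpoint.
  static <T> T timed(String daoMethod, Supplier<T> query) {
    long start = System.nanoTime();
    try {
      return query.get();
    } finally {
      long elapsed = System.nanoTime() - start;
      Stats s = STATS.computeIfAbsent(CURRENT_ENDPOINT.get() + " | " + daoMethod, k -> new Stats());
      s.count.incrementAndGet();
      s.lastDurationNanos.set(elapsed);
    }
  }

  static long count(String endpoint, String daoMethod) {
    Stats s = STATS.get(endpoint + " | " + daoMethod);
    return s == null ? 0 : s.count.get();
  }

  static long lastDurationNanos(String endpoint, String daoMethod) {
    Stats s = STATS.get(endpoint + " | " + daoMethod);
    return s == null ? 0 : s.lastDurationNanos.get();
  }

  public static void main(String[] args) {
    setEndpoint("POST api/v1/lineage");
    try {
      timed("JobDao.findAll", () -> "row"); // stand-in for a real DAO call
      timed("JobDao.findAll", () -> "row");
    } finally {
      clearEndpoint();
    }
    System.out.println(count("POST api/v1/lineage", "JobDao.findAll")); // 2
  }
}
```

The duration is kept as a gauge of the last observed value, as proposed; if percentiles are needed, a Prometheus histogram or summary would be the usual alternative. Because the endpoint lives in a `ThreadLocal`, this assumes a query runs on the same thread as the request that triggered it.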