bf2fc6cc711aee1a0c2a / ffm-project

Repository containing issues and roadmap for the factorized fleet manager
Apache License 2.0

User Facing Metrics Service #15

Open davidffrench opened 2 years ago

davidffrench commented 2 years ago

Context

Red Hat's managed services should have a consistent user experience for exposing customers' service instance metrics. These metrics should be provided in both JSON and Prometheus text formats. Creating a new user-facing metrics service that exposes a subset of managed service instance metrics will allow customers to monitor their own managed service instances. It will also allow UIs to display instance metrics in user-friendly dashboards.

User Narrative

Peter is the team lead for Innovation Inc, which has purchased several Red Hat managed services, including Red Hat OpenShift Streams for Apache Kafka (RHOSAK). Peter is using RHOSAK as part of the company's internal system and would like to monitor the Kafka instance and include its metrics in the company's internal dashboards. Thankfully, Peter can easily do this by scraping the Kafka instance metrics from the Red Hat user-facing metrics API directly into the company's Prometheus instance.

Steve from Innovation Inc is also interested in viewing his RHOSAK instance metrics. He is happy to view these on his Kafka instance page on console.redhat.com, where he can see several widgets displaying his instance's metrics. Curious about where these come from, he can see that they are retrieved from the /query and /query_range endpoints of the Red Hat user-facing metrics API.
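
As a rough illustration of the kind of request such a console widget might issue, here is a minimal sketch that builds a range-query URL. It assumes the user-facing API mirrors the parameter names of the Prometheus HTTP API (`query`, `start`, `end`, `step`), which this issue does not confirm. The base URL, metric name, and `instance_id` label are hypothetical.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url, query, start, end, step):
    """Return a Prometheus-style range-query URL (GET form).

    Assumption: the endpoint accepts the same parameters as the
    Prometheus HTTP API's /query_range endpoint.
    """
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/query_range?{params}"


# Hypothetical base URL, metric name, and label, for illustration only.
url = build_query_range_url(
    "https://metrics.example.com/api/v1",
    'kafka_topic_messages_in_total{instance_id="abc123"}',
    start=1700000000,  # Unix timestamps, one hour apart
    end=1700003600,
    step="60s",
)
print(url)
```

The query expression is percent-encoded by `urlencode`, so label matchers with braces and quotes survive the round trip.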

Job Stories

Analysis

(links to analysis docs containing architecture design work, requirements gathering, etc)

Task List

miguelsorianod commented 2 years ago

Isn't this potentially covered by Observatorium?

davidffrench commented 2 years ago

Unfortunately, Observatorium does not currently authorise external users to access its API. The hope is to have a conversation with the RHOBS team about extending Observatorium with this functionality.

miguelsorianod commented 2 years ago

Unfortunately, Observatorium does not currently authorise external users to their API. The idea is hopefully to have a conversation with the RHOBS team to build out this service.

I am not sure I understood this. Could you give a little more detail? At the moment we are able to write metrics to and read metrics from Observatorium. Additionally, I understand you are talking about the Red Hat-hosted Observatorium instance. Observatorium is an open-source project, and its API mentions writing metrics (https://github.com/observatorium/api). I think it is important we differentiate the open-source software vs limitations that are imposed on Red Hat-hosted versions of the software.

davidffrench commented 2 years ago

I think it is important we differentiate the open-source software vs limitations that are imposed on Red Hat-hosted versions of the software

Yes, you are correct. This GitHub issue might be better off as an internal issue. @emmanuelbernard Do you have any suggestions on how to track these types of issues?

emmanuelbernard commented 2 years ago

The way to track it is to generalize slightly so that it addresses the need a user (Red Hat here) has. A service will store lots of metrics, and our de facto preference today is the Observatorium stack (but any Prometheus would do). As a service, you want to expose a user-centric subset of these metrics and make sure they are exposed via an API with the proper authz. How do you do that, assuming your metrics system only exposes metrics internally?

emmanuelbernard commented 2 years ago

One thing that strikes me is that this job story only refers to documentation. Wouldn't a fleet manager implementor rather have some endpoint / proxying / authz preflight checking done for him or her?
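
To make the proxying idea concrete, here is a minimal sketch of the filtering step such a layer might apply to Prometheus text-format output before returning it to a customer. It is not a design from this issue: the allow-list, the metric names, and the `instance_id` label are all hypothetical, and the parsing is deliberately simplified (it ignores escaped quotes and `}` characters inside label values).

```python
import re

# Hypothetical allow-list of metric names considered safe to show customers.
ALLOWED = {"kafka_topic_messages_in_total", "kafka_broker_disk_used_bytes"}

# Metric name, optional {label="value",...} block, then whitespace.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def filter_metrics(text, owned_instance_id, allowed=ALLOWED):
    """Keep only allow-listed samples that belong to the caller's instance.

    Assumption: each sample carries an `instance_id` label, and the caller's
    ownership of `owned_instance_id` was already verified by an authz check.
    """
    kept = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # drop blank lines and HELP/TYPE comments for brevity
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # skip anything that does not look like a sample line
        name = match.group(1)
        labels = dict(LABEL_RE.findall(match.group(2) or ""))
        if name in allowed and labels.get("instance_id") == owned_instance_id:
            kept.append(line)
    return "\n".join(kept)


raw = """# TYPE kafka_topic_messages_in_total counter
kafka_topic_messages_in_total{instance_id="abc",topic="t1"} 42
kafka_topic_messages_in_total{instance_id="xyz",topic="t1"} 7
internal_controller_reconcile_seconds{instance_id="abc"} 0.3
"""
print(filter_metrics(raw, "abc"))
```

A real service would more likely enforce the instance restriction on the query sent to the metrics backend rather than filter the response afterwards, but the allow-list and ownership checks would have the same shape.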

davidffrench commented 2 years ago

That is a fair assessment, @emmanuelbernard.

I raised this functionality with the Observatorium team and shared this GitHub issue. The action is for the team to create a Jira issue specifically for this, which should elaborate on the job stories. Once that has been created, we should be able to close this specific issue.