appuio / appuio-cloud-reporting

Reporting for APPUiO Cloud
BSD 3-Clause "New" or "Revised" License
0 stars 1 forks source link

Extract tenant ID from dedicated label instead of lookup key #147

Closed HappyTetrahedron closed 11 months ago

HappyTetrahedron commented 11 months ago

Note: PR is based on fix/source-key-generation to avoid conflicts, will rebase onto master once that is merged

Summary

For each query result, the corresponding tenant ID is required for the purpose of mapping to customers in the target system. Currently, that information is extracted from the source key (lookup key for products, discounts etc) - this is untidy, as in theory the lookup key could contain arbitrary values and we shouldn't rely on the tenant ID being available at a specific position therein.

To fix this, I instead extract the tenant ID from a dedicated tenant_id label in the query result. This seems to have been the plan from the start, looking at some of the existing code. A tenant label was supposed to be present in all queries, it just ended up not being used (as the source key was used instead).

I changed the label name to tenant_id instead of tenant primarily for this reason: as far as I can see, a tenant_id label with the correct value is already present in all the queries that are set up in appuio-cloud-reporting, appuio-managed-kubernetes-reporting, and appuio-managed-openshift-reporting. Likely, it was assumed that the label is required (since it originally should have been), and so it was included in each query. If I haven't overlooked anything, this change should therefore not break our existing setup, even though it is technically a breaking change.

Checklist

bastjan commented 11 months ago

For APPUiO Cloud the tenant is the organization. tenant_id is the Project Syn tenant which is not correct on shared clusters.

HappyTetrahedron commented 11 months ago

For APPUiO Cloud the tenant is the organization. tenant_id is the Project Syn tenant which is not correct on shared clusters.

I thought so too, but actually all the APPUiO Cloud queries explicitly rename the appuio_io_organization label to tenant_id. So the content of the label is correct - it has to be, since it's literally what's used in the 3rd position of the product identifier as of right now.

I'm rather sure that whoever wrote these queries was under the impression that the tenant_id label is required already, so efforts were taken to correct the label for APPUiO Cloud.

bastjan commented 11 months ago

I see. Did not remember correctly then https://github.com/appuio/appuio-cloud-reporting/blob/fd1d44ebd9311658cee9fb82b9e889838071c248/pkg/db/seeds/appuio_cloud_memory.promql#L92

Did you check appcat:billing too?

HappyTetrahedron commented 11 months ago

I checked all the queries in the production databases of appuio cloud, appuio managed kubernetes, and appuio managed openshift. The appcat queries are in the appuio cloud reporting DB, right? Or is there another DB I've missed?

bastjan commented 11 months ago

Looks like they use the source key to get the tenant... https://console.cloudscale-lpg-2.appuio.cloud/monitoring/query-browser?query0=appcat%3Abilling

This one is a recording rule evaluated on the clusters. The query in the db is just querying the recording rule.

HappyTetrahedron commented 11 months ago

Oh I see... oh, that's annoying >.<

HappyTetrahedron commented 11 months ago

Possible solutions:

what do you think?

bastjan commented 11 months ago

A relabel rule can be applied already before merging the other changes, it won't break anything to have an additional label. So i'd go for that.

HappyTetrahedron commented 11 months ago

update: tenant_id ends up correct in mimir https://insights.appuio.net/explore?orgId=1&left=%7B%22datasource%22:%22c2b8470c-3134-4e70-b6c4-a21b5e7d2d45%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22prometheus%22,%22uid%22:%22c2b8470c-3134-4e70-b6c4-a21b5e7d2d45%22%7D,%22editorMode%22:%22code%22,%22expr%22:%22appcat:billing%5Cn%22,%22legendFormat%22:%22__auto%22,%22range%22:true,%22instant%22:true%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D

The recording rule actually already does the appropriate relabeling to update the tenant_id. But of course we also set tenant_id automatically somewhere for all our timeseries. In mimir, that winds up getting renamed to __tenant_id__ since there already is a conflicting label. Not sure why that doesn't happen when querying via openshift monitoring.

simu commented 11 months ago

Not sure why that doesn't happen when querying via openshift monitoring.

We add cluster_id, tenant_id and the corresponding display names as externalLabels in Prometheus, so they're automatically added whenever Prometheus sends anything to an external system (Alertmanager, remote write, ...), but they don't show up if you query Prometheus locally

HappyTetrahedron commented 11 months ago

Well, they aren't sent to Mimir, according to our observations.

HappyTetrahedron commented 11 months ago

After discussion in the Daily, we will not move forward with this change. I've documented the findings here: https://wiki.vshn.net/pages/viewpage.action?pageId=355369468