Open m-parrella opened 3 weeks ago
To provide more context, after deploying the dashboard and configuring the exporter with the parameter AGGREGATION=namespace, we observed that the dashboard displays a summary of all clusters.
Upon further analysis, it appears that the kubecost-s3-exporter does not apply any filters when querying the Kubecost API on the primary cluster. As a result, in a multi-cluster architecture, it retrieves data from all clusters, not just the intended namespace.
Could you confirm if this deployment approach is compatible with Kubecost version 2.X?
Thank you in advance!
Regarding the error you're seeing in the cost-analyzer-frontend container logs:
I see you're using Kubecost version 2.3.5-rc.10, which seems to be a release candidate of v2.3.5 (because of the rc suffix). Checking the Kubecost release log, it looks like a similar issue was fixed in v2.3.5:
Fix error visible in aggregator logs “append row Failure: acquiring max concurrency semaphore: context canceled” resulting in hung api responses.
I also checked Kubecost's Slack workspace, and I see this thread where a user reported the same error as in the Kubecost release page; when they upgraded to v2.3.5, they saw the same error as you're seeing (the thread doesn't conclude with a solution to that last error).
I'm using Kubecost v2.4.1 in my setup, and I don't see this error. Can you please upgrade your Kubecost setup to v2.4.1?
Upon reviewing the Kubecost API documentation, it appears that the aggregate=container parameter is missing from the query. When we manually included this parameter, the API returned 200 OK:
The aggregate query input in the Kubecost Allocation API isn't mandatory. The reason the kubecost-s3-exporter doesn't use it by default is that when you don't specify it, the Kubecost Allocation API returns the result in container aggregation, and I wanted the dashboard users to have the lowest granularity level possible (they can aggregate on higher levels, if needed, on the dashboard itself).
When not using the aggregate query input, each item in the list is a container, described in the following pattern:
cluster-one/ip-A-B-C-D.ec2.internal/kubecost-eks2/kubecost-eks2-cost-analyzer-xxxxxxxxx-xxxxx/cost-analyzer-frontend
The above pattern keeps each item unique, as it includes the cluster name (as the user defined in Kubecost), node name, namespace, pod name and container name.
If I'd used the aggregate query input with the container value, it would also return container-level items, but each item in the list would be just the container name. Then, in cases of multiple pods with the same container name (in the same timeframe), they would be aggregated into a single item in the API response, which isn't desirable (these are different containers, and I'd like to differentiate between them).
The reason I still provide the AGGREGATION input in the Helm chart (which, as you can see, is used to populate the aggregate query input in the exporter code) is for cases where container-level aggregation is too low-level in really large setups, such that the Kubecost Allocation API fails to return results. Users can then supply a higher aggregation level in the AGGREGATION input, with the tradeoff of having less granular data.
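To illustrate the mapping, here's a minimal Python sketch of how the AGGREGATION input could be turned into the aggregate query input - this is hypothetical and simplified, not the exporter's actual code:

import os

# Hypothetical: the Helm AGGREGATION input passed to the exporter as an environment variable
aggregation = os.environ.get("AGGREGATION", "")

params = {
    "window": "2024-10-29T00:00:00Z,2024-10-30T00:00:00Z",  # illustrative window only
    "accumulate": "false",
    "step": "1d",
}

if aggregation:
    # e.g. AGGREGATION=namespace results in aggregate=namespace
    params["aggregate"] = aggregation
# When AGGREGATION isn't set, "aggregate" is omitted and the Allocation API
# returns container-level items (the lowest granularity)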
Upon further analysis, it appears that the kubecost-s3-exporter does not apply any filters when querying the Kubecost API on the primary cluster. As a result, in a multi-cluster architecture, it retrieves data from all clusters, not just the intended namespace.
The kubecost-s3-exporter isn't meant to apply any filters. It runs daily and collects all data in a specific 24-hour timeframe (specifically, 72 hours ago 00:00:00 UTC to 48 hours ago 00:00:00 UTC).
If you're in a single-cluster setup, then it'll collect all data for that cluster (in the above-mentioned timeframe). If you're in a multi-cluster setup and the kubecost-s3-exporter collects from the primary cluster, then it'll collect all data for all clusters in the above-mentioned timeframe.
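For reference, here's a minimal sketch of how that collection window could be computed (illustrative only; the exporter's actual code may differ):

from datetime import datetime, timedelta, timezone

# Midnight UTC of the current day
today_midnight = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)

# Window: 72 hours ago 00:00:00 UTC to 48 hours ago 00:00:00 UTC
start = today_midnight - timedelta(days=3)
end = today_midnight - timedelta(days=2)

# Format as the "window" query input expected by the Allocation API
window = f"{start.strftime('%Y-%m-%dT%H:%M:%SZ')},{end.strftime('%Y-%m-%dT%H:%M:%SZ')}"
print(window)  # e.g. 2024-10-27T00:00:00Z,2024-10-28T00:00:00Z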
If by using the AGGREGATION input as namespace you expected it to collect from a specific namespace, then it won't. This input is meant to define the aggregation level, not a filter.
Could you confirm if this deployment approach is compatible with Kubecost version 2.X?
The kubecost-s3-exporter supports Kubecost 2.x, in any form (single-cluster or multi-cluster Kubecost deployment). In a multi-cluster deployment there are a few limitations though, which are documented at the beginning of the README file (see "Kubecost enterprise tier, with the following limitations").
So, to summarize, please upgrade to Kubecost v2.4.1 and check that the issue is fixed (after you revert the code back to how it was originally).
Hi Udi,
Thanks for the clarifications. I suspect the "error querying DB" message is due to the large number of containers: since the kubecost-s3-exporter queries data from all clusters at once, the Kubecost API struggles to handle the response. This is why the issue was resolved when using AGGREGATION=namespace instead of aggregating by containers.
We are using the Kubecost EKS-optimized bundle in a multi-cluster, multi-account architecture. Based on your previous comment, since kubecost-s3-exporter queries the Kubecost API without filters, data from all clusters is combined into a single file per day in S3. Could you confirm whether the CronJob with the Kubecost S3 Exporter image should be deployed on only one EKS cluster rather than on each cluster?
Finally, assuming the setup is working as expected (aggregating by namespace and only one S3 exporter), when I query the data with Athena I see that the properties.* columns are empty, which prevents us from filtering by cluster or account on the dashboard. Is this related to the aggregation level?
Thank you!
We are using the Kubecost EKS-optimized bundle in a multi-cluster, multi-account architecture. Based on your previous comment, since kubecost-s3-exporter queries the Kubecost API without filters, data from all clusters is combined into a single file per day in S3. Could you confirm whether the CronJob with the Kubecost S3 Exporter image should be deployed on only one EKS cluster rather than on each cluster?
The kubecost-s3-exporter can also be deployed on the primary cluster in a multi-cluster Kubecost deployment. I worked with a user who deployed it this way, but of course it's a matter of Kubecost API server performance. It could be that with the other user I worked with, the amount of data returned by the primary cluster for all clusters was still small enough for the Kubecost API to respond properly, but in your case, it's too large.
Now, in a Kubecost multi-cluster deployment, each individual cluster also has Kubecost deployed (the cost-model container still needs to prepare the data), but you just query/use the UI of the primary cluster to see aggregated multi-cluster data in one place.
If each individual cluster still has the cost-analyzer-frontend container, then I think the best approach would be to deploy the exporter CronJob in each cluster, like you would do with the Kubecost free tier (although it would mean more deployment effort). To clarify, I don't mean that you change your deployment to multiple single-cluster deployments. I'd just like to check if each individual cluster in your multi-cluster deployment - which must still have Kubecost deployed for data preparation - also has the cost-analyzer-frontend container.
If you see that the cost-analyzer-frontend container doesn't exist in each individual cluster, then I recall there might be another HTTP service on TCP port 9003 (though I don't know if it returns the same data).
First, please check if the cost-analyzer-frontend container is present in each individual cluster. If not, check if another service port is used to respond to API calls.
You can use the following API call with cURL to check the API (after you port-forward or use an ingress to expose it):
curl -k -v "http://localhost:9090/model/allocation?window=2024-10-29T00:00:00Z,2024-10-30T00:00:00Z&accumulate=false&step=1d&idle=true&splitIdle=true&idleByNode=true&shareTenancyCosts=true"
Change the protocol (http/https), hostname (localhost), TCP port (9090) and dates as needed.
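If it's easier, here's a small Python equivalent of the same check (a sketch only; it assumes the API is reachable at localhost:9090 after port-forwarding, and uses the same query inputs as the cURL example above):

import requests

url = "http://localhost:9090/model/allocation"
params = {
    "window": "2024-10-29T00:00:00Z,2024-10-30T00:00:00Z",
    "accumulate": "false",
    "step": "1d",
    "idle": "true",
    "splitIdle": "true",
    "idleByNode": "true",
    "shareTenancyCosts": "true",
}

# verify=False is the equivalent of cURL's -k flag (skip TLS verification)
r = requests.get(url, params=params, timeout=(10, 60), verify=False)
print(r.status_code)  # expect 200 if the endpoint serves the Allocation API

# Each entry in "data" is an allocation set for one step; print how many items each contains
for allocation_set in r.json().get("data", []):
    print(len(allocation_set))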
Finally, assuming the setup is working as expected (aggregating by namespace and only one S3 exporter), when I query the data with Athena I see that the properties.* columns are empty, which prevents us from filtering by cluster or account on the dashboard. Is this related to the aggregation level?
The empty properties.cluster seems to be because when querying the Kubecost Allocation API with aggregate=namespace, it doesn't return this key in the response (for each namespace).
I tried it now in my setup (v2.4.1); here's a sample snippet from the output when not using the aggregate query key:
"properties":
{
"cluster": "cluster-one",
"node": "ip-A-B-C-D.ec2.internal",
"container": "cost-analyzer-frontend",
"namespace": "kubecost-eks2",
"pod": "kubecost-eks2-cost-analyzer-xxxxxxx-xxxx",
"labels":
{
"app": "cost-analyzer",
"app_kubernetes_io_instance": "kubecost-eks2",
You can see that under the properties key, there's a cluster key (and several others).
Here's a sample snippet from the output when using aggregate=namespace:
"kubecost-eks2":
{
"name": "kubecost-eks2",
"properties":
{
"namespace": "kubecost-eks2",
"labels":
{
"eks_amazonaws_com_capacityType": "ON_DEMAND",
"eks_amazonaws_com_nodegroup_image": "ami-xxxxxxxxxxxxx",
You can see that the only key under properties is namespace (there are others, it's a partial output - but cluster isn't there). While I understand why the container and pod keys won't be there under properties (they're at a lower level than namespace), I'd expect the node and cluster keys to still be there, as they're at a higher level than namespace.
I'm not sure why Kubecost Allocation API doesn't return these properties. I logged a ticket with their support (you need to join the Kubecost Slack workspace to see it).
As for the empty properties.eksclustername column, that's a column that the exporter creates; it's not derived from the Kubecost Allocation API. You can see the code here.
The value of this key is derived from the Helm CLUSTER_ID value, which is expected to be an EKS cluster ARN (and I use input validation to make sure an ARN is indeed used as the input).
The reason I require it as an input is that in Kubecost free tier deployments, the default cluster name in Kubecost is cluster-one, which is what will appear in the properties.cluster key in the Kubecost Allocation API response.
Now, I can't force the user to change it, and while a repeated cluster name across multiple single-cluster Kubecost deployments isn't an issue in the Kubecost UI/API itself, it would cause an issue in the dashboard, which aggregates data from multiple clusters.
That's why I require the CLUSTER_ID input (and validate it), so that in multiple single-cluster setups, users are forced to use the unique cluster identifier (the cluster ARN). The exporter then extracts the cluster name from it and uses it in eksclustername, so that the dashboard has a unique name for each cluster (even if the cluster name is the same in each single-cluster Kubecost deployment).
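For illustration, here's a minimal sketch of extracting the cluster name from an EKS cluster ARN (the exporter's actual parsing may differ):

# An EKS cluster ARN has the form:
# arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>
cluster_id = "arn:aws:eks:us-east-1:111122223333:cluster/my-cluster"

# The cluster name is the part after "cluster/"
eks_cluster_name = cluster_id.split("/")[-1]
print(eks_cluster_name)  # my-cluster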
Now, if you haven't changed any of the code related to this input, then this should have worked and the eksclustername column should have shown the primary cluster name. Did you make any changes?
Regardless, I think we should leave this column aside, because in a multi-cluster setup it won't be useful anyway: the exporter collects data from the primary cluster for all clusters, so all line items in Athena will have the same eksclustername value. The solution is to customize the dashboard so it sources the cluster name from the properties.cluster field instead (and of course, Kubecost first needs to fix the API issue I mentioned above).
But let's hold off on that. First, let me know your conclusion after you've checked whether the cost-analyzer-frontend container is present in each individual cluster in your multi-cluster Kubecost setup. This will solve both the Kubecost API performance issue (probably, unless some of your clusters are really large) and the same-eksclustername-value issue in multi-cluster setups.
Thanks for the detailed explanation.
We are using a Federated ETL deployment with one Primary cluster and five Secondary clusters. In this setup, the cost-analyzer-frontend container is only deployed on the Primary. This is limited by the official Helm chart here, where it checks that the deployment is not defined as an agent (agentOnly: true).
Although the frontend is only deployed on the Primary, I was able to query the cost-model container on the Secondary clusters directly with a small tweak. According to the Kubecost documentation, in order to query the cost-model container directly, the /model part of the URI should be removed. So I adjusted main.py like this:
Original snippet:
# Executing the API call
logger.info(f"Querying Kubecost Allocation API for data between {start_h} and {end_h} "
f"in {granularity.lower()} granularity...")
r = requests.get(f"{kubecost_api_endpoint}/model/allocation", params=params,
timeout=(connection_timeout, read_timeout), verify=tls_verify)
Modified snippet:
# Executing the API call
logger.info(f"Querying Kubecost Allocation API for data between {start_h} and {end_h} "
f"in {granularity.lower()} granularity...")
r = requests.get(f"{kubecost_api_endpoint}", params=params,
timeout=(connection_timeout, read_timeout), verify=tls_verify)
After applying the change, building and pushing the new image, I deployed one kubecost-s3-exporter on each cluster and adjusted the values.yaml like this (I deployed the exporter in the same namespace as Kubecost):
- name: "KUBECOST_API_ENDPOINT"
value: "http://kubecost-cost-analyzer:9003/allocation"
With this adjustment, each exporter was able to query the cost model locally, aggregating by containers and skipping the Kubecost response size limitation.
After running the Crawler and the Refresh on the dataset, the data was correctly displayed on the Dashboard; the only empty columns in Athena are:
properties.node
properties.node_capacity_type
properties.node_architecture
properties.node_os
properties.node_nodegroup_image
properties.providerid
Hope this helps!
Awesome, thanks!
I may include an input in the Helm chart, in one of the next updates, for users to choose whether they want to query cost-model or cost-analyzer-frontend (and as a result, the exporter will use the relevant API endpoint).
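Something along these lines, perhaps (purely hypothetical - the KUBECOST_API_TARGET input name and default values below are made up for illustration):

import os
import requests

kubecost_api_endpoint = os.environ.get("KUBECOST_API_ENDPOINT", "http://kubecost-cost-analyzer:9090")

# Hypothetical Helm input: "frontend" queries cost-analyzer-frontend under /model,
# "cost-model" queries the cost-model container directly (no /model prefix)
api_target = os.environ.get("KUBECOST_API_TARGET", "frontend")

if api_target == "frontend":
    url = f"{kubecost_api_endpoint}/model/allocation"
else:
    url = f"{kubecost_api_endpoint}/allocation"

params = {"window": "2024-10-29T00:00:00Z,2024-10-30T00:00:00Z"}  # illustrative window only
r = requests.get(url, params=params, timeout=(10, 60))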
As for the empty columns:
The missing properties.node and properties.providerid fields seem like a regression from Kubecost 2.x.
I checked Kubecost 1.108.1 (latest 1.x version), those properties were present. I then checked Kubecost 2.0.1 (first 2.x version), those properties (and others too) are missing. Finally, I checked with one of the most recent 2.x versions (2.4.1), and same properties are still missing. I tested all of the above with the same API call (same one that is executed by the exporter).
So, it looks like a regression that started in Kubecost 2.0.1. I created a ticket, you can follow (you need to join the Kubecost Slack workspace).
The missing properties.node_capacity_type, properties.node_architecture, properties.node_os and properties.node_nodegroup_image fields are derived from node labels, and you need to explicitly configure Kubecost to report them in the Allocation API response. Please follow the instructions here.
As a follow-up, after deploying the dashboard, we identified some discrepancies between the costs reported on the Kubecost frontend and in QuickSight.
Currently, pointing the exporter at the local cost model bypasses the reconciliation process that occurs on the primary cluster. In our setup, we use Spot Instances and Savings Plans, but this information is not reflected in the dashboard because all data is sourced from the secondary clusters.
For example:
# cat primary.json | jq .data[0] | grep -A73 kubecost-cost-analyzer-654f9d5fd-q2zlh | grep totalCost
"totalCost": 0.16583,
# cat secondary.json | jq .data[0] | grep -A73 kubecost-cost-analyzer-654f9d5fd-q2zlh | grep totalCost
"totalCost": 0.29515,
An alternative to sourcing the cost model locally (to pull data from each cluster) would be to point to the primary cluster while using the filterClusters parameter in the Kubecost API requests and adding an additional parameter to the Helm chart.
I’ll test this approach next week and keep you updated!
Thanks for the update! If you'll be using filterClusters and adding a parameter to the Helm chart, I take it that you're going to deploy separate instances of the exporter on the primary cluster, each one with a different cluster in the filterClusters input.
That's a good idea. I have two other suggestions:
1. Check if there's a way to have cost-model report cost after reconciliation. I don't know if it's possible though, I haven't tried it.
2. Instead of exposing filterClusters in the Helm chart, you could run the Allocation API first with aggregate=cluster, to get the list of clusters (assuming it'll return properties.cluster in the response). Then, you can iterate over each cluster returned from the aggregate=cluster call, and for each iteration, run the Allocation API (the one the exporter runs now, without aggregate) and apply the cluster in filterClusters (see the sketch below).
This way, you don't need to expose a cluster filter in the Helm chart, and you can run only a single instance of the exporter.
This approach also gives you the option to parallelize the calls, if the Kubecost API can take it; if not, just run them one after another - but in both cases, all of this is done in one exporter instance, without having to expose a cluster filter in the Helm chart.
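Here's a rough sketch of that flow in Python (illustrative only; it assumes the Allocation API accepts aggregate=cluster and filterClusters as query inputs and returns properties.cluster per item, as discussed above):

import requests

kubecost_api_endpoint = "http://localhost:9090"  # adjust to your setup
window = "2024-10-29T00:00:00Z,2024-10-30T00:00:00Z"  # illustrative window only

# Phase 1: get the list of clusters by aggregating on cluster
r = requests.get(f"{kubecost_api_endpoint}/model/allocation",
                 params={"window": window, "aggregate": "cluster"},
                 timeout=(10, 60))
clusters = []
for allocation_set in r.json().get("data", []):
    for item in allocation_set.values():
        cluster = item.get("properties", {}).get("cluster")
        if cluster and cluster not in clusters:
            clusters.append(cluster)

# Phase 2: for each cluster, run the container-level Allocation API call (no aggregate),
# filtered to that cluster only
results = {}
for cluster in clusters:
    r = requests.get(f"{kubecost_api_endpoint}/model/allocation",
                     params={"window": window, "filterClusters": cluster},
                     timeout=(10, 60))
    results[cluster] = r.json().get("data", [])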
I planned to introduce such logic at some point, to deal with issues like the one you faced; another user reported a similar issue before, where the Kubecost API didn't respond when it had to return large amounts of data.
Hi,
During our initial setup on an EKS 1.29 cluster with a multi-cluster Federated ETL architecture and Kubecost 2.3.5-rc.10, the pod running kubecost-s3-exporter exited with the following error:
Upon checking the kubecost-frontend logs, we observed that the API returned a 499 error:
We also tried querying the API manually using Postman and received the following response:
Upon reviewing the Kubecost API documentation, it appears that the aggregate=container parameter is missing from the query. When we manually included this parameter, the API returned 200 OK:
Here’s a relevant code snippet showing how the API call parameters are currently defined:
Could you confirm if the aggregate parameter needs to be explicitly defined for the query to succeed? The exporter works fine if we set AGGREGATION=namespace.
Thanks in advance!