awslabs / containers-cost-allocation-dashboard

A QuickSight dashboard for containers cost allocation based on data from Kubecost
Apache License 2.0

Allocation API #4

Open m-parrella opened 3 weeks ago

m-parrella commented 3 weeks ago

Hi,

During our initial setup on an EKS 1.29 cluster, in a multi-cluster federated ETL architecture running Kubecost 2.3.5-rc.10, the pod running kubecost-s3-exporter exited with the following error:

2024-10-30 12:17:21,071 INFO kubecost-s3-exporter: Querying Kubecost Allocation API for data between 2024-10-24 00:00:00 and 2024-10-25 00:00:00 in daily granularity...
2024-10-30 12:18:22,122 ERROR kubecost-s3-exporter: Original error: 'Expecting value: line 1 column 1 (char 0)'. Check if you're using incorrect protocol in the URL (for example, you're using 'http://..' when the API server is using HTTPS)

Upon checking the kubecost-frontend logs, we observed that the API returned a 499 error:

XXX.YYY.94.219 - - [30/Oct/2024:12:18:21 +0000] "GET /model/allocation?window=2024-10-24T00%3A00%3A00Z%2C2024-10-25T00%3A00%3A00Z&accumulate=False&step=1d&idle=True&splitIdle=True&idleByNode=True&shareTenancyCosts=True HTTP/1.1" 499 0 "-" "python-requests/2.31.0" "XXX.YYY.17.178, 127.0.0.6"
XXX.YYY.94.219 - - [30/Oct/2024:12:18:22 +0000] "GET /model/allocation?window=2024-10-24T00%3A00%3A00Z%2C2024-10-25T00%3A00%3A00Z&accumulate=False&step=1d&idle=True&splitIdle=True&idleByNode=True&shareTenancyCosts=True HTTP/1.1" 499 0 "-" "python-requests/2.31.0" "XXX.YYY.17.178, 127.0.0.6"

We also tried querying the API manually using Postman and received the following response:

{
    "code": 500,
    "data": null,
    "message": "error querying DB: error querying via querysvc append row Failure: error querying label values: context canceled"
}

Upon reviewing the Kubecost API documentation, it appears that the aggregate=container parameter is missing from the query. When we manually included this parameter, the API returned 200 OK:

XXX.YYY.94.219 - - [30/Oct/2024:13:23:23 +0000] "GET /model/allocation?window=2024-10-24T00%3A00%3A00Z%2C2024-10-25T00%3A00%3A00Z&accumulate=False&step=1d&idle=True&splitIdle=True&idleByNode=True&shareTenancyCosts=True&aggregate=container HTTP/1.1" 200 138231 "-" "PostmanRuntime/7.37.3" "10.20.6.236, 127.0.0.6"
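
For completeness, here's roughly the equivalent of that manual Postman check expressed with Python requests (the endpoint URL and timeout values below are placeholders, not our actual configuration):

import requests

# Placeholder endpoint; in our setup this points at the cost-analyzer-frontend service
kubecost_api_endpoint = "http://kubecost-cost-analyzer.kubecost:9090"

params = {
    "window": "2024-10-24T00:00:00Z,2024-10-25T00:00:00Z",
    "aggregate": "container",  # the parameter we added manually
    "accumulate": False,
    "step": "1d",
    "idle": True,
    "splitIdle": True,
    "idleByNode": True,
    "shareTenancyCosts": True,
}

r = requests.get(f"{kubecost_api_endpoint}/model/allocation", params=params, timeout=(10, 60))
print(r.status_code)  # 200 in our manual test once aggregate=container is included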

Here’s a relevant code snippet showing how the API call parameters are currently defined:

                # Calculating the window and defining the API call requests parameters
                window = f'{start_h.strftime("%Y-%m-%dT%H:%M:%SZ")},{end_h.strftime("%Y-%m-%dT%H:%M:%SZ")}'
                if aggregate == "container":
                    params = {"window": window, "accumulate": accumulate, "step": step, "idle": idle,
                              "splitIdle": split_idle, "idleByNode": idle_by_node,
                              "shareTenancyCosts": share_tenancy_costs}
                else:
                    params = {"window": window, "aggregate": aggregate, "accumulate": accumulate, "step": step,
                              "idle": idle, "splitIdle": split_idle, "idleByNode": idle_by_node,
                              "shareTenancyCosts": share_tenancy_costs}

Could you confirm whether the aggregate parameter needs to be explicitly defined for the query to succeed? The exporter works fine if we set AGGREGATION=namespace.

Thanks in advance!

m-parrella commented 3 weeks ago

To provide more context: after deploying the dashboard and configuring the exporter with the parameter AGGREGATION=namespace, we observed that the dashboard displays a summary of all clusters.

Upon further analysis, it appears that the kubecost-s3-exporter does not apply any filters when querying the Kubecost API on the primary cluster. As a result, in a multi-cluster architecture, it retrieves data from all clusters, not just the intended namespace.

Could you confirm if this deployment approach is compatible with Kubecost version 2.X?

Thank you in advance!

udid-aws commented 2 weeks ago

Regarding the error you're seeing in the cost-analyzer-frontend container logs: I see you're using Kubecost version 2.3.5-rc.10, which seems to be a release candidate of v2.3.5 (because of the rc). Checking the Kubecost release notes, it looks like a similar issue was fixed in v2.3.5:

Fix error visible in aggregator logs “append row Failure: acquiring max concurrency semaphore: context canceled” resulting in hung api responses.

Checking Kubecost's Slack workspace as well, I see this thread where a user reported the same error as in the Kubecost release notes; when they upgraded to v2.3.5, they then saw the same error you're seeing (the thread doesn't conclude with a solution to that last error).

I'm using Kubecost v2.4.1 in my setup, and I don't see this error. Can you please upgrade your Kubecost setup to v2.4.1?

Upon reviewing the Kubecost API documentation, it appears that the aggregate=container parameter is missing from the query. When we manually included this parameter, the API returned 200 OK:

The aggregate query input in the Kubecost Allocation API isn't mandatory. The reason the kubecost-s3-exporter doesn't use it by default is that when you don't specify it, the Kubecost Allocation API returns the result at container-level aggregation, and I wanted dashboard users to have the lowest granularity level possible (they can aggregate at higher levels if needed, in the dashboard itself). When not using the aggregate query input, each item in the list is a container, described in the following pattern:

cluster-one/ip-A-B-C-D.ec2.internal/kubecost-eks2/kubecost-eks2-cost-analyzer-xxxxxxxxx-xxxxx/cost-analyzer-frontend

The above pattern keeps each item unique, as it includes the cluster name (as defined by the user in Kubecost), node name, namespace, pod name and container name. If I used the aggregate query input with the container value, the API would also return container-level items, but each item in the list would be just the container name. Then, in cases of multiple pods with the same container name (in the same timeframe), they would be aggregated into a single item in the API response, which isn't desirable (these are different containers, and I'd like to differentiate between them).
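
As a toy illustration of that uniqueness (this isn't the exporter's code, just the pattern above reconstructed in Python):

# Toy reconstruction of the default (un-aggregated) item name pattern shown above
properties = {
    "cluster": "cluster-one",
    "node": "ip-A-B-C-D.ec2.internal",
    "namespace": "kubecost-eks2",
    "pod": "kubecost-eks2-cost-analyzer-xxxxxxxxx-xxxxx",
    "container": "cost-analyzer-frontend",
}

# Joining all properties keeps each item unique across the whole deployment...
unique_item = "/".join(properties.values())

# ...whereas aggregate=container would collapse the item to just the container name,
# merging different pods that happen to run a container with the same name
aggregated_item = properties["container"]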

The reason I still provide the AGGREGATION input in the Helm chart (which, as you can see, is used to populate the aggregate query input in the exporter code) is for cases where container-level aggregation is too low-level in really large setups, such that the Kubecost Allocation API fails to return results. Users can then supply a higher aggregation level in the AGGREGATION input, with the tradeoff of less granular data.

Upon further analysis, it appears that the kubecost-s3-exporter does not apply any filters when querying the Kubecost API on the primary cluster. As a result, in a multi-cluster architecture, it retrieves data from all clusters, not just the intended namespace.

The kubecost-s3-exporter isn't meant to apply any filters. It runs daily and collects all data for a specific 24-hour timeframe (specifically, from 72 hours ago 00:00:00 UTC to 48 hours ago 00:00:00 UTC). If you're in a single-cluster setup, it'll collect all data for that cluster in that timeframe. If you're in a multi-cluster setup and the kubecost-s3-exporter collects from the primary cluster, it'll collect all data for all clusters in that timeframe. If, by setting the AGGREGATION input to namespace, you expected it to collect from a specific namespace only, it won't; this input defines the aggregation level, not a filter.
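
For clarity, here's a minimal sketch of how that default window could be computed, based on the description above (an assumption for illustration, not the exporter's actual code):

from datetime import datetime, timedelta, timezone

# Midnight UTC of the current day
today_midnight = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)

# Collection window: 72 hours ago 00:00:00 UTC to 48 hours ago 00:00:00 UTC
start_h = today_midnight - timedelta(days=3)
end_h = today_midnight - timedelta(days=2)

window = f'{start_h.strftime("%Y-%m-%dT%H:%M:%SZ")},{end_h.strftime("%Y-%m-%dT%H:%M:%SZ")}'
print(window)  # e.g. 2024-11-17T00:00:00Z,2024-11-18T00:00:00Z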

Could you confirm if this deployment approach is compatible with Kubecost version 2.X?

The kubecost-s3-exporter supports Kubecost 2.x in any form (single-cluster or multi-cluster Kubecost deployment). In a multi-cluster deployment there are a few limitations, though, which are documented at the beginning of the README file (see "Kubecost enterprise tier, with the following limitations").

So, to summarize, please upgrade to Kubecost v2.4.1 and check that the issue is fixed (after you revert the code back to how it was originally).

m-parrella commented 2 weeks ago

Hi Udi,

Thanks for the clarifications. I suspect the "error querying DB" message is due to the large number of containers: since the kubecost-s3-exporter queries data from all clusters at once, the Kubecost API struggles to handle the response. This is why the issue was resolved when using AGGREGATION=namespace instead of aggregating by container.

We are using the Kubecost EKS-optimized bundle in a multi-cluster, multi-account architecture. Based on your previous comment, since kubecost-s3-exporter queries the Kubecost API without filters, data from all clusters is combined into a single file per day in S3. Could you confirm whether the CronJob with the Kubecost S3 Exporter image should be deployed on only one EKS cluster rather than on each cluster?

Finally, assuming the setup is working as expected (aggregating by namespace, with only one S3 exporter): when I query the data with Athena, I noticed that the properties.* columns are empty, which prevents us from filtering by cluster or account on the dashboard. Is this related to the aggregation level?

(screenshots attached)

Thank you!

udid-aws commented 2 weeks ago

We are using the Kubecost EKS-optimized bundle in a multi-cluster, multi-account architecture. Based on your previous comment, since kubecost-s3-exporter queries the Kubecost API without filters, data from all clusters is combined into a single file per day in S3. Could you confirm whether the CronJob with the Kubecost S3 Exporter image should be deployed on only one EKS cluster rather than on each cluster?

The kubecost-s3-exporter can also be deployed on the primary cluster in a multi-cluster Kubecost deployment. I worked with a user who deployed it this way, but of course it's a matter of Kubecost API server performance. It could be that for the other user I worked with, the amount of data returned by the primary cluster for all clusters was still small enough for the Kubecost API to respond properly, while in your case it's too large.

Now, in a Kubecost multi-cluster deployment, each individual cluster also has Kubecost deployed (the cost-model container still needs to prepare the data), but you query/use the UI of the primary cluster to see the aggregated multi-cluster data in one place. If each individual cluster still has the cost-analyzer-frontend container, then I think the best approach would be to deploy the exporter CronJob in each cluster, like you would do with the Kubecost free tier (although it would mean more deployment effort). To clarify, I don't mean that you should change your deployment to multiple single-cluster deployments. I'd just like to check whether each individual cluster in your multi-cluster deployment - which must still have Kubecost deployed for data preparation - also has the cost-analyzer-frontend container.

If the cost-analyzer-frontend container doesn't exist in each individual cluster, then I recall there might be another HTTP service on TCP port 9003 (though I don't know if it returns the same data). So first, please check if cost-analyzer-frontend is present in each individual cluster. If not, check whether another service port is used to respond to API calls. You can use the following API call with cURL to check the API (after you port-forward or use an ingress to expose it):

curl -k -v "http://localhost:9090/model/allocation?window=2024-10-29T00:00:00Z,2024-10-30T00:00:00Z&accumulate=false&step=1d&idle=true&splitIdle=true&idleByNode=true&shareTenancyCosts=true"

Change the protocol (http/https), hostname (localhost), TCP port (9090) and dates as needed.

Finally, assuming the setup is working as expected (aggregating by namespace, with only one S3 exporter): when I query the data with Athena, I noticed that the properties.* columns are empty, which prevents us from filtering by cluster or account on the dashboard. Is this related to the aggregation level?

The empty properties.cluster column seems to be because, when querying the Kubecost Allocation API with aggregate=namespace, it doesn't return this key in the response (for each namespace). I tried it just now in my setup (v2.4.1); here's a sample snippet from the output when not using the aggregate query key:

    "properties":
    {
        "cluster": "cluster-one",
        "node": "ip-A-B-C-D.ec2.internal",
        "container": "cost-analyzer-frontend",
        "namespace": "kubecost-eks2",
        "pod": "kubecost-eks2-cost-analyzer-xxxxxxx-xxxx",
        "labels":
        {
            "app": "cost-analyzer",
            "app_kubernetes_io_instance": "kubecost-eks2",

You can see that under the properties key there's a cluster key (and several others). Here's a sample snippet from the output when using aggregate=namespace:

"kubecost-eks2":
{
    "name": "kubecost-eks2",
    "properties":
    {
        "namespace": "kubecost-eks2",
        "labels":
        {
            "eks_amazonaws_com_capacityType": "ON_DEMAND",
            "eks_amazonaws_com_nodegroup_image": "ami-xxxxxxxxxxxxx",

You can see that the only key under properties is namespace (there are other keys too, this is a partial output - but cluster isn't there). While I understand why the container and pod keys won't be there under properties (they're at a lower level than namespace), I'd expect the node and cluster keys to still be there, as they're at a higher level than namespace.

I'm not sure why Kubecost Allocation API doesn't return these properties. I logged a ticket with their support (you need to join the Kubecost Slack workspace to see it).

As for the empty properties.eksclustername column: that's a column the exporter creates; it isn't derived from the Kubecost Allocation API. You can see the code here. The value of this key is derived from the Helm CLUSTER_ID value, which is expected to be the EKS cluster ARN (and I use input validation to make sure an ARN is indeed used as the input). The reason I require it as an input is that in Kubecost free tier deployments, the default cluster name in Kubecost is cluster-one, which is what will appear in the properties.cluster key in the Kubecost Allocation API response. I can't force users to change it, and while a repeated cluster name across multiple single-cluster Kubecost deployments isn't an issue in the Kubecost UI/API itself, it would cause an issue in the dashboard, which aggregates data from multiple clusters. That's why I require the CLUSTER_ID input (and validate it): in multiple single-cluster setups, users are forced to use the unique cluster identifier (the cluster ARN), and the exporter extracts the cluster name from it and uses it in eksclustername, so the dashboard has a unique name for each cluster (even if the cluster name is the same in each single-cluster Kubecost deployment).
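
As an illustration only (not the exact exporter code), the cluster name extraction from the ARN boils down to something like this:

# Hypothetical example; the real exporter also validates that CLUSTER_ID is an EKS cluster ARN
cluster_id = "arn:aws:eks:us-east-1:111122223333:cluster/my-eks-cluster"

# An EKS cluster ARN ends with "cluster/<cluster-name>", so the name is the part after the last "/"
eks_cluster_name = cluster_id.split("/")[-1]
print(eks_cluster_name)  # my-eks-cluster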

Now, if you haven't changed any of the code related to this input, then this should have worked and the eksclustername column should have shown the primary cluster name. Did you make any changes? Regardless, let's leave this column aside, because in a multi-cluster setup it won't be useful anyway: the exporter collects data from the primary cluster for all clusters, so all line items in Athena will have the same eksclustername value. The solution is to customize the dashboard so it sources the cluster name from the properties.cluster field instead (and of course, Kubecost first needs to fix the API issue I mentioned above). But let's hold off on that. First, let me know your conclusion after you've checked whether the cost-analyzer-frontend container is present in each individual cluster in your multi-cluster Kubecost setup. This will solve both the Kubecost API performance issue (probably, unless some of your clusters are really large) and the same-eksclustername-value issue in multi-cluster setups.

m-parrella commented 2 weeks ago

Thanks for the detailed explanation.

We are using a Federated ETL deployment with one primary cluster and five secondary clusters. In this setup, the cost-analyzer-frontend container is only deployed on the primary. This is enforced by the official Helm chart here, which only deploys the frontend when the instance is not defined as an agent (agentOnly: true).

Although the frontend is only deployed on the primary, I was able to query the cost-model container directly on the secondary clusters with a small tweak. According to the Kubecost documentation, in order to query the cost-model container directly, the /model part of the URI should be removed. So I adjusted main.py like this:

Original snippet:

# Executing the API call
logger.info(f"Querying Kubecost Allocation API for data between {start_h} and {end_h} "
            f"in {granularity.lower()} granularity...")
r = requests.get(f"{kubecost_api_endpoint}/model/allocation", params=params,
                 timeout=(connection_timeout, read_timeout), verify=tls_verify)

Modified snippet:

# Executing the API call
logger.info(f"Querying Kubecost Allocation API for data between {start_h} and {end_h} "
            f"in {granularity.lower()} granularity...")
r = requests.get(f"{kubecost_api_endpoint}", params=params,
                 timeout=(connection_timeout, read_timeout), verify=tls_verify)

After applying the change, building and pushing the new image, I deployed one kubecost-s3-exporter on each cluster and adjusted the values.yaml like this (I deployed the exporter in the same namespace as Kubecost):

- name: "KUBECOST_API_ENDPOINT"
  value: "http://kubecost-cost-analyzer:9003/allocation"

With this adjustment, each exporter was able to query the cost model locally, aggregating by container and avoiding the Kubecost response size limitation.

After running the crawler and refreshing the dataset, the data was correctly displayed on the dashboard; the only empty columns in Athena are:

properties.node
properties.node_capacity_type
properties.node_architecture
properties.node_os
properties.node_nodegroup_image
properties.providerid

Hope this helps!

udid-aws commented 2 weeks ago

Awesome, thanks! I may include an input in the Helm chart, in one of the next updates, for users to choose whether they want to query cost-model or cost-analyzer-frontend (and as a result, the exporter will use the relevant API endpoint).

As for the empty columns:

The missing properties.node and properties.providerid fields seem like a regression in Kubecost 2.x. I checked Kubecost 1.108.1 (the latest 1.x version), and those properties were present. I then checked Kubecost 2.0.1 (the first 2.x version), and those properties (and others) are missing. Finally, I checked one of the most recent 2.x versions (2.4.1), and the same properties are still missing. I tested all of the above with the same API call (the same one the exporter executes). So, it looks like a regression that started in Kubecost 2.0.1. I created a ticket you can follow (you need to join the Kubecost Slack workspace to see it).

The missing properties.node_capacity_type, properties.node_architecture, properties.node_os and properties.node_nodegroup_image fields are derived from node labels, and you need to explicitly configure Kubecost to report them in the Allocation API response. Please follow the instructions here.

m-parrella commented 2 weeks ago

As a follow-up, after deploying the dashboard, we identified some discrepancies between the costs reported on the Kubecost frontend and in QuickSight.

Currently, pointing the exporter at the local cost model bypasses the reconciliation process that occurs on the primary cluster. In our setup we use Spot Instances and Savings Plans, but this information is not reflected in the dashboard because all data is sourced from the secondary clusters.

For example:

# cat primary.json | jq .data[0] | grep -A73 kubecost-cost-analyzer-654f9d5fd-q2zlh | grep totalCost
    "totalCost": 0.16583,
# cat secondary.json | jq .data[0] | grep -A73 kubecost-cost-analyzer-654f9d5fd-q2zlh | grep totalCost
    "totalCost": 0.29515,

An alternative to querying the cost model locally on each cluster would be to point the exporter at the primary cluster while using the filterClusters parameter in the Kubecost API requests, and adding an additional parameter to the Helm chart.
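
Conceptually, the change would extend the parameters the exporter already sends with a cluster filter, roughly like the sketch below (untested; the filterClusters name comes from the Kubecost docs and still needs to be verified against the 2.x API):

# Hypothetical sketch: the same parameters the exporter already sends,
# plus a cluster filter whose value would come from a new Helm chart input
base_params = {
    "window": "2024-10-24T00:00:00Z,2024-10-25T00:00:00Z",
    "accumulate": False, "step": "1d", "idle": True, "splitIdle": True,
    "idleByNode": True, "shareTenancyCosts": True,
}
params = {**base_params, "filterClusters": "cluster-two"}  # restrict the response to one cluster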

I’ll test this approach next week and keep you updated!

udid-aws commented 2 weeks ago

Thanks for the update! If you'll be using filterClusters and adding a parameter to the Helm chart, I take it that you're going to deploy separate instances of the exporter on the primary cluster, each one with a different cluster in the filterClusters input. That's a good idea. I have two other suggestions: