aws-samples / eks-anywhere-addons

https://aws-samples.github.io/eks-anywhere-addons/
MIT No Attribution
21 stars 39 forks source link

Bump Kubecost to v2.0.2 #208

Closed chipzoller closed 4 months ago

chipzoller commented 4 months ago

Signed-off-by: chipzoller chipzoller@gmail.com

Issue #, if available:

Description of changes:

Bumps the Kubecost add-on to version 2.0.0.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

elamaran11 commented 4 months ago

Please validate this too!

chipzoller commented 4 months ago

Fixed and verified with the test job locally.

mikemcd3912 commented 4 months ago

Hi @chipzoller,

Thank you for submitting the update PR. When testing in our vmware environment I am getting an error from the healthtest job with the following output:

k logs -n kubecost kubecost-healthtest-28455820-w2mch
Getting current Kubecost state.
Failed to fetch Kubecost configuration. Response was
sh: out of range

Do you happen to know why this might be unable to fetch the config?

chipzoller commented 4 months ago

Are you deploying with a name other than "kubecost"? I tested this on my end with the changes reflected in this PR and the health test succeeds.

elamaran11 commented 4 months ago

Hi @chipzoller

mikemcd3912 commented 4 months ago

Thanks for your patience while I researched that issue - The naming does seem to be the issue, since we're using flux as our CD mechanism the helm release was deployed with resources that were prefixed with the namespace to avoid collisions, so our service looked like this:

NAME                                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/kubecost-kubecost-cost-analyzer              ClusterIP   10.109.124.79    <none>        9003/TCP,9090/TCP   19m

I think we could restore functionality for our side while maintaining yours with that svc lookup that was in the original test file - is there a reason we moved away from that method with this update?

Also as Ela mentioned we are looking for a more in-depth evaluation of the health of the deployment through functional testing, so we will need the health check itself updated to align with the functional testing requirements before we can approve the PR

chipzoller commented 4 months ago

Can I just ask before addressing comments, is it not acceptable to use Helm tests here? If you look at the chart, there is a test case built in allowing you or any user to simply run helm test <release> after installation which will essentially perform the same validation.

elamaran11 commented 4 months ago

Can I just ask before addressing comments, is it not acceptable to use Helm tests here? If you look at the chart, there is a test case built in allowing you or any user to simply run helm test <release> after installation which will essentially perform the same validation.

@chipzoller Everthing we drive here including functional is driven through GitOps via Flux including functional testing. So the expectation is to submit a Kubernetes CronJob which runs on daily schedule to validate the functionality of the product. Please check the functional design requirements.

chipzoller commented 4 months ago

Fixed tests. They check out on my end.

chipzoller commented 4 months ago

Also as Ela mentioned we are looking for a more in-depth evaluation of the health of the deployment through functional testing, so we will need the health check itself updated to align with the functional testing requirements before we can approve the PR

As far as this goes, it is a functional test and not a health check. I believe we even discussed this in my previous PR. This check is basically an end-to-end smoke test that Kubecost is not only health but that all its dependencies are health. Even though it looks simple, the curl to /model/getConfigs internally traverses all the code paths in order to get a response. The response here is the full operating configuration of Kubecost at that time. If any one image is either down or not returning basic functional info, there will not be a 200 response returned.

If you're saying this still isn't good enough then the third requirement, quoted below, needs significant expansion and clarification.

  1. Healthchecks, service endpoints checks or any other technical checks do not represent sufficient coverage required for the functional test
elamaran11 commented 4 months ago

/model/getConfigs

@chipzoller Now i notice /model/getConfigs which basically qualifies for a functional test. I also remember our conversation on this in previous PR. Im good on this. Lets get the PR tested and move along. On Review looks good.

@mikemcd3912 Lets get it running in our labs and shared feedback to Chris.

mikemcd3912 commented 4 months ago

Working in our environments, No issues noted