aws-samples / eks-anywhere-addons

https://aws-samples.github.io/eks-anywhere-addons/
MIT No Attribution

Adding Sysdig helm onboarding to EKS-A snow #23

Closed: manuelbcd closed this 1 year ago

manuelbcd commented 1 year ago

Description of changes: Adds Sysdig Helm onboarding to EKS-A on Snow (in progress).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

manuelbcd commented 1 year ago

@shapirov103 please note that I'm still working on it. I submitted the PR early only to make it easier to get your comments in context. Thanks!

manuelbcd commented 1 year ago

@elamaran11 @shapirov103 FYI

manuelbcd commented 1 year ago

I'm a bit lost on how to proceed with the test job. The testjob.yaml sample does nothing but deploy a busybox container to the cluster, so I'm not sure how to test a Helm addon. (Suggestion: it would be great to have a test job example related to the Kubecost or Botkube addons.) @elamaran11

elamaran11 commented 1 year ago


Hi @manuelbcd, testjob.yaml is a sample that shows how a test job can be created, and yes, we used a busybox container to illustrate it. The test job is not intended to test the Helm addon installation, because that is verified when the GitOps sync happens, and our CI/CD process reports on it daily. The main intent of the test job is to test the functionality of your ISV product, that is, whether the product works correctly in the K8s cluster. If you already have functional tests for your ISV product, please containerize them and submit them as a job that can run on the target cluster. Please let me know if you have any further questions.
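As a rough illustration of "containerize your functional tests and submit them as a job", here is a minimal sketch that builds a Kubernetes batch/v1 Job manifest for a one-shot functional test. The Job name, namespace, image, and test script are all hypothetical placeholders, not anything defined by this repository or by Sysdig; the real test image and its command would come from the ISV's own functional tests.

```python
import json


def build_functional_test_job(name: str, namespace: str, image: str, command: list[str]) -> dict:
    """Return a batch/v1 Job manifest that runs a containerized functional test once."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Do not retry a failing test; the failed Job itself is the signal.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Job pods must use restartPolicy "Never" or "OnFailure".
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


job = build_functional_test_job(
    name="sysdig-functional-test",                           # hypothetical name
    namespace="sysdig-agent",                                # hypothetical namespace
    image="example.com/sysdig-functional-test:latest",       # hypothetical image
    command=["/bin/sh", "-c", "./run-functional-tests.sh"],  # hypothetical script
)
print(json.dumps(job, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and applied with `kubectl apply -f`; setting `backoffLimit` to 0 is a design choice so that a failing functional test surfaces immediately instead of being retried.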

shapirov103 commented 1 year ago

@manuelbcd would you mind removing the .DS_Store files from the PR? I see you added them to .gitignore, but that does not untrack files that are already committed, so they still show up in this PR.

elamaran11 commented 1 year ago

@manuelbcd Except for the .DS_Store files and the missing functional test job, the rest of the PR looks good.

elamaran11 commented 1 year ago

@manuelbcd Here is a ticket for Functional Test job - https://github.com/aws-samples/eks-anywhere-addons/issues/27

shapirov103 commented 1 year ago

@manuelbcd I observe that the technical validation of the Sysdig deployment is currently failing, with a few pods pending:

NAME                                   READY  RESTARTS  STATUS   IP            NODE                   AGE
sysdig-agent-6fthk                     1/1    0         Running  10.0.0.214    s-i-8bad74b9e13b35e5c  42m
sysdig-agent-9x2gm                     0/0    0         Pending  n/a           n/a                    42m
sysdig-agent-bgvq6                     0/0    0         Pending  n/a           n/a                    42m
sysdig-agent-mm9gg                     1/1    0         Running  10.0.0.101    s-i-88e94fa614f36e9b2  42m
sysdig-agent-p8zhh                     0/0    0         Pending  n/a           n/a                    42m
sysdig-agent-q8rxd                     1/1    0         Running  10.0.0.81     s-i-8b086ffa3525e0543  42m
sysdig-kspmcollector-759d74589f-zl4bq  1/1    0         Running  10.1.4.52     s-i-8b086ffa3525e0543  42m
sysdig-node-analyzer-4ms54             3/3    0         Running  10.0.0.81     s-i-8b086ffa3525e0543  42m
sysdig-node-analyzer-b82vg             3/3    0         Running  10.0.0.43     s-i-84960b24e1180cd93  42m
sysdig-node-analyzer-jxcg2             3/3    0         Running  10.0.0.139    s-i-853e1770cd5d27696  42m
sysdig-node-analyzer-khmjq             3/3    0         Running  10.0.0.101    s-i-88e94fa614f36e9b2  42m
sysdig-node-analyzer-n7p4n             0/0    0         Pending  n/a           n/a                    42m
sysdig-node-analyzer-r7qdj             3/3    0         Running  10.0.0.79     s-i-8403823df04217822  42m

I am not sure of the root cause, but the issue could potentially be addressed by restricting the DaemonSet to worker nodes only. On Snow, worker nodes are marked with the following label: cluster.x-k8s.io/owner-kind: MachineSet.

The GitOps PR could potentially be modified to use that label as a node selector.
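To make the selector idea concrete, here is a minimal sketch of how a Kubernetes `nodeSelector` filters nodes: a pod is only eligible for a node whose labels contain every key/value pair in the selector. The selector uses the label quoted above; the node names and the other labels are hypothetical, and how the Sysdig Helm chart exposes a `nodeSelector` value is not shown here and would need to be checked against the chart itself.

```python
def matches_node_selector(node_labels: dict[str, str], node_selector: dict[str, str]) -> bool:
    """True only if the node carries every key/value pair in the selector (exact match)."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Selector proposed in the comment above for Snow worker nodes.
worker_only = {"cluster.x-k8s.io/owner-kind": "MachineSet"}

# Hypothetical node label sets, for illustration only.
nodes = {
    "worker-1": {"cluster.x-k8s.io/owner-kind": "MachineSet", "kubernetes.io/os": "linux"},
    "control-plane-1": {"node-role.kubernetes.io/control-plane": "", "kubernetes.io/os": "linux"},
}

eligible = [name for name, labels in nodes.items() if matches_node_selector(labels, worker_only)]
print(eligible)  # ['worker-1']
```

In a DaemonSet, the equivalent setting lives under `spec.template.spec.nodeSelector`, so DaemonSet pods would only be created on nodes that match it.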

manuelbcd commented 1 year ago

Removed the agent tolerations from the Helm parameters so that the agent no longer runs on control-plane nodes.
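The reasoning behind removing the tolerations can be sketched with a simplified model of Kubernetes taint/toleration matching, assuming the control-plane nodes carry the standard kubeadm taint `node-role.kubernetes.io/control-plane:NoSchedule`. The blanket `operator: Exists` toleration below is a hypothetical example of a "tolerate everything" setting, not necessarily what the removed Sysdig values contained. The model only treats `NoSchedule` and `NoExecute` taints as blocking and ignores `PreferNoSchedule` and toleration timeouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str = ""             # empty key with operator "Exists" matches every key
    operator: str = "Equal"   # "Equal" (the default) or "Exists"
    value: str = ""
    effect: str = ""          # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Does a single toleration match a single taint?"""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """Simplified: every NoSchedule/NoExecute taint on the node must be tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


control_plane_taints = [Taint("node-role.kubernetes.io/control-plane")]  # assumed taint
worker_taints: list[Taint] = []
tolerate_all = [Toleration(operator="Exists")]  # hypothetical blanket toleration

print(can_schedule(tolerate_all, control_plane_taints))  # True
print(can_schedule([], control_plane_taints))            # False
print(can_schedule([], worker_taints))                   # True
```

Under these assumptions, dropping the tolerations leaves the agent pods schedulable on untainted worker nodes while the control-plane taint keeps them off the control-plane nodes.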