boxboat / aks-health-check

A client-side tool to perform automated checks against an AKS cluster to see if it follows best-practices.
Mozilla Public License 2.0

Checks order #64

Open PixelRobots opened 3 years ago

PixelRobots commented 3 years ago

I think the CSP checks should come first, then DEV. It makes sense to me to get the cluster right before the apps.

Thoughts?

fgauna12 commented 3 years ago

Organizationally, I can see that. But I want to make sure: is there a problem you're also looking to solve?

PixelRobots commented 3 years ago

No problem to solve as such. I'm just thinking as a customer, or as someone who could use this tool for their own customers. When generating the report document, I feel it would be better presented with the infrastructure first.

It's sort of the home to the rest in a way.

Just a thought, mind.

fgauna12 commented 3 years ago

Okay good. I like it too. It will take a bit of work. I'm going to leave this open for now.

PixelRobots commented 3 years ago

Thanks. If it's OK with you, I could knock up a draft of the order and maybe move some of the checks around for your review.

fgauna12 commented 3 years ago

That would be great and really appreciated.

PixelRobots commented 3 years ago

Awesome. I will start working on it tomorrow.

PixelRobots commented 3 years ago

Here is the current order:

Check ID Manual/Automated Description
DEV-1 Automated Implement a proper liveness probe
DEV-2 Automated Implement a proper readiness/startup probe
DEV-3 Automated Implement a proper prestop hook
DEV-4 Automated Run more than one replica for your deployments
DEV-5 Automated Apply tags to all resources
DEV-6 Automated Implement autoscaling of your applications
DEV-7 Automated Store secrets in Azure Key Vault
DEV-8 Automated Implement pod identity
DEV-9 Automated Use Kubernetes namespaces
DEV-10 Automated Set up resource requests and limits on containers
DEV-11 Automated Specify security context for pods or containers
DEV-12 Manual Configure pod disruption budgets
IMG-1 Manual Define image security best practices
IMG-2 Manual Scan container images during CI/CD pipelines
IMG-3 Automated Allow pulling containers only from allowed registries
IMG-4 Automated Enable runtime security for containerized applications
IMG-5 Automated Configure image pull RBAC for Azure Container Registry
IMG-6 Automated Isolate Azure container registries
IMG-7 Manual Utilize minimal base images
IMG-8 Automated Forbid the use of privileged containers
CSP-1 Manual Logically isolate the cluster
CSP-2 Automated Isolate the Kubernetes control plane
CSP-3 Automated Enable Azure AD integration
CSP-4 Automated Enable cluster autoscaling
CSP-5 Manual Ensure nodes are correctly sized
CSP-6 Manual Create a process for base image updates
CSP-7 Automated Ensure the Kubernetes dashboard is not installed
CSP-8 Automated Use system and user node pools
CSP-9 Automated Enable Azure Policy
CSP-10 Automated Enable Azure RBAC
DR-1 Manual Ensure you can perform a whitespace deployment
DR-2 Automated Use availability zones for node pools
DR-3 Manual Plan for a multi-region deployment
DR-4 Manual Use Azure Traffic Manager for cross-region traffic
DR-5 Automated Create a storage migration plan
DR-6 Automated Guarantee SLA for the master control plane
DR-7 Manual Container registry has geo-replication
STOR-1 Manual Choose the right storage type
STOR-2 Manual Size nodes for storage needs
STOR-3 Manual Dynamically provision volumes when applicable
STOR-4 Manual Secure and back up your data
STOR-5 Manual Remove service state from inside containers
NET-1 Manual Choose an appropriate network model
NET-2 Manual Plan IP addressing carefully
NET-3 Manual Distribute ingress traffic
NET-4 Manual Secure exposed endpoints with a Web Application Firewall (WAF)
NET-5 Manual Don’t expose ingress on public internet if not necessary
NET-6 Manual Control traffic flow with network policies
NET-7 Manual Route egress traffic through a firewall
NET-8 Manual Do not expose worker nodes to public internet
NET-9 Automated Utilize a service mesh (optional)
NET-10 Manual Configure distributed tracing
CSM-1 Manual Keep Kubernetes version up to date
CSM-2 Manual Keep nodes up to date and patched
CSM-3 Manual Monitor cluster security using Azure Security Center
CSM-4 Manual Provision a log aggregation tool
CSM-5 Manual Enable master node logs
CSM-6 Manual Collect metrics

Here is the proposed order. I have also changed some names. For example, "tags" is now "labels" in DEV-5, as it makes more sense with Azure.

Check ID Manual/Automated Description
CSP-1 Manual Logically isolate the cluster
CSP-2 Automated Isolate the Kubernetes control plane
CSP-3 Automated Enable Azure AD integration
CSP-4 Automated Enable cluster autoscaling
CSP-5 Manual Ensure nodes are correctly sized
CSP-6 Manual Create a process for base image updates
CSP-7 Automated Ensure the Kubernetes dashboard is not installed
CSP-8 Automated Use system and user node pools
CSP-9 Automated Enable Azure Policy
CSP-10 Automated Enable Azure RBAC
CSP-11 Automated Use availability zones for node pools
CSP-12 Automated Guarantee SLA for the master control plane
CSP-13 Manual Enable Container Insights
CSP-14 Manual Enable Azure Defender for AKS
NET-1 Manual Choose an appropriate network model
NET-2 Manual Plan IP addressing carefully
NET-3 Manual Distribute ingress traffic
NET-4 Manual Secure exposed endpoints with a Web Application Firewall (WAF)
NET-5 Manual Don’t expose ingress on public internet if not necessary
NET-6 Manual Control traffic flow with network policies
NET-7 Manual Route egress traffic through a firewall
NET-8 Manual Do not expose worker nodes to public internet
NET-9 Automated Utilize a service mesh (optional)
NET-10 Manual Configure distributed tracing
DEV-1 Automated Implement a proper liveness probe
DEV-2 Automated Implement a proper readiness/startup probe
DEV-3 Automated Implement a proper prestop hook
DEV-4 Automated Run more than one replica for your deployments
DEV-5 Automated Apply labels to all objects
DEV-6 Automated Implement autoscaling of your applications
DEV-7 Automated Store secrets in Azure Key Vault
DEV-8 Automated Implement pod identity
DEV-9 Automated Use Kubernetes namespaces
DEV-10 Automated Set up resource requests and limits on containers
DEV-11 Automated Specify security context for pods or containers
DEV-12 Manual Configure pod disruption budgets
IMG-1 Manual Define image security best practices
IMG-2 Manual Scan container images during CI/CD pipelines
IMG-3 Automated Allow pulling containers only from allowed registries
IMG-4 Automated Enable runtime security for containerized applications
IMG-5 Automated Configure image pull RBAC for Azure Container Registry
IMG-6 Automated Isolate Azure container registries
IMG-7 Manual Utilize minimal base images
IMG-8 Automated Forbid the use of privileged containers
STOR-1 Manual Choose the right storage type
STOR-2 Manual Size nodes for storage needs
STOR-3 Manual Dynamically provision volumes when applicable
STOR-4 Manual Secure and back up your data
STOR-5 Manual Remove service state from inside containers
DR-1 Manual Ensure you can perform a whitespace deployment
DR-2 Manual Plan for a multi-region deployment
DR-3 Manual Use Azure Traffic Manager for cross-region traffic
DR-4 Automated Create a storage migration plan
DR-5 Manual Container registry has geo-replication
CSM-1 Manual Keep Kubernetes version up to date
CSM-2 Manual Keep nodes up to date and patched
CSM-3 Manual Enable master node logs
CSM-4 Manual Collect metrics

Let me know your thoughts.
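One way to see what actually changes between the two lists is to diff them by description. This is only a minimal sketch, not the tool's own code: the `current` and `proposed` mappings are hypothetical structures holding a handful of rows copied from the tables above, and `section` and `diff_checks` are illustrative names.

```python
# Hypothetical subset of the two tables above: description -> check ID.
current = {
    "Use availability zones for node pools": "DR-2",
    "Plan for a multi-region deployment": "DR-3",
    "Guarantee SLA for the master control plane": "DR-6",
    "Monitor cluster security using Azure Security Center": "CSM-3",
    "Enable master node logs": "CSM-5",
    "Enable cluster autoscaling": "CSP-4",
}
proposed = {
    "Use availability zones for node pools": "CSP-11",
    "Plan for a multi-region deployment": "DR-2",
    "Guarantee SLA for the master control plane": "CSP-12",
    "Enable master node logs": "CSM-3",
    "Enable cluster autoscaling": "CSP-4",
    "Enable Container Insights": "CSP-13",
}


def section(check_id):
    """Return the section prefix of a check ID, e.g. 'CSP' for 'CSP-11'."""
    return check_id.rsplit("-", 1)[0]


def diff_checks(current, proposed):
    """Classify the differences between two description -> ID mappings."""
    moved = []       # same description, different section
    renumbered = []  # same description and section, different number
    for description, old_id in current.items():
        new_id = proposed.get(description)
        if new_id is None or new_id == old_id:
            continue
        if section(new_id) != section(old_id):
            moved.append((old_id, new_id))
        else:
            renumbered.append((old_id, new_id))
    removed = sorted(i for d, i in current.items() if d not in proposed)
    added = sorted(i for d, i in proposed.items() if d not in current)
    return {
        "moved": moved,
        "renumbered": renumbered,
        "removed": removed,
        "added": added,
    }


result = diff_checks(current, proposed)
for kind, items in result.items():
    print(kind, items)
```

Matching is by exact description text, so a renamed check (such as DEV-5, where "tags" became "labels") would show up as one removal plus one addition rather than as a rename.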

fgauna12 commented 2 years ago

@PixelRobots - Sorry about this; it went stale. If you want to pursue it, we can rekindle this issue. Just let us know; otherwise, we can close it.

PixelRobots commented 2 years ago

Hey, yeah happy to pursue it again. Love the tool and happy to give it some love.

fgauna12 commented 2 years ago

@PixelRobots - I'm making the change to the doc. Do you want to try re-arranging the check order in the code? If not, we're happy to do it, just wanted to give you the opportunity.

fgauna12 commented 2 years ago

Also, @PixelRobots, could we keep the change scoped to only the re-ordering of sections, excluding the re-classification of some of the checks? For example, moving a check such as DR-2 to CSP-11 would be a breaking change.

We will need to experiment with having different versions of the document that we can keep in sync with release versions of the companion health check tool.
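Re-ordering the sections without touching any check ID could be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's implementation: `SECTION_ORDER`, `sort_key`, and `order_checks` are hypothetical names, and the section order is the one proposed earlier in this thread.

```python
# Hypothetical section order, taken from the proposed table in this thread.
SECTION_ORDER = ["CSP", "NET", "DEV", "IMG", "STOR", "DR", "CSM"]


def sort_key(check_id):
    """Sort by section position first, then by the numeric part of the ID."""
    prefix, number = check_id.rsplit("-", 1)
    return (SECTION_ORDER.index(prefix), int(number))


def order_checks(check_ids):
    """Return the check IDs in report order; the IDs themselves are unchanged."""
    return sorted(check_ids, key=sort_key)


checks = ["DEV-1", "DR-2", "CSP-10", "CSP-2", "NET-1", "CSM-1", "IMG-3", "STOR-1"]
print(order_checks(checks))
```

Because only the presentation order changes, an existing ID such as DR-2 keeps referring to the same check, which avoids the breaking change described above. Comparing the numeric part as an integer keeps CSP-10 after CSP-2.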