Open PixelRobots opened 3 years ago
Organizationally, I can see that. But I want to make sure: is there a problem you're also looking to solve?
No problem to solve as such. Thinking as a customer, or as someone who could use this tool for their customers, I feel the generated report document would be better presented with the infrastructure first.
It's sort of the home to the rest in a way.
Just a thought mind.
Okay good. I like it too. It will take a bit of work. I'm going to leave this open for now.
Thanks. If it's ok with you I could knock up a draft of the order and maybe move some of the checks around for your review.
That would be great and really appreciated.
Awesome. I will start working on it tomorrow.
Here is the current order:
Check ID | Manual/Automated | Description |
---|---|---|
DEV-1 | Automated | Implement a proper liveness probe |
DEV-2 | Automated | Implement a proper readiness/startup probe |
DEV-3 | Automated | Implement a proper prestop hook |
DEV-4 | Automated | Run more than one replica for your deployments |
DEV-5 | Automated | Apply tags to all resources |
DEV-6 | Automated | Implement autoscaling of your applications |
DEV-7 | Automated | Store secrets in azure key vault |
DEV-8 | Automated | Implement pod identity |
DEV-9 | Automated | Use kubernetes namespaces |
DEV-10 | Automated | Setup resource requests and limits on containers |
DEV-11 | Automated | Specify security context for pods or containers |
DEV-12 | Manual | Configure pod disruption budgets |
IMG-1 | Manual | Define image security best practices |
IMG-2 | Manual | Scan container images during CI/CD pipelines |
IMG-3 | Automated | Allow pulling containers only from allowed registries |
IMG-4 | Automated | Enable runtime security for containerized applications |
IMG-5 | Automated | Configure image pull RBAC for azure container registry |
IMG-6 | Automated | Isolate azure container registries |
IMG-7 | Manual | Utilize minimal base images |
IMG-8 | Automated | Forbid the use of privileged containers |
CSP-1 | Manual | Logically isolate the cluster |
CSP-2 | Automated | Isolate the Kubernetes control plane |
CSP-3 | Automated | Enable Azure AD integration |
CSP-4 | Automated | Enable cluster autoscaling |
CSP-5 | Manual | Ensure nodes are correctly sized |
CSP-6 | Manual | Create a process for base image updates |
CSP-7 | Automated | Ensure the Kubernetes dashboard is not installed |
CSP-8 | Automated | Use system and user node pools |
CSP-9 | Automated | Enable Azure Policy |
CSP-10 | Automated | Enable Azure RBAC |
DR-1 | Manual | Ensure you can perform a whitespace deployment |
DR-2 | Automated | Use availability zones for node pools |
DR-3 | Manual | Plan for a multi-region deployment |
DR-4 | Manual | Use Azure traffic manager for cross-region traffic |
DR-5 | Automated | Create a storage migration plan |
DR-6 | Automated | Guarantee SLA for the master control plane |
DR-7 | Manual | Container registry has geo-replication |
STOR-1 | Manual | Choose the right storage type |
STOR-2 | Manual | Size nodes for storage needs |
STOR-3 | Manual | Dynamically provision volumes when applicable |
STOR-4 | Manual | Secure and back up your data |
STOR-5 | Manual | Remove service state from inside containers |
NET-1 | Manual | Choose an appropriate network model |
NET-2 | Manual | Plan IP addressing carefully |
NET-3 | Manual | Distribute ingress traffic |
NET-4 | Manual | Secure exposed endpoints with a Web Application Firewall (WAF) |
NET-5 | Manual | Don’t expose ingress on public internet if not necessary |
NET-6 | Manual | Control traffic flow with network policies |
NET-7 | Manual | Route egress traffic through a firewall |
NET-8 | Manual | Do not expose worker nodes to public internet |
NET-9 | Automated | Utilize a service mesh (optional) |
NET-10 | Manual | Configure distributed tracing |
CSM-1 | Manual | Keep Kubernetes version up to date |
CSM-2 | Manual | Keep nodes up to date and patched |
CSM-3 | Manual | Monitor cluster security using Azure Security Center |
CSM-4 | Manual | Provision a log aggregation tool |
CSM-5 | Manual | Enable master node logs |
CSM-6 | Manual | Collect metrics |
Here is the proposed order. I have also renamed some checks; for example, "tags" is now "labels", as it makes more sense with Azure.
Check ID | Manual/Automated | Description |
---|---|---|
CSP-1 | Manual | Logically isolate the cluster |
CSP-2 | Automated | Isolate the Kubernetes control plane |
CSP-3 | Automated | Enable Azure AD integration |
CSP-4 | Automated | Enable cluster autoscaling |
CSP-5 | Manual | Ensure nodes are correctly sized |
CSP-6 | Manual | Create a process for base image updates |
CSP-7 | Automated | Ensure the Kubernetes dashboard is not installed |
CSP-8 | Automated | Use system and user node pools |
CSP-9 | Automated | Enable Azure Policy |
CSP-10 | Automated | Enable Azure RBAC |
CSP-11 | Automated | Use availability zones for node pools |
CSP-12 | Automated | Guarantee SLA for the master control plane |
CSP-13 | Manual | Enable Container Insights |
CSP-14 | Manual | Enable Azure Defender for AKS |
NET-1 | Manual | Choose an appropriate network model |
NET-2 | Manual | Plan IP addressing carefully |
NET-3 | Manual | Distribute ingress traffic |
NET-4 | Manual | Secure exposed endpoints with a Web Application Firewall (WAF) |
NET-5 | Manual | Don’t expose ingress on public internet if not necessary |
NET-6 | Manual | Control traffic flow with network policies |
NET-7 | Manual | Route egress traffic through a firewall |
NET-8 | Manual | Do not expose worker nodes to public internet |
NET-9 | Automated | Utilize a service mesh (optional) |
NET-10 | Manual | Configure distributed tracing |
DEV-1 | Automated | Implement a proper liveness probe |
DEV-2 | Automated | Implement a proper readiness/startup probe |
DEV-3 | Automated | Implement a proper prestop hook |
DEV-4 | Automated | Run more than one replica for your deployments |
DEV-5 | Automated | Apply labels to all objects |
DEV-6 | Automated | Implement autoscaling of your applications |
DEV-7 | Automated | Store secrets in azure key vault |
DEV-8 | Automated | Implement pod identity |
DEV-9 | Automated | Use kubernetes namespaces |
DEV-10 | Automated | Setup resource requests and limits on containers |
DEV-11 | Automated | Specify security context for pods or containers |
DEV-12 | Manual | Configure pod disruption budgets |
IMG-1 | Manual | Define image security best practices |
IMG-2 | Manual | Scan container images during CI/CD pipelines |
IMG-3 | Automated | Allow pulling containers only from allowed registries |
IMG-4 | Automated | Enable runtime security for containerized applications |
IMG-5 | Automated | Configure image pull RBAC for azure container registry |
IMG-6 | Automated | Isolate azure container registries |
IMG-7 | Manual | Utilize minimal base images |
IMG-8 | Automated | Forbid the use of privileged containers |
STOR-1 | Manual | Choose the right storage type |
STOR-2 | Manual | Size nodes for storage needs |
STOR-3 | Manual | Dynamically provision volumes when applicable |
STOR-4 | Manual | Secure and back up your data |
STOR-5 | Manual | Remove service state from inside containers |
DR-1 | Manual | Ensure you can perform a whitespace deployment |
DR-2 | Manual | Plan for a multi-region deployment |
DR-3 | Manual | Use Azure traffic manager for cross-region traffic |
DR-4 | Automated | Create a storage migration plan |
DR-5 | Manual | Container registry has geo-replication |
CSM-1 | Manual | Keep Kubernetes version up to date |
CSM-2 | Manual | Keep nodes up to date and patched |
CSM-3 | Manual | Enable master node logs |
CSM-4 | Manual | Collect metrics |
Let me know your thoughts.
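To make the impact of the proposal easier to review, here is a minimal sketch that lists which checks end up with a different ID. It matches checks by description, which is an assumption for illustration: a renamed check (such as the tags/labels one) would not match. The `changed_ids` helper is hypothetical, and the dictionaries hold only a few rows copied from the two tables above.

```python
# A few rows from the current and proposed tables: check ID -> description.
current = {
    "DR-1": "Ensure you can perform a whitespace deployment",
    "DR-2": "Use availability zones for node pools",
    "DR-3": "Plan for a multi-region deployment",
    "DR-6": "Guarantee SLA for the master control plane",
}
proposed = {
    "DR-1": "Ensure you can perform a whitespace deployment",
    "DR-2": "Plan for a multi-region deployment",
    "CSP-11": "Use availability zones for node pools",
    "CSP-12": "Guarantee SLA for the master control plane",
}


def changed_ids(old, new):
    """Return {old_id: new_id} for checks that keep their description but change ID."""
    new_by_description = {desc: check_id for check_id, desc in new.items()}
    return {
        old_id: new_by_description[desc]
        for old_id, desc in old.items()
        if desc in new_by_description and new_by_description[desc] != old_id
    }


moves = changed_ids(current, proposed)
for old_id, new_id in moves.items():
    print(f"{old_id} -> {new_id}")
```

Running the same comparison over the full tables would give the complete list of IDs that anything referencing the old numbering would need to follow.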
@PixelRobots - Sorry, this went stale. If you want to pursue it, we can rekindle the issue. Just let us know; otherwise, we can close it.
Hey, yeah happy to pursue it again. Love the tool and happy to give it some love.
@PixelRobots - I'm making the change to the doc. Do you want to try re-arranging the check order in the code? If not, we're happy to do it, just wanted to give you the opportunity.
Also, @PixelRobots, could we keep the change scoped to only the re-ordering of sections, excluding the re-classification of some of the checks? For example, moving a check such as DR-2 to CSP-11 would be a breaking change.
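A section-only re-ordering that leaves every check ID untouched could look roughly like the sketch below. This is not the tool's actual code: `SECTION_ORDER` and `reorder_sections` are hypothetical names, the order is taken from the proposed table and is still under discussion, and the ID list is a small sample.

```python
# Assumed section order, taken from the proposed table (not final).
SECTION_ORDER = ["CSP", "NET", "DEV", "IMG", "STOR", "DR", "CSM"]


def reorder_sections(check_ids, section_order=SECTION_ORDER):
    """Group check IDs by section prefix in the given order, without renaming any ID."""
    rank = {prefix: index for index, prefix in enumerate(section_order)}
    # sorted() is stable, so checks keep their relative order within a section.
    return sorted(check_ids, key=lambda check_id: rank[check_id.split("-")[0]])


# Sample of IDs in the current document order.
current_ids = [
    "DEV-1", "DEV-2", "IMG-1", "CSP-1", "CSP-2",
    "DR-1", "DR-2", "STOR-1", "NET-1", "CSM-1",
]
reordered = reorder_sections(current_ids)
print(reordered)

# No ID was added, removed, or renamed: only the order changed.
assert set(reordered) == set(current_ids)
```

Because only the position of each section changes, anything keyed on an ID such as DR-2 keeps working.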
We will need to experiment with having different versions of the document that we can keep in sync with release versions of the companion health check tool.
I think CSP should be the first section, then DEV. It makes sense to me to get the cluster right before the apps.
Thoughts?