Excited to see this coming out!
@mhausenblas is the idea to hand-build each resource going forward using Kubebuilder, or are there plans to make it more maintainable long term with code generation? This was one of the pitfalls I found with the current version: it didn't scale when building more and more by hand.
For use cases, make sure to look through the existing issues; you'll see a lot of folks have written out what they see as important services.
Do you have any concerns with it being the only code base and ending up in a monorepo style like Kubernetes, where we're starting to break out resources into discrete components to help maintain them better given ownership?
What does the commitment look like from the service team side? Will they be developing controllers? Will other teams be contributing?
Thanks @christopherhein and happy to see your continued interest and support for ASO!
is the idea to hand-build each resource going forward using Kubebuilder, or are there plans to make it more maintainable long term with code generation?
The goal is to automate as much as possible, enabling individual service teams to own their parts.
For use cases, make sure to look through the existing issues; you'll see a lot of folks have written out what they see as important services.
Yup, we did, now asking for more/new insights based on experiences so far ;)
Do you have any concerns with it being the only code base and ending up in a monorepo style like Kubernetes, where we're starting to break out resources into discrete components to help maintain them better given ownership?
I don't have a strong opinion on this at the current point in time.
What does the commitment look like from the service team side? Will they be developing controllers? Will other teams be contributing?
Details to follow, but yes, see above, the goal is to enable other service teams.
Thanks again, and I hope we can benefit from your expertise going forward.
- ASO strives to be the only codebase exposing AWS services via a Kubernetes operator.
@mhausenblas Awesome news! Do you happen to know what impact, if any, this will have on the https://github.com/awslabs/aws-servicebroker project?
As a general comment on all resources, it would be great if we had some way to define IAM role(s) with differing permissions, or maybe some kind of iamRoleRef that would allow you to add policies/permissions to an existing role that was created by an IAMRole CRD. The latter is likely much more flexible.
The use case for that would be that, as it stands with the operator now, there is no way to create IAM roles or add policies to existing ones, so you would either need to modify a role manually in order to use kube2iam/kiam, or you would end up needing to give the node where the containers are running way too many privs.
@colinhoglund I have unfortunately no information available on the Service Broker project.
@dcherman agreed and it was no coincidence I referenced #23 in this context ;)
Once we have the new repo, keep an eye out for the respective issue, but in a nutshell, yes that's the idea.
@jaymccon
I think IAM roles would be one of the most useful features. However, I feel like an operator that makes IAM roles can be as dangerous as it is convenient, so a lot of thought would have to go into how to lock that operator down in case an attacker was able to get onto that pod.
ECR Repos with the ability to create and attach policies to them would be very useful as well.
Private link services and endpoints for exposing services from inside our EKS clusters to on premises would be very useful (I'm building one right now for my company).
One challenge that we have run into with AWS and creating resources with operators is how to expose the ARNs or names of the resources we create back to other devs who have access to specific namespaces in the cluster but no access to AWS console.
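One pattern that could address this (purely a hedged sketch, not how ASO works today) is for the controller to write identifiers back to the CRD's status subresource; the Queue type and field names below are hypothetical:

```go
// Hypothetical sketch of surfacing a created resource's identifiers on the
// CRD's status subresource, so developers with namespace access (but no AWS
// console access) can read them with kubectl. Type and field names here are
// illustrative, not from any existing operator.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// QueueSpec is what the developer asks for.
type QueueSpec struct {
	QueueName string `json:"queueName"`
}

// QueueStatus is written back by the controller after the AWS call succeeds.
type QueueStatus struct {
	ARN      string `json:"arn,omitempty"`
	QueueURL string `json:"queueURL,omitempty"`
}

type Queue struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   QueueSpec   `json:"spec,omitempty"`
	Status QueueStatus `json:"status,omitempty"`
}
```

A developer could then run `kubectl get queue my-queue -o jsonpath='{.status.arn}'` from their namespace without ever touching the AWS console.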
Being able to manage S3 buckets, and RDS including IAM roles for accessing those would be great.
I imagine dealing with all IAM could be difficult, though exposing only the ones needed for the specific resources should be more feasible.
Excited to see this becoming public and would love to contribute docs, code (especially test coverage/automation), and use cases.
Services to be prioritized from my perspective/experience:
I agree with others that IAM roles/policies is probably the most important thing to get right, though perhaps this needs to be coordinated with kube2iam.
Ideally a CRD would exist to create IAM roles with all policies with a mechanism to reference other AWS resources created by ASO CRDs, and kube2iam could then introduce a new annotation or format to reference an IAM role by its CRD name rather than the ARN of the created role.
e.g.
```yaml
apiVersion: service-operator.aws/v1alpha1
kind: IAMRole
metadata:
  name: ServiceRole
spec:
  # AssumeRolePolicyDocument would be auto-generated along with role name and path.
  policies:
    - name: Access CRDS3Bucket
      effect: Allow
      action: ['s3:PutObject', 's3:GetObject', 's3:DeleteObject']
      resource:
        - crd: { s3: NameOfAnS3BucketCRD }
---
apiVersion: v1
kind: Pod
metadata:
  name: some-pod
  annotations:
    kube2iam/role-crd: ServiceRole
spec: # Pod Spec
```
As time permits would be interested to contribute code to make this happen.
We have been using the existing ASO for a few months now, after testing out the service broker approach more than a year ago.
The addition of kube2iam in our cluster has not been a problem for us, though tighter integration with IAM via something like #23 would be great if it removes the need for kube2iam entirely.
We run a custom-built ASO image in our clusters that extends the available services, and we appreciated how relatively easy it was to do this once we understood how to extend the existing models. The model <> CFN template mapping currently used is nice, though it took a little work to learn how to extend. Ease of extensibility is important to us for whatever future design is used.
We have added our own models to support the following:
In terms of ordering for officially supported resources, RDS, DocumentDB and Lambda would be high on the list I'd think.
One challenge we found is secret handling. We would very much like to see some integration with EKS such that secrets could easily be ref'd somehow, even possibly backed by SSM parameters.
Would love to see the following services supported:
We are deploying at least one new application per week in our Kubernetes production environment; if we could directly create the AWS resources needed within our Helm chart, it would be a real game changer.
The main resources needed are:
@Spareo wrote:
an operator that makes IAM roles can be as dangerous as it is convenient, so a lot of thought would have to go into how to lock that operator down in case an attacker was able to get onto that pod.
If the AWS service operator had a built-in understanding of IAM permissions boundaries, it could create and manage roles with a suitable boundary applied, whilst running as an IAM role that only allows creating roles that have a whitelisted boundary in place.
That takes some work to get right. It's a nice approach that mitigates some of the risks from the operator's Pod(s) or Secret(s) getting compromised.
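To make that a bit more concrete, here is a rough sketch (assumptions: AWS SDK for Go v1, a placeholder boundary ARN, and a hypothetical CreateBoundedRole helper) of an operator that always attaches a fixed permissions boundary when it creates roles:

```go
// Non-authoritative sketch of the permissions-boundary idea: every role the
// operator creates gets a fixed boundary attached. The operator's own IAM
// policy would then allow iam:CreateRole only when the iam:PermissionsBoundary
// condition key equals this ARN, so a compromised operator pod cannot mint
// unbounded roles. The boundary ARN below is an example placeholder.
package boundedroles

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/iam"
)

const boundaryARN = "arn:aws:iam::111122223333:policy/operator-permissions-boundary"

// CreateBoundedRole creates an IAM role that is always capped by the boundary,
// regardless of whatever policies get attached to it later.
func CreateBoundedRole(name, trustPolicy string) (*iam.Role, error) {
	svc := iam.New(session.Must(session.NewSession()))
	out, err := svc.CreateRole(&iam.CreateRoleInput{
		RoleName:                 aws.String(name),
		AssumeRolePolicyDocument: aws.String(trustPolicy),
		PermissionsBoundary:      aws.String(boundaryARN),
	})
	if err != nil {
		return nil, err
	}
	return out.Role, nil
}
```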
RDS, ElasticSearch, DynamoDB
We'd like to see support for:
IAM, Route 53, Cognito, S3, EFS, RDS, Neptune
DynamoDB, Kinesis, IAM, ElastiCache (Redis specifically), S3, and (slightly less useful for us) RDS
RDS, S3, Elasticache, and MSK
API Gateway
RDS, Elasticache, S3, Lambda, Step Functions
@mhausenblas
My colleagues and I spent the last year writing/running our own aws-controller project built around Kubebuilder v1. We have support for:
- IAMRole
- S3Bucket
- DynamoDbTable
- DynamoDbGlobalTable
- SNSTopic
- SNSTopicSubscription
- SQSQueue
- ElastiCacheInstance
- RDSInstance
All of these have Create support and some, to a degree, have Update support. We deliberately held off on adding Delete logic because, at the time of writing, finalizers weren't supported by Kubebuilder. It's on our list of TODO items :stuck_out_tongue:
Our AWS Technical Account Manager had advised us that ASO was a work-in-progress at the time, but our timeline meant that we couldn't wait for ASO to come out, so we forged ahead with our own controller. We eventually reviewed ASO and found that we preferred the Kubebuilder-based aws-controller project though. I had eventually wanted to open-source it, but it would require a hefty re-write and sign-off from the higher-ups to release into the wild.
My observations with our Kubebuilder-based aws-controller so far are:
- Come up with a concrete contract for transforming from CRDs to AWS API calls early on. We wrote our S3Bucket controller first and our DynamoDbTable controller last; the DynamoDbTable controller has a much better way of handling CRUD operations as a result of the knowledge gained. We ended up with helper functions on our CRDs like S3BucketSpec.getCreateBucketInput() and S3BucketSpec.getDeleteBucketInput() that returned &s3.CreateBucketInput{} or &s3.DeleteBucketInput{} objects that can be passed directly to the API from Reconcile() (see the sketch below).
- We had limited success with Update logic. That usually becomes very CRD-specific functionality in the Reconcile() function. Where possible we would try to make helper functions on the CRD structs again (e.g. S3BucketSpec.getPutBucketPolicyInput() or S3BucketSpec.getPutCorsInput()).
- Use Open Policy Agent and write policies to block people from creating IAMRole resources with god-mode access, and make sure it's in your CI/CD pipeline to validate this stuff. I argue strongly that it's not the controller's responsibility to manage RBAC or who can do what with CRDs; that should be OPA and/or cluster RBAC. Definitely don't try to tackle this in the controller, it's not worth the hassle.
- Use finalizers (we haven't yet, and as such we can't Delete anything).
One of my projects over the next few months is to update our code to Kubebuilder v2. If there's enough traction on this rewrite/restructure, perhaps I can devote my time to this project rather than maintaining our own internal controller that benefits nobody else?
Thanks a lot for sharing your feedback, advice, and experiences @daviddyball, very much appreciated!
If there's enough traction on this rewrite/restructure, perhaps I can devote my time to this project rather than maintaining our own internal controller that benefits nobody else?
That would be fantastic.
That's great news; this cannot come soon enough.
My main requirement is that this controller be able to run outside AWS infrastructure. Meaning not tied to EKS or EC2 instances. I had to do a super minor patch to ASO to get it to run in GKE:
https://github.com/awslabs/aws-service-operator/commit/9e775d1c767192d37e81a5f53ef9485769a13f43
Then IAM, S3, SQS, Kinesis and Lambda.
We have been using our own implementation of something similar: https://github.com/Collaborne/kubernetes-aws-resource-service
Based on our usage:
Rather than writing operators for each AWS service API would it be better to transform k8s CRD to CloudFormation micro-stacks and leverage existing CloudFormation capability to handle updates intelligently?
AWS would need to be willing to lift the lid on CF stacks, to allow 1000's of micro stacks, rather than the current AWS assumption that people will use only a few monolithic stacks.
@whereisaaron thanks for your feedback! CF should be treated as an implementation detail and we want to abstract it. Once the new repo is available under the aws GitHub org, which should be very soon, we can continue the discussion there. We, that is @jaypipes and myself, will create dedicated issues for Kubebuilder, IAM handling, etc. in the new repo and that's the best place to have this convo.
I'd personally vote against using CF to implement any of the controller logic. To me using the APIs directly for controller implementation is a much cleaner approach. Like you say @mhausenblas, I'm sure this can be discussed in the new repo once it's available.
Any way to get a notification when the repo becomes available?
@daviddyball thanks for your feedback as well and indeed we'll make sure to announce it here on this issue when the repo is in place, so if you're subscribed here that should be sufficient.
Thanks @mhausenblas! My CF comment was in relation to reviewing the implementations @ankon and @seboga offered. They look like useful, real examples of an AWS Operator of the kind we are discussing here. But looking at them it occurred to me that handling creates, updates, rollbacks, and clean deletions with dependencies, across dozens or hundreds of APIs, is non-trivial.
CF and Terraform achieve this, tracking API changes for the APIs discussed here, with varying amounts of lag. If AWS wants to repeat this effort for the cleaner approach - as @daviddyball suggested - then great! But AWS CF already can't keep up with AWS API changes, so it worries me that a separate AWS Operator implementation might suffer similar delays. Is AWS willing to commit the resources to maintain the proposed AWS Operator, when CF already appears very under-resourced for tracking those same APIs?
When you do get to the implementation decisions stage, the ability to keep that implementation up to date with the AWS APIs should also be considered as part of those implementation details. No point deciding that X would be the perfect implementation choice, if that choice can't be maintained.
@whereisaaron my understanding of the operator pattern is that it's level-based, so there's no state tracking. Every time Reconcile() is called, it's the controller's job to go and query the current "state of the land" and then decide whether it needs to make any further changes to match the incoming K8s object. Relying on something state-based like CF or Terraform goes against this, as it becomes mandatory to track the state.
With regard to the feasibility of maintaining API compatibility, it'd be open source, so anyone can commit changes and fixes if APIs change over time. Other projects like Boto seem to manage, so I can't see it being an issue here (unless there is no community uptake).
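As a rough, non-authoritative illustration of that level-based approach, a single reconcile pass for an S3 bucket might look like the following (the S3BucketSpec type and the error handling are simplified assumptions):

```go
// Minimal sketch of a level-based reconcile step for an S3 bucket. No state is
// stored between calls: each pass observes AWS directly and converges toward
// the spec.
package reconcile

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3iface"
)

// S3BucketSpec is a hypothetical, simplified CRD spec.
type S3BucketSpec struct {
	BucketName string
}

func reconcileBucket(svc s3iface.S3API, spec S3BucketSpec) error {
	// Observe: does the bucket exist right now?
	_, err := svc.HeadBucket(&s3.HeadBucketInput{Bucket: aws.String(spec.BucketName)})
	if err == nil {
		return nil // already at the desired level, nothing to do
	}
	if aerr, ok := err.(awserr.Error); ok && aerr.Code() == "NotFound" {
		// Act: create the missing bucket; later passes would handle drift in
		// bucket policy, CORS, tags, etc. the same way.
		_, err = svc.CreateBucket(&s3.CreateBucketInput{Bucket: aws.String(spec.BucketName)})
		return err
	}
	return err
}
```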
Good points @daviddyball
How do people feel about a native integration with terraform for handling the CRUD operations? I agree with @whereisaaron that maintaining independent lifecycle code is a massive undertaking.
@daviddyball Two huge reasons Terraform keeps track of state are so that runs don't blow through rate limits and so that runs can export attributes about an object to other runs. If I create an S3 bucket, I need that ARN to be exported to my IAM role or to my config service. If this project becomes exclusively a stateless operator, every reference would require refreshing the state... Very quickly this would run into API limits.
Since Terraform 0.12 and above are fully JSON compatible, it would be rather simple to put the objects into the CRD and have the operator hook into the already-maintained Terraform provider (e.g. the 2500 lines of code it takes to manage an S3 bucket: https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/resource_aws_s3_bucket.go).
The Rancher team has done something quite similar with their experimental controller: https://github.com/rancher/terraform-controller
Once https://github.com/hashicorp/terraform/pull/19525 is merged, state can be fully managed inside of Kubernetes as well.
@cpoole I dislike it greatly.
@whereisaaron and yourself make great points about the cost-benefit trade-offs of leveraging state:
...but using Terraform as the implementation feels misappropriated for use by this project, in my opinion. Rancher's Terraform Controller already looks like the solution for that specific implementation approach, and I would hate to see repeated and overlapping work here on this project.
Generally, as a user of CloudFormation (CFn) for years, with more recent familiarity with the Kubernetes operator and controller patterns, I mostly agree with @daviddyball's points.
I think (or at least optimistically believe) that the contributor-scale benefits of open-sourcing a project like this are the key differentiator over the closed-source CFn team with respect to closing the coverage gaps seen today between CFn and the underlying AWS API updates.
I also think you could solve some of the rate-limiting and resource-discovery concerns with a lighter quorum or runtime cache pattern, prevalent in many other parts of the Kubernetes architecture and in operators/controllers, that is not as heavy or critical as Terraform's durable state. The excellent AWS ALB Ingress Controller project, for example, builds a cache state model at startup and maintains it to minimize unnecessary API operations on each reconciliation loop. It rebuilds/recovers this cache if it dies/restarts, which greatly reduces the operational overhead of having to store and protect state somewhere else, be it StatefulSets or an external object store/database. This is one of the greatest advantages CFn provides over Terraform today, in my opinion.
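As a hedged sketch of that startup-built cache idea (the bucketCache type is hypothetical and does not reflect the ALB Ingress Controller's actual code):

```go
// Rough sketch of an in-memory cache that is rebuilt from the AWS API at
// controller startup and consulted during reconciliation to avoid repeated
// existence checks.
package cache

import (
	"sync"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3iface"
)

type bucketCache struct {
	mu      sync.RWMutex
	buckets map[string]bool
}

// warm rebuilds the cache from a single ListBuckets call at startup; if the
// pod restarts, the cache is simply rebuilt, so there is no durable state to
// protect.
func warm(svc s3iface.S3API) (*bucketCache, error) {
	out, err := svc.ListBuckets(&s3.ListBucketsInput{})
	if err != nil {
		return nil, err
	}
	c := &bucketCache{buckets: map[string]bool{}}
	for _, b := range out.Buckets {
		c.buckets[aws.StringValue(b.Name)] = true
	}
	return c, nil
}

// exists lets each reconcile pass skip an extra API call when the answer is
// already known.
func (c *bucketCache) exists(name string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.buckets[name]
}
```

Because the cache can always be rebuilt from the AWS APIs, losing the pod costs one extra list call rather than any durable state.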
I'm no expert, but maybe there are already some excellent RESTful API CRUD controller Go libraries that could be core to this project and DRY out the repetition that would otherwise be required for every AWS service API that is implemented against?
Awesome discussion so far overall!
The open source code in, eg, https://github.com/terraform-providers/terraform-provider-aws is definitely available for study or reuse. I think that's the kind of sharing I'd like to see.
If anyone wants to use https://github.com/rancher/terraform-controller then I am happy for them. That's different from how I'd imagine an AWS-specific service Operator. Similarly if there's an Azure or whatever Operator.
@iAnomaly good points regarding caching at the controller level to reduce API calls. I've seen it done with the kiam controller for IAM-specific resources. Maybe that's something we can look into to reduce API cost.
We're already running our controller in production across 6 clusters in 4 regions and we're not hitting API rate limits (or if we are, they are easily absorbed by the exponential back-off of the Reconcile() function).
I do agree that API limits are worth taking into account when designing this, I just don't agree that using CloudFormation or Terraform is the right way to fix it. To each their own though.
FYI: the new repo is now available via aws/aws-service-operator-k8s and I'd like to invite everyone to have a look at the design issues and contribute there, going forward.
10 months ago AWS said they were starting work on the next AWS Operator, but there are still no versions available.
Is someone still working on it? Did someone already start coding it?
Thank you @ddseapy for pointing this out, so there is hope!
There is hope indeed! Please see here: https://github.com/aws/aws-service-operator-k8s/tree/mvp
We're working on it, targeting some initial services in an MVP release at end of this quarter.
Happy to announce that AWS Controllers for Kubernetes is in Developer Preview with support for S3, Amazon SNS, Amazon SQS, DynamoDB, Amazon ECR, and AWS API Gateway.
You can learn more about the project on our project site.
We are closing this issue. Please comment or contribute directly on the ACK GitHub project. You can also see our service controller support roadmap on the project.
Congrats folks! Awesome achievement!
/cc @tabern @jaypipes @mhausenblas
Many of you are aware of the AWS Service Operator (ASO) project we launched a year ago. We reviewed the setup with existing stakeholders and decided to turn it into a first-tier OSS project with concrete commitments from the service team side, based on the following tenets:
Going forward, we will archive the existing awslabs/aws-service-operator repo and create a new one under the aws organization, with the goal to re-use as much of the existing code base as possible. In this context, we will introduce the high-level changes, including re-platforming on Kubebuilder, which should help lower the barrier to contribute, and clarifying the access control setup (see also #23). At the current point in time, we'd like to gather feedback from you concerning:
I'm super excited about what has been achieved so far thanks to @christopherhein's excellent bootstrapping work, and I'm looking forward to taking the next steps together with you.
===
UPDATE@2020-07-17: we renamed the project to AWS Controllers for Kubernetes (ACK).
UPDATE@2020-08-19: ACK is now available as a developer preview