FYI, I don't think it's actually possible to programmatically create GitHub OAuth Apps from any sort of command line/REST API call. That is why we didn't broadly advertise the Teams/Orgs auth feature for a long time: it inherently requires manual setup, and we wanted to judge how intrusive it was before 'going public'. However, I don't think it's that intrusive, and it is such a useful feature (especially with teams-based restrictions on profiles) that I think it's worth eating the manual setup cost.
In general, I'm cautious about getting the deployer to automatically create everything for us. If the steps are "input a command -> use the output of a command", there's not a lot of motivation to learn what the command does. This is why I found the "create cluster" files section of the AWS setup process confusing. I trust the jsonnet files that the deployer spits out to be "correct", and if something is wrong when I run them, I don't have a good starting position from which to begin debugging. This is because the deployer "does magic" that I am not informed about; only the person who wrote that part of the deployer knows about that magic.
I'd honestly just be happy with a folder of template files (tfvars, helm chart values, etc.) that I hand-copy and hand-edit, rather than increasing the complexity of the deployer to "do magic".
I also think about this from a Right to Replicate perspective. The deployer is a tool to help us, but it is too over-powered for someone not in our engineering team to use. The manual steps are important because that is how a hub admin would set up the hub without 2i2c involvement. I would prefer us to get our docs to a state where the manual steps "flow" (which was my intention with the Hub Deployment Guide) and then work that up into some R2R docs that wouldn't need the deployer at all.
This is a super important discussion that I think will benefit from some sync time. I added it as a topic for next week's Prod and Eng meeting on Tue 22nd.
I agree that blindly trusting the output of a program is hard, as when it breaks it is very difficult to see why. I do think, however, that @GeorgianaElena's work in https://github.com/2i2c-org/infrastructure/pull/1903 with GCP is actually a lot better than my current work with the jsonnet files, primarily because it generates tfvars files that provide terraform variables documented in variables.tf, without doing a lot of magic, while jsonnet adds a few more layers of magic (it is translated into yaml, then used by eksctl to do things that require a deeper understanding of EKS). I think the problem there is more our use of eksctl, which isn't integrated with terraform, rather than the generator itself.
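To make the "less magic" point concrete, here is a rough sketch of how a generated tfvars file would be consumed; the paths below are illustrative, not the actual repo layout. Because it is plain terraform input, every value in it traces back to a variable documented in variables.tf:

```bash
# Hypothetical usage sketch: the project/cluster paths are made up.
# A generated tfvars file is ordinary terraform input, so inspecting or applying it
# needs nothing beyond the standard terraform CLI.
terraform init
terraform plan  -var-file=projects/example-cluster.tfvars
terraform apply -var-file=projects/example-cluster.tfvars
```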
I have opened https://github.com/2i2c-org/infrastructure/issues/1924 to get rid of the jsonnet. I believe that even if we had copy-pasted the jsonnet from a template, it wouldn't have made much of a difference in understanding, and we should really get rid of it :) And I agree we should keep an eye on making sure there isn't a lot of magic in the deployer generate commands, and that the config could always be generated manually too. This will also help a lot with the Right to Replicate parts, as once a .tfvars or .common.yaml is generated, the fact that it was generated rather than copy-pasted ceases to matter. For this reason, I don't think we should get rid of all the config generation we do in the deployer.
Either way, I hope @GeorgianaElena's work in https://github.com/2i2c-org/infrastructure/issues/1924 can continue :)
I also opened https://github.com/2i2c-org/infrastructure/issues/1925 to remove all the helm-related magic in our deployer, so that it functions equivalently to a helm upgrade command passed a list of appropriate yaml files.
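As a rough sketch of the behaviour #1925 aims for (the release name, chart path, and values files below are made up for illustration), the deployer would do nothing more than an equivalent helm invocation that layers a list of values files onto a chart:

```bash
# Illustrative only: release name, chart path, and values files are hypothetical.
helm upgrade --install example-hub ./helm-charts/basehub \
  --namespace example-hub \
  --create-namespace \
  -f config/clusters/example-cluster/common.values.yaml \
  -f config/clusters/example-cluster/example-hub.values.yaml
```

Later -f files take precedence over earlier ones, which matches the "common values plus hub-specific overrides" layering we already use.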
Another option to consider (but perhaps not one to block #1924 on) is to use https://github.com/cookiecutter/cookiecutter instead of writing our own templating.
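For context, a minimal sketch of what that could look like; the template directory and the prompted variables are hypothetical. cookiecutter renders a directory of template files, prompting for the variables declared in the template's cookiecutter.json, which is roughly the "folder of template files" idea with the copy-and-rename step automated:

```bash
# Hypothetical example: the template path and prompted variables are illustrative.
pip install cookiecutter
cookiecutter ./templates/new-cluster
# cookiecutter prompts for whatever the template's cookiecutter.json declares,
# e.g. cluster_name, provider, region, and writes out a directory of
# tfvars / values files ready to be hand-edited, reviewed, and committed.
```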
Finally, I think being able to consistently stand up a new hub in under 1h of human work is an awesome goal to shoot for in terms of reducing our own toil, and increasing automation there without falling prey to magic that we don't know how to fix when broken is definitely doable!
Right, I just want us to recognise that the relationship between automation and efficiency is not necessarily a linear one, and they're not even the only two variables in the equation. We can improve the situation in other ways as well as making the deployer do certain steps for us.
I'm happy to see in #1903 the decision was made to have template files that the deployer copies, renames, moves, reads, and writes. I was worried that these templates would become embedded in the Python files as mega-strings and would therefore reduce findability of the templates, especially for those who don't regularly use the deployer. (I don't even think the engineering team regularly inspects the deployer code, so I don't want us embedding knowledge there.)
This implementation detail alone reduces my concerns a lot.
Yay, so glad to hear that @sgibson91 :) I agree completely that keeping it as files, outside of the python code, is very important. That is also how the current AWS jsonnet generator works: https://github.com/2i2c-org/infrastructure/blob/master/eksctl/template.jsonnet is the file used as the source, with some template generation.
Hopefully with https://github.com/2i2c-org/infrastructure/issues/1925 we'll remove all of the hub config that's embedded in the deployer.
> I added this one as a topic for the next week's Prod and Eng meeting on Tue 22nd.
I removed it from the meeting agenda because I believe we have an agreement and a path forward I really like!!
We discussed making this issue a more fine-grained one (probably as part of the new goals for Q1).
@2i2c-org/engineering, I updated the top comment with some more concrete action points and a categorization of the tasks. Some of the tasks, especially the ones related to creating templates for some of the files and integrating those into the deployer, might need more discussion in separate issues that are not yet created.
I believe the next tasks are now to:
If it is helpful, I'm happy to discuss and share my initial experiences/expectations re the deployer as a new engineer. There are definite things that have come to mind in the half a day I've been using it, and I'm sure there will be more!
That would be extremely helpful @pnasrat! Do you want to open an issue to sketch these ideas and then have a sync chat about it, or the other way around? Whichever you prefer.
We could also use the Product and Engineering meeting to discuss this if you'd like. There's one every Tuesday (https://compass.2i2c.org/en/latest/reference/calendar.html), including today, and the agenda is here.
This was an issue tracking a quarterly goal from some time ago. Some tasks have been done, some are still to do, but since most of them are tracked in their own issues, I will close this one.
I. Context
Now that the deployer has been refactored (https://github.com/2i2c-org/infrastructure/pull/1869), building on top of it has become more scalable. We are currently using the deployer to automate various manual tasks related to hub deployment and management, and with each option added to it, the engineering team has become more efficient 👉🏼 https://github.com/2i2c-org/infrastructure/blob/master/deployer/README.md

The deployer is a tool to help us, but it is too over-powered for someone not in our engineering team to use. The manual steps are important because that is how a hub admin would set up the hub without 2i2c involvement, from a Right to Replicate perspective. (Credit goes to Sarah for signal boosting this aspect.)
Personal experience
After deploying two similar clusters and hubs on AWS, I managed to build some muscle memory, and the time it took to deploy the second one was cut in half. But that still meant a day, which in my opinion is a lot.
I believe a significant amount of time was spent on:
II. Proposal
The deployer is ready for a new round of enhancements, maintenance, and documentation, in order to help us become more efficient at deploying new clusters and hubs, but also, from the R2R perspective, to move any config generation out of it, while creating and maintaining clear documentation of the deploy workflow.
III. Specific action points mapped to the three categories above
1. ENHANCEMENTS
- deployer.generate_cluster
- deployer.generate_hub
- Automate the manual UI tasks that we're doing right now

(A rough sketch of how the two generate commands could be invoked is below.)
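A minimal sketch, assuming a CLI shape similar to the existing deployer subcommands; the command names and flags are hypothetical and would be pinned down in the follow-up issues:

```bash
# Hypothetical invocations only; actual names and flags to be decided.
deployer generate-cluster --provider gcp --cluster-name example-cluster
deployer generate-hub --cluster-name example-cluster --hub-name staging
```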
2. MAINTENANCE and R2R
3. DOCUMENTATION