Closed teemow closed 1 year ago
@puja108 I suggest that we start a conversation with Rainbow and Honey Badger about this. I'd like to focus more on what the platform teams want to build with our platform and get away from cluster creation itself. Some of this is maybe just the way we present things in our docs and UIs. But we definitely also need to build some additional tooling and automation. For now I'd like to develop a common target image of what this could look like.
For inspiration, @Rotfuks shared this video of Qovery in Slack: https://youtu.be/EznBV1km580
Fyi @marians @weatherhog @gianfranco-l @piontec
Some ideas from my side, since I already put a lot of thought into a PaaS solution for devs in my last product:
I think we really have a good base here to deliver such a platform experience for consumers of the clusters in the future.
I always wanted to have an experience for devs that would in its most radical form be just two simple input fields and a button: all you need is an Origin (where you can find what you want to deploy - registry or repository), a Target (where you want to access your deployed app/service - this can be an app/service name for the wildcard or directly uploading custom certs) and then a "deploy" button to deploy it.
You can then extend this experience with a lot of additional modules, for example an extended options view to change the default configuration of the cluster you deploy in, env variables and secret management, a branch and trigger policy for continuous deployments, a selection of default apps from a marketplace (app platform) that you want preinstalled in your cluster, ordering custom domains with automated cert management directly within the deployment form, and much more.
To achieve this in a nice way the platform needs some key components. First off, there is a great benefit in keeping everything modular, since we can then more flexibly exchange individual components with better-suited or more modern tools or ideas in the future (like introducing Webpacks or other containerless solutions). In general this platform would need the following bits and pieces:
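To make the "two fields plus optional modules" idea a bit more tangible, here is a minimal sketch of how such a deploy request could be modelled. Everything in it is an assumption for illustration: `DeployRequest`, `resolve`, `PLATFORM_DEFAULTS` and the option names are hypothetical and not an existing API of ours.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical platform-wide defaults; names and values are illustrative only.
PLATFORM_DEFAULTS: Dict[str, Any] = {
    "branch": "main",
    "trigger": "on-push",
    "env": {},
    "preinstalled_apps": [],
}


@dataclass
class DeployRequest:
    """The two mandatory inputs plus optional overrides from an extended view."""

    origin: str  # where to find what to deploy (registry or repository)
    target: str  # app/service name under which the deployment is reachable
    overrides: Dict[str, Any] = field(default_factory=dict)


def resolve(request: DeployRequest) -> Dict[str, Any]:
    """Merge platform defaults with the user's overrides; overrides win."""
    unknown = set(request.overrides) - set(PLATFORM_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    settings = {**PLATFORM_DEFAULTS, **request.overrides}
    return {"origin": request.origin, "target": request.target, **settings}


# The radical form: only origin and target are given.
simple = resolve(DeployRequest("registry.example.com/shop/api:1.0", "shop-api"))

# The extended form: same two fields, plus a few overridden options.
extended = resolve(
    DeployRequest(
        "https://git.example.com/shop/api",
        "shop-api",
        overrides={"branch": "release", "env": {"LOG_LEVEL": "debug"}},
    )
)
print(simple["branch"], extended["branch"])
```

The point of the sketch is only that the simple form and the extended form are the same request; the extended view just fills in `overrides` instead of relying on the defaults.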
Core:
Extended:
So yeah, some basic ideas. I also have some scribbles in my head, if you want it visualised, but I guess this overview already gives a rough idea. We were a bit further in the process of finding the right tools for the modules, but I think this has to be defined in our GS context, especially since we already have - as said - a good base for most of the bits here.
IMHO there are two main jobs of the platform team. One is towards their day to day and their higher-up stakeholders, the other towards their customers, the end user developer teams. You can see them as similar to strategic goals (see PO training).
1 encompasses stories like cluster templates, RBAC and app bootstraps, etc. and goes towards making their lives easier as the people responsible for keeping things up and up to date. Here we need to move away from single cluster management towards fleets, automation, governance, ... I think on this one our knowledge of and relationship to current platform teams gives us a good picture of where we could be.
2 is more in the line of what @teemow mentioned above and goes towards levels of abstraction that can be built in. Ideas in this space should go towards clusters as cattle. The hardest part here will be individualities of each customer. We might need to build a very flexible kind of abstraction, something that a platform team can mold into what resembles their relationship and control with their customers best. Shopify has a very nice abstraction internally, where their developers do not even know which cluster they are on and don't need to and the platform team can flexibly switch out clusters. Still, on this one I would like us to do more discovery with customers and on the market (see Qovery for example) before we form a strong opinion of our own.
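As a rough illustration of the "developers don't need to know which cluster they are on" idea, here is a minimal sketch of a placement table that the platform team owns. This is not how Shopify implements it (I don't know their internals); `Placement` and its methods are hypothetical names made up for this example.

```python
from typing import Dict, List


class Placement:
    """Maps service names to clusters so developers never address a cluster directly."""

    def __init__(self) -> None:
        self._cluster_of: Dict[str, str] = {}

    def assign(self, service: str, cluster: str) -> None:
        """Platform team decides where a service runs."""
        self._cluster_of[service] = cluster

    def cluster_for(self, service: str) -> str:
        """Deploy tooling looks up the cluster; developers only name the service."""
        return self._cluster_of[service]

    def swap_cluster(self, old: str, new: str) -> List[str]:
        """Move every service from `old` to `new`; returns the moved services."""
        moved = sorted(s for s, c in self._cluster_of.items() if c == old)
        for service in moved:
            self._cluster_of[service] = new
        return moved


placement = Placement()
placement.assign("checkout", "cluster-a")
placement.assign("search", "cluster-a")
placement.assign("billing", "cluster-b")

# The platform team retires cluster-a; developers keep deploying "checkout" as before.
moved = placement.swap_cluster("cluster-a", "cluster-c")
```

The flexible part for each customer would be who is allowed to call `assign` and `swap_cluster`, and how much of that mapping the developer teams get to see.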
Sure, these two are related, however, treating them separately will help us prioritize work towards the different goals. I would keep this issue towards goal 2 as there's already a lot of thoughts in that direction. I would make it more specific though and name it in the direction of what we had last year with "self-service developer platform" to point more clearly that here we aim to create something for the platform team but geared towards developer teams.
I'd create another one towards the fleet, automation, governance goal, where the focus is rather on the other day to day jobs of a platform team that are "behind the scene" but still a pain point for platform teams.
I'd like to also talk with the team on which one we should prioritize first with regards to current company goals and challenges.
I would even go a step further and say that a platform with the target audience of the platform teams of our customers (so the k8s cluster fleet itself) and a platform for consumers - the actual application developers (so the enriching of the k8s clusters with a delivery service, buildpacks, etc.) - should be two different products, and with that two different teams (if not even domains). I believe the best approach to not run into prioritisation issues or have teams that are simply too big would be a "core" domain with all base infrastructure streams like networking, machine management, storage and so on, plus a domain with all developer experience streams like happa, the CLIs and a more self-service dev-centered platform as discussed here, with good API contracts between the two domains.
I like Puja's approach. For me, "step 1" is all about giving tools to the platform teams, so they can easily "orchestrate" them: configure, bundle and "export" towards their users, while being able to easily provide support for their app teams. My hunch is that this should be first priority, because unhappy platform teams won't help us, even if app teams are happy.
2nd step is to make sure that app teams can have an easier life as well. From my perspective (in cloud native app dev's shoes), this means mainly:
2nd step depends to some extent on the 1st one: i.e., I can't easily configure monitoring for my app if there's no monitoring platform/tooling ready.
What I'm wondering about in the context above is: how can we make this better/different than Heroku or Cloud Foundry? What will be our "killer feature" that makes us a better solution?
I think that the above question is super important: we all know that the 'big guys' are getting better and better at providing managed Kubernetes and that at some point in time "just" being a managed Kubernetes offering won't cut it. Providing a platform like that, a "higher level feature", seems the best way. But again, this seems crucial to me and requires some serious thinking.
CC @gianfranco-l
@piontec the thing is that the platform teams want to build a platform. So as long as we only work on fleet management / governance etc we will not necessarily make their lives easier. Their first prio is to have a platform. So it is not first 1 and then 2.
@Rotfuks I agree with what you said. And we have these two areas already. KaaS and Platform. The fleet management is within KaaS and the platform is more developer/app centric.
But in general we are still adapting and adjusting the goals and the "ownership" of both areas. The platform teams have helped a lot within KaaS as the effort to switch to CAPI was high. The KaaS teams weren't able to think much ahead in terms of fleet management. And I like @puja108's idea to think about it roughly as platform team => kaas, developers => platform. As you see we might need to rename again :smile:. Also we are dogfooding a lot (clusters are apps, KaaS basically using the platform too).
@piontec we are about to create a releng team for KaaS which means that Honey Badger doesn't have to think much about KaaS anymore. This should give you a bit more headspace to think about the developers. I think that Kubernetes and the cloud-native ecosystem as a foundation is a different game than Heroku or Cloud Foundry. You can come with sane defaults and still configure and change a lot. So you can build very powerful platforms. And we've built a large part of this already. Now we need to think about the user experience. How can we simplify this? How can we make code to production easy? This is exactly the job of a platform team.
I think one of the killer features will be "kubernetes underneath". I love the Cloud Foundry experience, but first setting it up as a company is hell! You will definitely love having a CF around as a dev, but you will hate it as a platform team. I always thought it is way too bloated on a lot of levels. Plus, moving away from Diego (or at least from parts of Diego) to Kubernetes is not working as smoothly as the community expected, and even Pivotal has changed its production instances back again away from cf-for-k8s. In general Cloud Foundry (and Heroku, as far as my few experiences with it go) is way too restrictive in what it allows and what it doesn't. You simply can't do much with your Cloud Foundry app space, especially since it's only shared VM instances (which might as well be a security issue for some). Having Kubernetes underneath and just working with defaults that you can override when you feel confident enough (and opt out of 24/7 gs platform support :D) - just giving you the full k8s experience when you want it, but wrapping it in something veeeery easy to learn with a low startup hurdle - is a good selling point. You could say our PaaS is a K8s but in easy entry mode. Like any good game: easy to learn, hard to master.
@teemow I never said that 2 requires 1, just that there's a partial dependency.
As for the rest, I think my idea is mostly the same as yours. For me, the "correct set of killer features" (personal PoV) is:
We already know projects and tools that we can use to make it happen, the big missing piece is something that can turn a repo of source code into that bundle automatically :)
@puja108 let's work with Rainbow and Honey Badger on a joint vision around Golden Path and UX for the future platform. Let's do a workshop to align the vision and do some product discovery together.
@gianfranco-l we talked about this yesterday.
@puja108 let's clean this up too
We talked about all those things in the Honeybadger/Horizon sprint and the results went into this issue: https://github.com/giantswarm/roadmap/issues/2792
We've had a lot of discussions and miros about the developer platform. See https://github.com/giantswarm/giantswarm/issues/22978
We now want to create a first concrete Golden Path for the developer platform. There is a new WG for this: https://github.com/giantswarm/giantswarm/issues/27799
The plan is to do a sprint before the onsite and the outcome will be documented here.
-- Old description
At the moment we focus on things like cluster and (platform-) applications in our UIs. Which means that a platform team of a customer can create a cluster and install applications on that cluster.
The job of the platform team is to provide a platform that allows their development teams to deploy and operate their own code/services/applications. So if we want to make this easier for them we would need to help them achieving this with our automation. A cluster or a managed app is only an implementation detail or a building block.
Goal: What would be a higher level abstraction that could help our customers to have an easier interface to achieve their goal?
Examples:
team
to the management API and the team automatically gets its own namespace on a cluster. Beyond that, they also get a way to hook up their git repositories to a pipeline that deploys on this cluster, dashboards to see their own applications and the cluster health, etc.
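A minimal sketch of the "team automatically gets its own namespace" part, assuming the automation simply expands a team name into plain Kubernetes manifests. `Namespace`, `RoleBinding` and the built-in `edit` ClusterRole are standard Kubernetes; the function `manifests_for_team`, the label key `example.com/team` and the convention that the team maps to a group of the same name are assumptions made up for this example, not our management API.

```python
from typing import Any, Dict, List


def manifests_for_team(team: str) -> List[Dict[str, Any]]:
    """Expand a team name into the manifests that give the team its own namespace."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        # Hypothetical label so dashboards and pipelines can find the team's namespace.
        "metadata": {"name": team, "labels": {"example.com/team": team}},
    }
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{team}-edit", "namespace": team},
        # Assumption: the team's members are in an identity provider group named after the team.
        "subjects": [
            {"kind": "Group", "name": team, "apiGroup": "rbac.authorization.k8s.io"}
        ],
        # "edit" is one of the default user-facing ClusterRoles in Kubernetes.
        "roleRef": {
            "kind": "ClusterRole",
            "name": "edit",
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    return [namespace, role_binding]


manifests = manifests_for_team("payments")
kinds = [m["kind"] for m in manifests]
```

The git pipeline hookup and the dashboards would be further manifests generated from the same single input, which is the whole point of the higher level abstraction.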