bf2fc6cc711aee1a0c2a / architecture

Repository containing the architecture documents.
https://architecture.bf2.dev/
Apache License 2.0

Ap 14 upstream downstream sops repo #57

Closed R-Lawton closed 2 years ago

R-Lawton commented 2 years ago

@david-martin @pepedocs @emmanuelbernard @brizrobbo @jewzaam @tombentley This is the AP created based on ADR 85. If ye get time to have a look, that would be great.

R-Lawton commented 2 years ago

Rather than spending a lot of time and effort in describing an ideal of what we'd like, I wonder whether a useful step at this point would be to try to move the existing Strimzi-related SOPs to the Strimzi documentation, and to watch how that works its way through the Strimzi review process. If it's successful it makes it easier to write this document because we can point to merged PRs, and explain it all more concretely.

This in theory makes a lot of sense, but I think there are a few points to consider with this approach:

What do ye think?

tombentley commented 2 years ago

Firstly, nearly all, if not all, of our Strimzi SOPs are very RHOSAK-specific; they would essentially have to be completely rewritten to fit with purely Strimzi docs.

What makes them so RHOSAK-specific? I would have thought that the actions needed to address Kafka-specific things, like under-replicated partitions, would be pretty easily RHOSAK-independent.

I completely accept it might not be trivial to move them, and it might need to be done on a SOP-by-SOP basis. And it would probably involve working with the upstream community (in this case I guess @PaulRMellor might be a good point of contact).

Understanding the answers to these questions is kind of central to understanding how easily Strimzi can make use of the operational learnings that RHOSAK is making. It's quite possible that the answer is "not easily", but understanding why is an important part of the process.

Would that mean the RHOSAK upstream SOP repo wouldn't have any Strimzi SOPs in it?

Eventually maybe. Would that be a problem?

R-Lawton commented 2 years ago

What makes them so RHOSAK-specific? I would have thought that the actions needed to address Kafka-specific things, like under-replicated partitions, would be pretty easily RHOSAK-independent.

There are Kafka-specific steps, but that's only a small part of the overall SOP. For example, in all our SOPs we add RHOSAK-specific information: extra internal information in the Kafka-specific steps themselves, internal infrastructure naming, certain prerequisites, descriptions of what the SOP is for, extra steps the main users need to perform these SOPs, etc.

We also wouldn't be able to move all the Strimzi SOPs. We can't forget that up until now the main users of these SOPs have always been our internal engineers solving issues, so they still have to be able to do their job.

I completely accept it might not be trivial to move them, and it might need to be done on a SOP-by-SOP basis. And it would probably involve working with the upstream community (in this case I guess @PaulRMellor might be a good point of contact).

Yes, totally agree. If we were to move them, collaborating with the upstream community makes a lot of sense, but going SOP by SOP would bring the concern about leaking information into play again.

Understanding the answers to these questions is kind of central to understanding how easily Strimzi can make use of the operational learnings that RHOSAK is making. It's quite possible that the answer is "not easily", but understanding why is an important part of the process.

Yes, 100%. We are navigating interesting waters when it comes to open-sourcing SOPs.

Eventually maybe. Would that be a problem?

We would have a gap for Strimzi, as all our Strimzi SOPs would be in another repo. That doesn't really make sense when we are trying to open-source the RHOSAK SOPs, which Strimzi is a key part of; a RHOSAK user wouldn't know what to do with issues they are seeing when using RHOSAK.

tombentley commented 2 years ago

What makes them so RHOSAK-specific? I would have thought that the actions needed to address Kafka-specific things, like under-replicated partitions, would be pretty easily RHOSAK-independent.

There are Kafka-specific steps, but that's only a small part of the overall SOP. For example, in all our SOPs we add RHOSAK-specific information: extra internal information in the Kafka-specific steps themselves, internal infrastructure naming, certain prerequisites, descriptions of what the SOP is for, extra steps the main users need to perform these SOPs, etc.

This is a fair point. I took a closer look at some of the SOPs and I agree that in general it's not clear how, even with restructuring, we might be able to get something usable upstream.

The alert-related ones are perhaps more tractable than some of the others though. There's a clear trigger (a specific alert firing) that's more-or-less unrelated to how RHOSAK SRE functions, and there are (at least sometimes) some well-defined actions to be taken, and a clear end point (the alert no longer firing). These are also the SOPs which are most amenable to scripting/automation. So perhaps the question we should be answering is not "How can we share the SOPs with upstream projects like Strimzi?", but rather "When building automation for resolving firing alerts in MAS, how can we do that in a way which upstream projects can make best use of?".

In other words: it would be easy to go building out automation which was specific to MAS, but unusable elsewhere. That leaves MAS on the hook for developing, testing, documenting and maintaining it. And yet it will often be highly specific to the workload (Kafka+Strimzi, in the first instance), and so really well-suited to maintaining in whatever community is developing/supporting the operator/orchestration for that workload.

Or maybe I'm looking at this too much through the lens of Kafka+Strimzi, which are towards the complex end of the spectrum of services. If most other services are "Red Hat projects" which (unlike Strimzi) make assumptions about/have dependencies on other services in the stack (e.g. hard dependencies on OpenShift, Prometheus etc), then maybe there isn't so much value to be had. Wdyt @emmanuelbernard?

emmanuelbernard commented 2 years ago

@tombentley my perspective is that your aim is good, but you might be aiming too far for this evolution step, and too abstractly to keep the ball rolling. My proposal is to go through with this rev and subsequently have Strimzi + RHOSAK explore how some SOPs could move upstream and find the obstacles. With that feedback we can then rev that architecture pattern. In short, let's go with concrete examples before capturing.

R-Lawton commented 2 years ago

Ok, just so I'm clear: the plan is to continue with this AP, with the topic being ADR 85 but in an AP form, i.e. how a team can implement a model for maintaining an upstream version of SOPs and a mechanism to consume them downstream, from the point of view that it's not just Red Hat.

emmanuelbernard commented 2 years ago

@R-Lawton What is preventing you at the moment from reaching the closing state on this one?

R-Lawton commented 2 years ago

Apologies, I'm just finishing off my latest version of it. I'll have it finished later today.

R-Lawton commented 2 years ago

@emmanuelbernard @brizrobbo are ye also happy for this to be merged?

emmanuelbernard commented 2 years ago

+1 @R-Lawton push any time you want. I can push if you want me to

R-Lawton commented 2 years ago

I don't have write access to merge, so you will have to do the honours.

R-Lawton commented 2 years ago

I would like @brizrobbo to approve too before we merge.

brizrobbo commented 2 years ago

LGTM