envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.95k stars 4.8k forks source link

idea: Non-blocking signal of Nighthawk breakages as part of CI #16886

Open mum4k opened 3 years ago

mum4k commented 3 years ago

Background: Nighthawk is (by design) closely coupled to Envoy internals. A company that uses both has to import both at the same time and resolve any build breakages. Nighthawk has its own set of owners that maintain it and submit updates that catch up with Envoy's changes (e.g. https://github.com/envoyproxy/nighthawk/pull/699).

Problem: Envoy developers making code changes are in the dark when making decisions on whether a piece of code is being used / required. While the Envoy developers aren't required to maintain Nighthawk, having some visibility into how a PR affects the Nighthawk's use case may save developer cycles.

Example: See https://github.com/envoyproxy/envoy/pull/16865. If the Envoy dev who worked on the original PR knew about the Nighthawk use case, they could have decided to add the accessor methods. The end result is the same, but two more developers were involved in resolving the issue.

Proposal for discussion: We could add a non-blocking CI workflow that would just build (i.e. not test to ensure lack of flakes) the Nighthawk repo at each Envoy PR. This would achieve two things:

  1. The Envoy devs would be making informed decisions.
  2. The NH devs would also get an early warning about breakages. To be noted (2) could be achieved by a CI check in Nighthawk's repository, however (1) is considered more valuable here.

Assumptions: We retain the existing split of responsibilities, Envoy devs aren't responsible for and aren't going to fix Nighthawk breakages. This would serve as an optional signal they may take into account when making design decisions.

Tradeoffs:

The confusion could be achieved by a well formed error message with the right documentation.

Please let me know what you think.

/cc @alyssawilk @htuch

htuch commented 3 years ago

I see value here in providing this signal providing NH maintainers are on the hook and responsive to issues here (@mum4k tells me this is the case). As a general principle, I don't think we want an unbounded number of CI jobs tracking projects that depend on Envoy, but Nighthawk is in envoyproxy organization and a key tool that we want to bring into Envoy CI anyway, so this makes sense from that angle.

@envoyproxy/maintainers @phlax thoughts?

phlax commented 3 years ago

not sure exactly about this particular case - but my main reservation would be that we want to avoid adding tests for downstreams if changes in that project are likely to break envoy's CI - otherwise it kinda becomes a dependency for us - but from my limited understanding it may well be a good idea here

mum4k commented 3 years ago

Thank you for starting the discussion. The reservation about downstreams breaking envoy's CI is a good one and should be addressed by the fact that this CI build will be non-blocking (serving as an optional signal only).

That of course leaves us open to seeing the signal report a breakage on a couple of PRs until the Nighthawk maintainers perform the necessary change after a breaking commit. However nothing should be blocked or slowed down, since the signal will be non-blocking.

htuch commented 3 years ago

I think though we do want Nighthawk in Envoy CI in the not too distant future.

mum4k commented 3 years ago

@htuch you did raise a good point. We are planning to finalize Salvo and bring it into Envoy's CI as per the OSS benchmarking of Envoy Requirements in not so distant future.

Once that is done, Nighthawk will become more important to Envoy and its performance. The idea proposed here could very well be used to pave our way towards it.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

mum4k commented 3 years ago

I started working on a prototype and since this will be a pipeline that checks out multiple repositories (Envoy and Nighthawk), it looks like we will need a service connection to Github.

Assuming we are all good to move forward here, can someone indicate who has the necessary admin rights to set this up?

htuch commented 3 years ago

A fair few of us do @mum4k. @lizan might be best as he knows the AZP plumbing, but I can help out if it's just clicking some boxes.

mum4k commented 3 years ago

@lizan your help / advice is appreciated. As noted above, I think that I need a service connection to Github defined in AZP in order to be able to check out the Nighthawk repository.

Could you confirm if this is correct and if so, help me to set one up?

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

mum4k commented 3 years ago

Pending time allocation, but will be addressed.

lizan commented 3 years ago

No we don't really need service connection since both repository are public. Check how filter example repo are set up in CI here: https://github.com/envoyproxy/envoy/blob/main/ci/filter_example_setup.sh#L19

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

htuch commented 3 years ago

@mum4k thoughts on whether to label this "Help wanted"? I've assigned it your way in the interim.

mum4k commented 3 years ago

Thank you @htuch. Labeling this as "Help wanted" is a good idea, I would be happy to assist someone who might be willing and able to pick this up before we get to it.