freedomofpress / securedrop

GitHub repository for the SecureDrop whistleblower platform. Do not submit tips here!
https://securedrop.org/
Other
3.62k stars 686 forks

Test upgrade path in CI #1689

Open redshiftzero opened 7 years ago

redshiftzero commented 7 years ago

Related to #1681: it would be really great (at a future point) for CI to catch changes that do not break new installs but do break upgrades on existing instances.


Discussed on 2018-05-27 between @msheiny, @conorsch, @redshiftzero and @eloquence. The agreed-upon initial scoping for this epic is as follows:

To be scoped further depending on the above:

msheiny commented 7 years ago

So I'm really thinking we need this as part of the 0.4 release. We can time-box it, but I think it's worth trying to bring in. In essence, testing upgrades in particular is a huge time drain, and it's going to hurt more when we try to test the nuances of the jump between ansible versions and tails versions.

(taking some of this convo from chat) I see testing strategy for this issue as a two part problem:

  1. ensure users are able to upgrade tails from 2 -> 3 without breaking their ability to reach their servers and perform upgrades.
  2. ensure we do not break the server when running a new playbook against a 0.3.12 server

Re: 1 - Client-side testing for this scenario is going to be a pain in the ass to try and automate. At least as the docs currently describe it (create a backup tails stick, upgrade from 2 -> 3, test web/ssh/ansible access still exists). We can do particular pieces of it, but overall I think this is best left as a manual check in QA. :(

Re: 2 -- This should be much easier to do but still has a few hairy moving pieces. High-level workflow I see:

Need to figure out how much time this takes to run and the best place to run it in the pipeline depending on that answer. The more I type this out, the more I realize it's probably going to be a 2 week debug + implementation process.... maybe that isn't a good fit for pre-0.4 after all.... anyways.. I need more tea.
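To make part 2 a bit more concrete, here is a minimal sketch of the shape such a check could take: bring up a server on the old release, apply the new playbook, then verify the result. The `UpgradeCheck` class and its three callables are hypothetical names for illustration only; nothing here reflects actual SecureDrop tooling, and the real steps (provisioning, ansible invocation, test suite) would be plugged in by whoever implements it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UpgradeCheck:
    """Runs the phases of an upgrade test in order and records what ran.

    All three callables are hypothetical placeholders supplied by the caller.
    """
    provision_old: Callable[[], None]  # e.g. install the previous release
    apply_new: Callable[[], None]      # e.g. run the new playbook against it
    verify: Callable[[], bool]         # e.g. run the test suite on the result
    log: List[str] = field(default_factory=list)

    def run(self) -> bool:
        # The two setup phases must both succeed before verification runs.
        for name, step in (("provision_old", self.provision_old),
                           ("apply_new", self.apply_new)):
            self.log.append(name)
            try:
                step()
            except Exception as exc:
                # Record which phase broke so CI output points at the cause.
                self.log.append(f"{name} failed: {exc}")
                return False
        self.log.append("verify")
        return bool(self.verify())
```

Keeping the phases as separate callables means the same harness could wrap whatever provisioning ends up being used, and the log shows whether a failure came from the old install, the upgrade itself, or the verification afterwards.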

tl;dr - I'm confused about whether adding this to 0.4 is a good idea or not. Glad I typed it out in a bunch of coherent sentences :taco: :tada: :bike:

msheiny commented 6 years ago

[image: index]

msheiny commented 6 years ago

Sooo @dachary brought up some really good points out of band in chat that I need to paste here. The gist was that he requested that what we run in CI should match completely what can be run by developers.

This is a valid request, and in the past I've always tried to make CI as close as possible to what's run locally by developers. The challenge with SD in particular, though, is that SD is designed for physical hardware: our kernels explicitly do not enable certain VM guest features, which makes them break on clouds like AWS, and more importantly the cloud providers we work with so far do not offer nested virtualization. The last part is the biggest issue - having nested virtualization would allow us to use the same workflow and spin-up logic as developers.
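As a quick way to tell whether a given Linux host (for example a cloud instance) would let us run VMs inside it, the KVM kernel modules expose a `nested` parameter under sysfs. The sketch below reads it; it assumes a Linux host with the `kvm_intel` or `kvm_amd` module loaded, and the function names are made up for this example. A `None` result only means the setting could not be read, not that nested virtualization is impossible.

```python
from pathlib import Path
from typing import Iterable, Optional

# sysfs locations of the KVM "nested" module parameter (Intel and AMD).
NESTED_PARAMS = (
    Path("/sys/module/kvm_intel/parameters/nested"),
    Path("/sys/module/kvm_amd/parameters/nested"),
)


def parse_nested_flag(raw: str) -> bool:
    """Interpret the sysfs value: 'Y' or '1' means nested virt is enabled."""
    return raw.strip().upper() in {"Y", "1"}


def nested_virt_enabled(paths: Iterable[Path] = NESTED_PARAMS) -> Optional[bool]:
    """Return True/False from the first readable parameter, None if unknown."""
    for path in paths:
        try:
            return parse_nested_flag(path.read_text())
        except OSError:
            # Module not loaded or path missing; try the next candidate.
            continue
    return None
```

Calling `nested_virt_enabled()` on a candidate instance type before wiring it into CI would be a cheap first check on whether the developer workflow can run there at all.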

So @dachary and I specifically started talking about the ability to run nested virtualization in public clouds. I've done a little research on the big three (note, I've stricken digitalocean from this since they do not have the ability to tightly scope API creds as far as I know):

We have credits for Azure + AWS. Most of our code experience is with AWS, though I've started to play with Azure a little. Folks on the team have some GCloud experience.. and there are enough ansible modules... So it would probably be fine. There's definitely a spin-up cost though.

Anyways... I'm fine with putting in this effort and I think it's worthwhile (it would probably be a lot more stable to provision to be honest) BUT it should be understood there is a time cost from adding this support. So I'm not sure with those changes if I can hit the release date target for 0.6 with this added scope.

msheiny commented 6 years ago

Okay so it sounds like we are going to pivot this ticket slightly and aim for running under nested virtualization. Going to break this into two tickets:

msheiny commented 6 years ago

As a new direction for this ticket and to scope the work, I propose the following high-level goals:

[omitted]

(We discussed this on 2018-05-17. Agreed-upon scoping moved to top of ticket for visibility. -- @eloquence)