:bulb: [Feature] Test platform upgrade path instead of fresh install

bird-house / birdhouse-deploy

Scripts and configurations to deploy the various birds and servers required for a full-fledged production platform

https://birdhouse-deploy.readthedocs.io/en/latest/

Apache License 2.0

:bulb: [Feature] Test platform upgrade path instead of fresh install #459

Open huard opened 5 months ago

huard commented 5 months ago

Description

By definition, stakeholders in the birdhouse-deploy platform have a running instance with existing users. The CI, however, tests the deployment of a fresh install. I understand that having two separate test suites would be too complex, but I'd like to discuss the idea of making the CI run a platform upgrade instead of a fresh install, since this is the default use case.

* Can this be done with Jenkins?
* How would it work in practice?
* What kind of effort is necessary?
* What would be lost by not testing fresh installs anymore?

fmigneault commented 5 months ago

The CI would have to run as a two-step operation.

1. Deploy the current master.
2. Attempt an update to the branch and run the compose up script + tests.

I think this would not be that complicated to implement. An option in the Jenkins CI could make it check out master first, run the usual monitoring until the instance is "ready", and then rerun the same steps with checkout <branch> until the instance is "ready" again.
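The two-step run described above could be sketched roughly as follows. This is only an illustration under assumptions: the `./compose-wrapper.sh` command, the `run` callback and the `is_ready` probe are hypothetical placeholders, not the actual birdhouse-deploy scripts or Jenkins configuration.

```python
import time
from typing import Callable, Sequence

Command = Sequence[str]

# Hypothetical placeholder for the real "compose up" script of the platform.
COMPOSE_UP: Command = ("./compose-wrapper.sh", "up", "-d")


def wait_until_ready(is_ready: Callable[[], bool],
                     attempts: int = 30,
                     delay: float = 10.0) -> None:
    """Poll the readiness probe until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if is_ready():
            return
        time.sleep(delay)
    raise TimeoutError("instance did not become ready")


def run_upgrade(base_ref: str,
                target_ref: str,
                run: Callable[[Command], None],
                is_ready: Callable[[], bool],
                attempts: int = 30,
                delay: float = 10.0) -> None:
    """Deploy base_ref, wait for readiness, then repeat with target_ref."""
    for ref in (base_ref, target_ref):
        run(("git", "checkout", ref))
        run(COMPOSE_UP)
        wait_until_ready(is_ready, attempts, delay)
```

In a real job, `run` could be something like `lambda cmd: subprocess.run(cmd, check=True)`, and `is_ready` whatever check the existing monitoring step already performs. Keeping both injectable means the sequencing can be exercised without a live server.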

It wouldn't be a "fresh" install, but not far from it. I don't think we could reliably test a live server, as any PR could break it, and we would need to handle a "downgrade" path as well to revert changes.

Maybe running tests on the "upgraded" server could also be skipped (they are most probably redundant with the tests run on the specific PR?). Once the "compose up" operation has succeeded, whether from a fresh or an upgraded server, the running services should respond the same.

mishaschwartz commented 5 months ago

I think we should consider what we want these tests to check and how we might populate the master instance before we run the update so that all these things can be tested. Some ideas off the top of my head are to check that:

* data can still be served by thredds, geoserver
* users can still access data/files in their workspace
* magpie permissions have not changed unexpectedly
* running jupyterlab containers still work as expected

fmigneault commented 5 months ago

Many of those cases are covered already by notebooks in https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/tree/master/notebooks-auth

These tests define permissions on data and workspaces accessible through GeoServer, THREDDS and Jupyter, as well as some permission sync by Cowbird, and make sure they are still accessible (or not according to expected behavior) after modifying the permissions. Since the data is prepared in advance by the https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/optional-components/test-cowbird-jupyter-access, https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/optional-components/testthredds, https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/optional-components/test-geoserver-secured-access, etc., I feel they already cover the situation of "upgrade path" with a server that already has some data/permission definitions.

Would there be other ways to test them further?

tlvu commented 5 months ago

> I think we should consider what we want these tests to check and how we might populate the master instance before we run the update so that all these things can be tested. Some ideas off the top of my head are to check that:
> * data can still be served by thredds, geoserver

The notebooks enabled by default in Jenkins already test for that.

> * users can still access data/files in their workspace

Do you mean their JupyterLab workspace? If yes, then currently no test covers this, since it's an interactive web UI style of test.

> * magpie permissions have not changed unexpectedly

The notebooks enabled by default in Jenkins should (not 100% sure) already test for that.

> * running jupyterlab containers still work as expected

This is an interactive web UI style of test; we do not have this yet.

If we are going to add interactive web UI testing, I'd like to have these additional tests as well:

* tutorial notebooks are deployed and the deployment tree is as expected
* users can share notebooks with other users
* the various JupyterLab plugins we have still work (some examples: git plugins can clone, the archive plugin can download a whole folder and unpack uploaded archives, the monitoring plugin still displays the RAM usage, bokeh performance has not changed, ...)

fmigneault commented 5 months ago

> running jupyterlab containers still work as expected

This is evaluated by https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/master/notebooks-auth/test_cowbird_jupyter.ipynb (see last cell).

Accessing the files served by those instances is not tested because of the mentioned UI, which makes such tests harder to maintain (though it should be possible).

tlvu commented 5 months ago

> Many of those cases are covered already by notebooks in https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/tree/master/notebooks-auth
>
> These tests define permissions on data and workspaces accessible through GeoServer, THREDDS and Jupyter, as well as some permission sync by Cowbird, and make sure they are still accessible (or not according to expected behavior) after modifying the permissions. Since the data is prepared in advance by the https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/optional-components/test-cowbird-jupyter-access, https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/optional-components/testthredds, https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/optional-components/test-geoserver-secured-access, etc., I feel they already cover the situation of "upgrade path"

I think these tests create users and test them in one go, which explains why they did not catch the incompatibility problem of Cowbird with existing Magpie users before Cowbird is enabled, as in a real "upgrade path" on a production system.

To properly cover the "upgrade path", the system must first be deployed at a version A, then users and data are added and tested, then the system is upgraded to a version B, and all users and data access are tested again to confirm they still work as they did at the initial version A.
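As a rough sketch of the "test at A, upgrade, test again at B" comparison: record what each pre-existing user can access before the upgrade, record it again afterwards, and report any difference. How the snapshots are collected is left out; the snapshot shape (a mapping from `(user, resource)` to an HTTP status code) and the user and resource names in the example are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# (user, resource) -> HTTP status code observed for that user on that resource.
Snapshot = Dict[Tuple[str, str], int]


def find_regressions(before: Snapshot, after: Snapshot) -> List[str]:
    """List every access result that changed between version A and version B."""
    problems = []
    for key, old_status in sorted(before.items()):
        new_status = after.get(key)  # None if the check is missing after upgrade
        if new_status != old_status:
            user, resource = key
            problems.append(f"{user} on {resource}: {old_status} -> {new_status}")
    return problems


# Hypothetical example: "alice" loses access to a dataset after the upgrade.
before = {("alice", "/thredds/private"): 200, ("bob", "/thredds/private"): 403}
after = {("alice", "/thredds/private"): 403, ("bob", "/thredds/private"): 403}
print(find_regressions(before, after))
```

The important part is that the users in `before` exist prior to the upgrade, rather than being created and tested in one go on the new version.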

fmigneault commented 5 months ago

> I think these tests create users and test them in one go, which explains why they did not catch the incompatibility problem of Cowbird with existing Magpie users before Cowbird is enabled, as in a real "upgrade path" on a production system.

Good point. A test user pre-generated by config before compose up could also be tested with the same operations as the ones covered by the test notebook.