MAAP-Project / Community

Proposal: Refactoring how we build workspaces for each environment. #308

Closed: gchang closed this issue 1 year ago

gchang commented 3 years ago

@marjo-luc @bsatoriu @sujen1412

This proposal has a few steps. If we're okay with any/all of it, I'll make this tix into an epic.

1) Build stacks in their respective environments. Right now, we only have one Gitlab instance that builds Che stacks, and that's the ops one. It also pushes the builds into the ops registry, which makes it hard to test new stacks. The new proposal is to build dit/dev stacks in the dit/dev gitlab, uat stacks in the uat gitlab, and ops stacks in the ops gitlab. Each environment will have the ade-base-images project that runs the CI build and pushes into the respective registry. The code itself will be the exact same git repo; we're just pushing the same code to three remote locations. In order to keep things separate, we should use Gitlab-wide CI variables to determine what environment we're in and what URLs to use. For example, the .gitlab-ci.yml is currently hardcoded to push to the ops registry. That can be replaced with gitlab variables that are defined globally, per environment (at `Admin Area > Settings > CI/CD > Variables`).
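As a minimal sketch, assuming each GitLab instance defines `REGISTRY_URL` (plus registry credentials) as instance-level CI/CD variables, the `.gitlab-ci.yml` could look something like the following. The variable names and image path are placeholders, not the current configuration:

```yaml
# Sketch: the registry is taken from instance-level CI/CD variables, so the
# same .gitlab-ci.yml pushes to the dit/uat/ops registry depending on which
# GitLab instance runs it. Variable names here are illustrative.
build-stack:
  stage: build
  script:
    - docker login -u "$REGISTRY_USER" -p "$REGISTRY_PASSWORD" "$REGISTRY_URL"
    - docker build -t "$REGISTRY_URL/root/jupyter_image:vanilla" .
    - docker push "$REGISTRY_URL/root/jupyter_image:vanilla"
```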

2) Create a stacks Gitlab group to "own" the stacks. Currently, the stack image is "registryurl/root/jupyter_image:vanilla". The root part isn't descriptive. We should make it something like "registryurl/stacks/jupyter_image:vanilla". Or it could be maapstacks or nasastacks...

3) Separate out each stack into its own repo. Right now, everything is a branch in the ade-base-images project. Since we have the stacks group, we can use it to encapsulate multiple repos, each of which defines a particular stack. So for example, vanilla and plant would each be a separate repository. This would make it easier for us to version control each one. And also...

4) Consider how to automatically scrape the Gitlab repo/registry for stacks and autopopulate the Che 7 devfile registry. With all stacks in a Gitlab group (i.e. stacks), we can start thinking about how to discover new stacks that are committed there and build out the devfile registry automatically. If users want to create their own stack, they can fork one of the existing stack configurations and modify it to their needs. The scraper can follow the project's forks and discover user-created stacks.
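As a rough sketch of what the discovery piece could look like, assuming a scheduled GitLab CI job and an API token stored as a CI variable (the group path, variable names, and the downstream registry-generation step are all illustrative):

```yaml
# Sketch of a scheduled discovery job: list every project in the stacks group
# so a later step can regenerate the Che 7 devfile registry from the results.
discover-stacks:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  image: curlimages/curl
  script:
    # GitLab API v4: list projects in the group (STACKS_GROUP is the group path)
    - 'curl --header "PRIVATE-TOKEN: $GITLAB_API_TOKEN" "$CI_API_V4_URL/groups/$STACKS_GROUP/projects?include_subgroups=true" -o stacks.json'
    # user-created stacks could then be found by walking /projects/:id/forks for each stack
  artifacts:
    paths: [stacks.json]
```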

bsatoriu commented 3 years ago

Great idea--this refactoring would be a nice step forward in giving more control and transparency to MAAP users.

With regard to stack 'readiness', we may consider how we want to distinguish between released (fully tested) stacks and experimental stacks. One way to address this may be to include the author of the stack as part of the definition or description, so any MAAP user can quickly see if the stack is authored by MAAP (i.e., a 'first party' stack), or a MAAP user. Similarly, we may want to include the repo tag/versioning info as part of the stack metadata.
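One possible place to surface that information is the registry entry itself. A rough sketch, assuming the Che 7 devfile-registry meta.yaml layout; all values are made up for the example, and the author/version details ride along in description/tags since the format has no dedicated fields for them:

```yaml
# Illustrative registry entry: author and version information embedded in
# description/tags (values are placeholders, not a real stack definition).
displayName: Vanilla (MAAP first-party)
description: "MAAP-authored base stack, built from ade-base-images tag v1.2.0"
tags: ["maap", "first-party", "v1.2.0"]
icon: /images/jupyter.svg
links:
  self: /devfiles/vanilla/devfile.yaml
```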

The other concern is stack visibility. If a user creates a new stack to test, we may want to hide that stack from the broader user group until the author chooses to make the stack visible, along the lines of how we're handling EFS folders. This kind of ancillary visibility data could be stored in the MAAP API.

mayadebellis commented 3 years ago

lots of questions/comments/ideas

In part 1 you mention replicating the ade-base-images repo across the environments, but in part 3 you mention getting rid of it. If we replaced it with individual base-image repos, would those all have to be replicated across environments?

I am totally in favor of separating the images into different repositories and pushing to different registries, but I'm unclear on how the code management will work. Managing duplicated code in 3 environments sounds like it will quickly lead to things becoming out of sync. Which environment should the code live in? For user-created stacks I assume they will only live in ops, so maybe that's where the code lives for all of them? Not sure.

It would be nice if we could eliminate the jupyter-image repo, or at least move the Dockerfile out so that repo contains only the CI scripts. Now that we have eliminated a lot of the secrets that were stored in gitlab, it should be possible to move it to the github maap-jupyter-ide repo. It will take some refactoring of the Dockerfile and the CI scripts, but I think it will make things easier to manage.

I imagine this looking like maap-jupyter-ide triggering a build in jupyter-image (or renaming it to maap-jupyter-ide, since it is now just that) in one of the environments, based on which branch triggered it (master, test, develop, or a stable or version tag). Then each of the jupyter-image builds could have a different list of base images to build from, depending on the environment. And maybe in this CI we auto-create a devfile if it doesn't already exist??
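A rough sketch of that cross-repo trigger, assuming a GitHub Actions workflow in maap-jupyter-ide calling GitLab's pipeline trigger API (the GitLab host, project ID, and secret name are placeholders):

```yaml
# Sketch: a push to master/test/develop on GitHub kicks off the jupyter-image
# pipeline in the matching GitLab instance via the pipeline trigger API.
name: trigger-jupyter-image-build
on:
  push:
    branches: [master, test, develop]
jobs:
  trigger:
    runs-on: ubuntu-latest
    steps:
      - run: |
          curl --fail -X POST \
            -F "token=${{ secrets.GITLAB_TRIGGER_TOKEN }}" \
            -F "ref=${{ github.ref_name }}" \
            "https://<gitlab-host-for-this-venue>/api/v4/projects/<jupyter-image-project-id>/trigger/pipeline"
```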

I have more questions about what our version control will look like on the base images -- currently we have basically none and are always just building to latest. I am also curious how user-created base images will work here. Creating the final jupyter image when they update the stack will be straightforward. But when there is a jupyter update, we currently rebuild the jupyter pieces on every base image. Right now we only build 4 images, but with user images this could become a long list and a very long job. This may need some rethinking, or a way to split this up into multiple jobs to track failures better.
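On splitting the rebuild into separate jobs, GitLab CI's `parallel: matrix:` is one way to get a job per base image so failures are reported individually. A minimal sketch, with an illustrative image list and variable names:

```yaml
# Sketch: each base image rebuild runs as its own job; adding a user stack
# means adding one matrix entry rather than growing a single long job.
rebuild-jupyter:
  stage: build
  parallel:
    matrix:
      - BASE_STACK: [vanilla, plant]   # illustrative, not the full list
  script:
    - docker build --build-arg BASE_IMAGE="$REGISTRY_URL/stacks/$BASE_STACK" -t "$REGISTRY_URL/stacks/jupyter_image:$BASE_STACK" .
    - docker push "$REGISTRY_URL/stacks/jupyter_image:$BASE_STACK"
```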

I think it would be helpful to have a call with everyone involved and map out what we want things to look like. There are a lot of moving parts here, and it would be helpful to brainstorm with everyone before making the decisions!

marjo-luc commented 2 years ago

This is a small portion of the updates, but I think it's best to add it here:

We are converting custom jupyter extensions into npm packages, which can then be installed using the standard `npm i <package>` command within the Dockerfiles when building the environments. This decouples the extensions from the workspace, making the architecture more modular and allowing greater flexibility in what is installed on workspaces.

Each extension in the maap-jupyter-ide will eventually be converted into its own npm package and made available from the @maap-jupyter organization on the npm public registry. The packages follow the standard versioning paradigm (major.minor.patch) with a distribution tag identifying the jupyterlab version the extension should be installed on. The distribution tag could be assigned to newer versions of the extensions as they become ready for use in ops. For example:

[Screenshot attachment: Screen Shot 2022-03-07 at 6.23.29 PM.png]

The installation command in the Dockerfile for this package would be `npm i <package>@jupyterlab_v3`. This will help limit the number of changes made to Dockerfiles: we would no longer need to update them whenever we need to update a package version -- we can just update the distribution tag to point to a different version. If we ever upgrade to jupyterlab v4, we can use the same approach to tag versions, giving users confidence that they are installing extensions compatible with their jupyterlab version without needing intimate knowledge of the various versions of all the extensions.
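For reference, the dist tag can be attached at publish time. A minimal sketch, assuming a GitHub Actions publish workflow (the workflow name, trigger, and secret name are placeholders):

```yaml
# Sketch: publish an extension to the public npm registry under the
# jupyterlab_v3 dist tag; NPM_TOKEN is an assumed repository secret.
name: publish-extension
on:
  push:
    tags: ['v*']
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 18
          registry-url: 'https://registry.npmjs.org'
      - run: npm ci
      - run: npm publish --access public --tag jupyterlab_v3
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
```

An already-published version can later be re-pointed with `npm dist-tag add @maap-jupyter/<package>@<version> jupyterlab_v3`.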

marjo-luc commented 1 year ago

Summarizing a discussion we had on 1/25/23 regarding versioning and workflow changes:

This discussion has two main facets: versioning workspaces and developing workflows we want to use across our repos to facilitate this.

The workflow we have somewhat been using involves having three branches -- with each branch tied to a venue (DIT, UAT, OPS). We push code to DIT and merge those changes into UAT and OPS. The docker images are built from DIT only and copied over to UAT and OPS when appropriate.

I had proposed a release-based workflow where a single branch would be tagged with something such as rc-* (release candidate), as @anilnatha had recommended, to indicate code readiness for testing in the test venue, and something such as r-* to indicate readiness for the production venue. Tagging would trigger GitHub Actions to deploy binaries to the appropriate venue.

I recommend this workflow for the following reasons:

We ultimately chose the three-branch solution so that our repo organization would reflect the distinct venues we use and allow users to view what is running on each venue without having to refer to release numbers.

Proposed Workflow:

In nominal cases, repos will have three branches: develop, test, and main or master.

Nominal workflow:

  1. Developers develop and test locally on a branch created from develop.
  2. When ready to test in the dev venue (DIT), they open a PR back to develop.
  3. Merging a PR into develop triggers the binary build and deployment to the dev venue.
  4. Once testing in dev is complete, the repo maintainer (most likely) opens a PR from develop to test.
  5. Merging develop into test triggers copying the binary built from the develop branch to the test venue (UAT).
  6. The tester verifies the deployment in UAT.
  7. Once UAT testing is complete, the repo maintainer opens a PR from test to main/master.
  8. Merging that PR triggers copying the binary from the test venue to the production venue.

Note: the onus is on developers to delete their branches upon merging to develop.

This workflow may not apply to the custom jupyter extensions as these are published as standalone, versioned npm/python packages that are then installed when building the binaries.
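To make the branch-to-venue mapping concrete, here is a minimal sketch assuming GitHub Actions; the venue names and deploy step are placeholders, not an existing setup:

```yaml
# Sketch: merges into develop/test/main deploy to DIT/UAT/OPS respectively.
name: deploy-by-branch
on:
  push:
    branches: [develop, test, main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Map branch to venue
        run: |
          case "$GITHUB_REF_NAME" in
            develop) echo "VENUE=dit" >> "$GITHUB_ENV" ;;
            test)    echo "VENUE=uat" >> "$GITHUB_ENV" ;;
            *)       echo "VENUE=ops" >> "$GITHUB_ENV" ;;
          esac
      - run: ./deploy.sh "$VENUE"   # hypothetical deployment script
```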

Proposed Workspace Versioning:

Tools and services will follow the vMajor.Minor.Patch format. Incrementation guidelines are as follows:

• Major - breaking/incompatible changes
• Minor - adding/removing functionality that is backwards compatible
• Patch - fixing bugs that are backwards compatible

Proposed Branch Naming Convention:

[CATEGORY]/[SHORT-DESCRIPTION]

Where CATEGORY is one of the four contribution types and SHORT-DESCRIPTION is a short description of the contribution.

Contribution types fall under one of the following four categories:

• Feature - adding/changing/removing features
• Hotfix - time-sensitive changes
• Bug - fixing bugs
• Test - non-ticketed items/catch-all

e.g. bugfix/popup-not-showing, feature/jobs-ui

This naming convention is commonly used in open-source projects. Organizing branches by type will also make it easier for users interacting with the repo to get a quick view of what the contributions are focused on, e.g., are we working more on bugfixes than feature development? We can also consider using existing tools that leverage this organizational structure to generate more complete reports/metrics, e.g., how many new features did we develop this quarter?

Proposed Branch Protection:

The develop, test, and main/master branches require at least one reviewer to approve changes (unless the repo maintainer?). Merging from develop to test and main/master will generally be the responsibility of the repo maintainer.

anilnatha commented 1 year ago

@marjo-luc I read your post, which is great btw, and wanted to mention something about the branch naming where it was stated:

Test - non-ticketed items/catch-all

If we can strive to create tickets for all our work, to provide traceability and ensure we have a historical record of it, should we have a catch-all category at all? Our work should fit into one of the named categories. 🤷

marjo-luc commented 1 year ago

Thanks, @anilnatha. I agree: all work should be ticketed. If there is no use case for this test category, I'm fine with omitting it.

grallewellyn commented 1 year ago

This was in slack but documenting here:

I asked:

  1. Did we decide we want to be consistent with naming main or master? (i.e., all repos have a “main” branch or all repos have a “master” branch)
  2. Should we mention what you all mentioned before that it is the responsibility of the developer to delete their branch once the PR has been approved? (or is that fairly standard and not worth mentioning)

Marjorie replied:

  1. Main is the standard, but some of our older repos use master. When we looked at the MAAP API repo we decided not to bother changing it from master to main, so I think we can take this on a per repo basis. But for new repos, we should probably stick to main.
  2. Yes, it’s always good to be as clear as we can be. I’ll add that in. Thanks!

marjo-luc commented 1 year ago

Closing this out as we have concurrence. Actionable items to implement the described workflow are outlined in this ticket.

sujen1412 commented 1 year ago

The approach we are moving forward with is two branches: main and develop.