callumforrester commented 8 months ago

Remote Git Repository as Source for Plans and Devices

Background

We introduced a scratch area in https://github.com/DiamondLightSource/blueapi/pull/313 so that plan/device code could be loaded from external files on startup. Typically we check out these files on a shared filesystem for prototyping. We have found several issues with this approach:

Longer startup times as the code is loaded in
Blueapi is difficult to hot-reload (hopefully fixed with https://github.com/DiamondLightSource/blueapi/issues/317)
The shared files may get conflicting edits by multiple people
- Especially confusing if one person left uncommitted changes
- May also lead to permissions issues if someone creates, and owns, a new file
It can be hard to restore blueapi to a known state without deleting WIP code in the scratch area

We are therefore looking for a more flexible, long-term solution that will enable more robust deployments at the facility level.

Proposed Solution

Outline

Add the option to configure blueapi to pull down and install git repositories into its environment on startup. Define a "workspace" i.e. a group of remote git repositories that blueapi uses. On startup/reload, it will pull these as-configured (latest main, latest tag, some specific branch or hash). It can be configured via REST endpoints. It will also cache what it pulls in case the remote goes down.

[Diagram: blueapi-code-upload-simple]
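As a rough illustration of the outline above, here is a minimal sketch of how a workspace entry and its cached checkout might be modelled. `RepoSource`, `cache_path` and `sync_commands` are hypothetical names, not existing blueapi APIs, and the functions only build the git command lines rather than running them. Note that fetching a bare commit hash this way depends on the server allowing it.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class RepoSource:
    """One remote repository in a workspace (hypothetical structure)."""

    url: str
    ref: str = "main"  # branch, tag or commit hash to track


def cache_path(cache_root: Path, source: RepoSource) -> Path:
    """Directory where the repository is cached between restarts."""
    name = source.url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return cache_root / name


def sync_commands(cache_root: Path, source: RepoSource) -> List[List[str]]:
    """Build the git commands that bring the cache to the configured ref."""
    target = cache_path(cache_root, source)
    commands: List[List[str]] = []
    if not target.exists():
        # First start: nothing cached yet, so clone without a working tree.
        commands.append(["git", "clone", "--no-checkout", source.url, str(target)])
    # Fetch only the configured ref, then check out exactly what was fetched.
    commands.append(["git", "-C", str(target), "fetch", "origin", source.ref])
    commands.append(["git", "-C", str(target), "checkout", "--detach", "FETCH_HEAD"])
    return commands
```

If the fetch step fails because the remote is down, the previous checkout in the cache directory is left untouched, which is the fallback behaviour described in the outline.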

Example Workflow for Prototyping Changes

An example workflow for prototyping changes to plans/devices is therefore:

Check out the relevant repo
Make a new branch
Make some experimental changes and push
Point blueapi at the new branch

This adds some extra overhead to getting experimental changes in, but that overhead can be addressed outside of blueapi. For example, we have considered an auto-commit program for less advanced users: from their perspective, they click "save" and a few seconds later the changes are live.
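The auto-commit idea could look roughly like the sketch below. This is an assumption about how such a tool might work, not an existing program: `autocommit` and `run_git` are hypothetical names, and the commit message is a placeholder. The git runner is injectable so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_git(args: Sequence[str]) -> str:
    """Run a git command and return its stdout; raises if git fails."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def autocommit(repo_dir: str, branch: str, run: Runner = run_git) -> bool:
    """Commit and push every change in repo_dir; False if nothing changed."""
    base = ["-C", repo_dir]
    # An empty porcelain status means the working tree is clean.
    if not run([*base, "status", "--porcelain"]).strip():
        return False
    run([*base, "add", "--all"])
    run([*base, "commit", "-m", "auto-commit: save from editor"])
    # Push the current commit to the experimental branch on the remote.
    run([*base, "push", "origin", f"HEAD:refs/heads/{branch}"])
    return True
```

An editor hook or file watcher would call `autocommit` on each save; blueapi would then pick up the branch on its next reload.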

Benefits

We can track ongoing changes to deployments at the facility level
No shared checkouts are being edited
There is always the option of easily restoring to a working state (main branch)
Caching reduces startup times most of the time

Outstanding Concerns

Is this too complex?
It's risky to depend on remote github/gitlab repositories for runtime operation (solvable by hosting an intermediate git server, but that adds yet more complexity)
Are we just using git as a database?

stan-dot commented 7 months ago

re: 'point blueapi at the new branch' how would this be done? with a config value? there is no 'config' area in the current CLI.

re: risk an intermediate server could be on the local gitlab with bidirectional mirroring. an opinion from the cloud team would be necessary. Definitely relying on github only without redundancy would be bad.

re: Are we just using git as a database? is there something wrong with that?

re: self hosted vscode

we should spin a container there with devcontainers with the auto commit extension https://marketplace.visualstudio.com/items?itemName=vsls-contrib.gitdoc

re: making own editor from scratch

I had a brief discussion about this. Main conclusion - it's very difficult. intellisense would be a huge block and there is no spare capacity to make it a good product. there are some libraries like

https://editorjs.io/ or https://github.com/tinymce/tinymce

but those are not well-suited for code. Even still we could manage file updates - https://fastapi.tiangolo.com/tutorial/request-files/ .

stan-dot commented 7 months ago

re: self-hosted but diff than vscode

I hope each of those could be pulled up with a simple helm chart. The one for redhat doesn't seem the best as the beamline workstations have windows not redhat.

then the search for a self-hosted solution starts with awesome-selfhosted README.

there we find coder

eclipse Theia https://theia-ide.org/

gitpod https://www.gitpod.io/

code-server https://docs.linuxserver.io/images/docker-code-server/#usage https://code.visualstudio.com/docs/remote/vscode-server

and also some more niche projects for comparison, that are likely not mature enough for our use case:

stan-dot commented 7 months ago

action recommendation: outline a brief plan to get a devcontainer setup for a specific beamline - like i20-1 - define a set of extensions, and create a github repo with a gitlab mirror. Then take this to the cloud team to get the thing running and then add 'read from repo' endpoint refresh into blueapi config.

this proof of concept is not an MVP yet but every day that the scripts aren't version-controlled is a risk.

Expected outcome: in 2 weeks the proof of concept is built and then the effort and cloud resources to create an MVP can be estimated, and subsequent migration can be propagated throughout the beamlines.

Once deployed presumably this would require little maintenance. One shared repository for all the plans across beamlines could be added as a nice feature.

callumforrester commented 7 months ago

A few thoughts:

We do not want github or gitlab as an operational dependency, perhaps the internal mirror should be a more lightweight git server that we can deploy alongside blueapi?
Why don't we want to self-host vscode? Probably best if we have the same UI as the scientists for plan development
Why a devcontainer per beamline?
One shared repo of all plans is an interesting idea, would need other opinions.
What do we need the cloud team for?

A couple of other things to consider:

We want to automate git committing, which will lead to a messy git history. Do we also want to stage to a separate, tidy git history? E.g. via PRs
Do we want to track the history of each plan? If so, how do we do this if we're just committing them to a git repo? Git has okay-but-not-perfect tools for tracking the history of a function, and if a plan is renamed then it may well be a case-by-case decision as to whether it is the same "plan" afterwards.
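For reference, the git tooling alluded to here is probably `git log -L :<funcname>:<file>`, which follows the line range of a function through history. A minimal sketch of building that command (the helper name and the example paths are hypothetical):

```python
from typing import List


def function_history_command(func_name: str, file_path: str) -> List[str]:
    """Build a `git log -L` command tracing one function's history.

    func_name is treated by git as a regex matched against function
    header lines, so results depend on how git detects those headers
    (for Python, setting `*.py diff=python` in .gitattributes can help).
    """
    return ["git", "log", f"-L:{func_name}:{file_path}"]
```

For example, `function_history_command("my_plan", "plans/scans.py")` yields a command that shows every commit touching `my_plan` in that file. It does not recognise a function that was renamed, which is the case-by-case problem raised above.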

stan-dot commented 7 months ago

I mean our local gitlab instance maintained by the cloud team. that seems easier than making another mirror from scratch https://docs.gitlab.com/ee/user/project/repository/mirror/bidirectional.html

I mean for 1 beamline as an MVP, to try out there their specific plans. limiting the scope.

messy git history is better than no history.

why would we want to track history of each plan? that's a wild requirement. arguably we could cache a git ref or text snapshot in the plan metadata instead of just plan name. not sure what is the use case here
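a rough sketch of what caching a git ref in the plan metadata could look like. The `plan_source` key and both function names are made up for illustration; nothing like this exists in blueapi as far as this thread shows.

```python
import subprocess
from typing import Any, Dict, Mapping


def current_commit(repo_dir: str) -> str:
    """Return the commit hash currently checked out in repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def with_source_ref(
    metadata: Mapping[str, Any], repo_url: str, commit: str
) -> Dict[str, Any]:
    """Return a copy of the metadata that records where the plan came from."""
    enriched = dict(metadata)
    # "plan_source" is a hypothetical key, not an existing convention.
    enriched["plan_source"] = {"repo": repo_url, "commit": commit}
    return enriched
```

with that, a run could be traced back to the exact code that produced it without needing per-plan history.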

stan-dot commented 7 months ago

testing the linuxserver /code-server image now https://hub.docker.com/r/linuxserver/code-server

stan-dot commented 7 months ago

let's consider for a moment what if the experimental plans lived at the head, people just pushing to production all the time. of course the more long-lived plans would be in a different directory

stan-dot commented 7 months ago

an intermediate server could be on the local gitlab with bidirectional mirroring. an opinion from the cloud team would be necessary. Definitely relying on github only without redundancy would be bad.

with that of course to not get stuck in the event of github going down
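the redundancy could be as simple as trying each mirror in order and falling back to the local cache if none responds. A sketch, assuming the caller supplies a `fetch` function that raises when a remote is unreachable (`first_reachable` is a hypothetical helper):

```python
import subprocess
from typing import Callable, Iterable, Optional


def first_reachable(
    urls: Iterable[str], fetch: Callable[[str], None]
) -> Optional[str]:
    """Return the first mirror URL that fetch() succeeds on, else None."""
    for url in urls:
        try:
            fetch(url)
        except (OSError, subprocess.CalledProcessError):
            # This mirror is down or refused the request; try the next one.
            continue
        return url
    # No mirror responded: the caller should keep using the cached checkout.
    return None
```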

stan-dot commented 7 months ago

and the workspace dir could be mounted in /dls so that it's always available locally if needed

stan-dot commented 7 months ago

GitDoc seems to work fine. next I'll try to commit a new ophyd device from there

stan-dot commented 6 months ago

achieved a mirroring between 2 repos https://github.com/stan-dot/shared-scripts

https://gitlab.diamond.ac.uk/import/github/status

https://gitlab.diamond.ac.uk/xma12127/shared-scripts/-/tree/main

stan-dot commented 6 months ago

one issue seems to be about the 'writing to local filesystem' and also then for multi-user access.

perhaps the use of coder would be needed https://coder.com/docs/v2/latest/install/kubernetes

update: our current cloud config should be ok to try out coder

stan-dot commented 6 months ago

update: I have no idea how to deploy coder with helm in my namespace at the argus cluster

stan-dot commented 6 months ago

MVP idea - just use the provided and managed module load vscode with a loaded devcontainer with pre-loaded extension for autosave as well as a minimalistic vscode settings profile. Users that'd like more control could customize their profile further.

https://code.visualstudio.com/docs/editor/profiles#_python-profile-template

we could save the profiles as github gists https://code.visualstudio.com/docs/editor/profiles#_save-as-a-github-gist

we could hard-code script like this: module load vscode && code bluesky_plans --profile https://gist.github.com/diego3g/b1b189063d21b96d6144ca896755be64

the profile indicated here is the one from the github gist with the most stars: https://gist.github.com/search?l=JSON&o=desc&q=vscode+profile&s=stars

callumforrester commented 6 months ago

Where does the local copy of the code live?

stan-dot commented 6 months ago

ah I forgot to address this key question. GitDoc is a Visual Studio Code extension that allows you to automatically commit/push/pull changes on save. The local copy could live on the workstation scratch dir. Or somewhere else, I don't think it'd make much of a difference if the push and pull is frequent

callumforrester commented 6 months ago

From discussion: the core problem is to put git as the barrier between changes being written and being read by blueapi. The target that blueapi is pointing at MUST BE beyond the local filesystem. Where the edits happen is secondary - that can be in a local vim or preconfigured vscode editor, or in the cloud with a tool like coder.
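One way to enforce the "beyond the local filesystem" rule would be to reject local paths when a source is configured. A minimal sketch under that assumption (`is_remote_source` is a hypothetical helper, and the accepted schemes are a guess at what we would want to allow):

```python
import re
from urllib.parse import urlparse

# scp-like syntax such as git@host:group/repo.git
_SCP_LIKE = re.compile(r"^[\w.-]+@[\w.-]+:(?!/).+")
_REMOTE_SCHEMES = {"https", "http", "ssh", "git"}


def is_remote_source(source: str) -> bool:
    """True if source names a network git remote rather than a local path."""
    parsed = urlparse(source)
    if parsed.scheme in _REMOTE_SCHEMES:
        # A network URL must also name a host.
        return bool(parsed.netloc)
    # file:// URLs and plain paths fall through and are rejected here
    # unless they use the scp-like user@host:path form.
    return bool(_SCP_LIKE.match(source))
```

A path such as `/dls/scratch/plans` or a `file://` URL would be refused, while `https://` and `git@host:group/repo.git` sources would be accepted.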

A lightweight git server could be tested to investigate this further.

stan-dot commented 6 months ago

https://www.reddit.com/r/git/comments/seu9pe/is_there_a_way_to_consistently_mirror_one_repo_to/ https://hub.docker.com/r/alpine/git

first-pass survey of solutions for this mirror setup

stan-dot commented 6 months ago

stumbled across another solution for this https://github.com/rgrove/synchrotron

callumforrester commented 4 months ago

Continued in #509

stan-dot commented 4 months ago

@callumforrester this could still be kept here as part of the epic afaik

callumforrester commented 4 months ago

@stan-dot Good point!

DiamondLightSource / blueapi

Remote git server as source for plans and devices #363
