cunningham-lab / neurocaas

IaC codebase for the NeuroCAAS Platform
http://www.neurocaas.org
GNU General Public License v3.0
34 stars 26 forks

Developer Interface Workflow #21

Closed cellistigs closed 3 years ago

cellistigs commented 3 years ago

One of the main bottlenecks in the developer workflow at the moment is bash scripting on the remote EC2 instance. I will add a set of scripts to automate this process from a template: given a desired set of inputs and outputs, automatically write the script to transfer data and otherwise set up the local environment.

cellistigs commented 3 years ago

Setup is done. Working on https://github.com/cunningham-lab/neurocaas/issues/22 to determine next steps for this.

cellistigs commented 3 years ago

22 is done. Once the dust has settled, it looks like the best way forward is to set up a template that specifies a variable name for each piece of data we want to analyze, and the location it should be output to in io-dir, like so:

{
  "main_items": {
    "data": "inputs/videos/",
    "config": "configs/"
  },
  "supp_items": {
    "index": "configs/"
  }
}

This will assign variable names that are easy to manipulate and use in scripts for all data we work with subsequently. I.e., if I want to reference the data I just fetched from S3 in a script, I should just call /bin/bash ~/run.sh "$data" "$config" "$index" and be able to find the relevant data there.

There are some details to work out here re: how we reference these data in tests, and how we reference them in a way that makes sense (should data and config be preset?), but the general idea is here.
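As a concrete illustration, a developer-side run.sh under this template might look like the following. This is only a sketch: the function name and echoed message are illustrative assumptions, not NeuroCAAS API; only the positional-argument convention ($data, $config, $index) comes from the invocation above.

```shell
# Hypothetical run.sh body: the three positional arguments correspond to
# the template's variable names, already resolved to local paths by the
# generated transfer script.
run_analysis () {
  local datapath="$1" configpath="$2" indexpath="$3"
  # The real analysis would go here; we just report what we received.
  echo "analyzing ${datapath} with ${configpath} (index: ${indexpath})"
}
```

With the variables populated by the transfer script, this would be invoked as run_analysis "$data" "$config" "$index".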

cellistigs commented 3 years ago

It also occurs to me that automatic scripting is just the software-level portion of the blueprint. This leads to a workflow where developers can successively specify portions of their blueprint: first the docker container where their analyses run, then their inputs and their organization, then the hardware it will run on, and finally they will be ready to submit it.

cellistigs commented 3 years ago

The main content of this step can be type and parameter checking.
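For instance, a minimal parameter check might verify that every required key appears in a submitted config file. This is a hypothetical sketch; the function name and the simple "key: value" config format are assumptions, not the NeuroCAAS config schema.

```shell
# Hypothetical parameter check: succeed only if every required key
# appears as "key:" at the start of a line in the config file.
check_params () {
  local configfile="$1"; shift
  local key
  for key in "$@"; do
    grep -q "^${key}:" "$configfile" || {
      echo "missing parameter: ${key}"
      return 1
    }
  done
}
```

Type checking of the values themselves could layer on top of this in the same style.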

cellistigs commented 3 years ago

The software-level portion of the blueprint should be the thing that provides continuity for developers from working inside the docker container to moving out of it. This workflow of moving out of the docker container is the thing to design next.

Start inside the docker container, then save the image to a blueprint. Test parameters can be saved to the blueprint too for clarity. These steps can integrate with the neurocaas_contrib/local LocalEnv api.

Once the test parameters pass on the local machine, you can run the same thing on a remote instance. These steps can integrate with the neurocaas_contrib/local RemoteEnv api (there is more work to do on #31 before this can happen).
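A blueprint fragment recording these software-level pieces might look like the following. Every field name here is hypothetical, purely to indicate where a committed image tag and the test parameters could live alongside the input template; it is not the actual NeuroCAAS blueprint schema.

```json
{
  "container_image": "neurocaas-dev:analysis-v1",
  "test_parameters": {
    "data": "inputs/videos/testvid.mp4",
    "config": "configs/testconfig.yaml"
  }
}
```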

cellistigs commented 3 years ago

Saving the image to the blueprint is done. Next up: test parameters and LocalEnv integration.

cellistigs commented 3 years ago

CLI buildout is proceeding. Some intermediate todos to keep track of:

cellistigs commented 3 years ago

Incorporate test methods next (test container, run analysis), with an update to the readme.

cellistigs commented 3 years ago

We underestimated the power of the infrastructure we already had. We can ease much of the scripting burden by assuming the developer writes a script that takes data and config as input. The need to manage the scripting goes away, because all of it is then handled locally, under the assumption of local reads and writes. This resolves the original intention of this issue. Closing now, with the understanding that several other issues remain:
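The simplified contract can be sketched as follows: the platform localizes inputs before invoking the developer's script, and the script only reads and writes local files. The function name, arguments, and output path here are illustrative assumptions, not NeuroCAAS API.

```shell
# Hypothetical developer script under the simplified contract:
# it receives local paths to data and config, and writes results
# locally; the platform handles all S3 transfer around it.
analyze () {
  local datapath="$1" configpath="$2" resultdir="$3"
  mkdir -p "$resultdir"
  # A real analysis would read $datapath and $configpath here.
  echo "result for $(basename "$datapath")" > "${resultdir}/summary.txt"
}
```

Because the script never touches S3 itself, the same file can be exercised inside the container, on a local machine, and on a remote instance without modification.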

Update the readme and see how important these methods seem.