UWB-Biocomputing / WorkBench

Software and data provenance management platform for simulations of dissociated cortical cultures.
https://uwb-biocomputing.github.io/WorkBench/
Apache License 2.0

Clean up Workbench directory structure #89

Closed stevecl5 closed 3 years ago

stevecl5 commented 3 years ago

The files that are created and/or used by Workbench are currently split between two locations:

Install Location

Workbench/
├── lib/
├── BaseTemplates/
│   └── BaseTemplateDefault.xml
├── BrainGridRepos/
├── ParamsClassTemplateConfig/
│   ├── ConnectionsParamsClass/
│   ├── LayoutParamsClass/
│   ├── NeuronParamsClass/
│   ├── SynapsesParamsClass/
│   ├── AllParamsClasses.xml
│   └── AllParamsClasses.xsd
├── projects/
│   ├── project1/
│   │   ├── configfiles/
│   │   │   ├── NList/
│   │   │   │   ├── act.xml
│   │   │   │   ├── inh.xml
│   │   │   │   └── prb.xml
│   │   │   └── project1.xml
│   │   ├── provenance/
│   │   │   └── project1.ttl
│   │   ├── results/
│   │   │   ├── project1-out.xml
│   │   │   └── project1_v1_simStatus.txt
│   │   ├── scripts/
│   │   │   ├── project1_script1.sh
│   │   │   ├── project1_v1_scriptStatus.txt
│   │   │   └── project1_v1_SHA1Key.txt
│   │   └── project1.xml
│   └── UniversalProvenance.ttl
├── BaseTemplateConfig.xml
├── GraphittiWorkbench.jar
├── provOverhead.txt
├── WD-log.0
├── WD-WorkbenchManager-log.0
└── user.json

Home Directory (Remote*)

~/
├── BrainGrid/
│   ├── workbenchconfigfiles/
│   │   ├── NList/
│   │   │   ├── act.xml
│   │   │   ├── inh.xml
│   │   │   └── prb.xml
│   │   └── project1.xml
│   ├── results/
│   │   └── project1-out.xml
│   └── growth
├── project1_script1.sh
├── project1_v1_output.txt
├── project1_v1_scriptStatus.txt
├── project1_v1_SHA1Key.txt
└── project1_v1_simStatus.txt

* For local simulations, all of these files are created in the install location except for scriptStatus.txt and output.txt, which are created in the user's home directory.

The current directory structure (or lack thereof) creates a lot of clutter in the base of these two directories. To improve the user experience, we should come up with a better directory structure that organizes these files in a way that is convenient for the user, minimizes name collisions, and supports a variety of workflows. For consistency, the directory structure should be similar whether the user chooses to run simulations locally or on a remote machine.

stevecl5 commented 3 years ago

The current structure uses the install location to store files that are used/created by Workbench directly. The secondary storage location (which may be local or remote) is used to store files that are created or used by the bash script that runs the simulation.

One possible solution is to use the same general idea while better organizing the files within the two directories. For example:

Install Location

Workbench/
├── lib/
├── logs/
│   ├── provOverhead.txt
│   ├── WD-log.0
│   └── WD-WorkbenchManager-log.0
├── projects/
│   ├── project1/
│   │   ├── configfiles/
│   │   │   ├── NList/
│   │   │   │   ├── act.xml
│   │   │   │   ├── inh.xml
│   │   │   │   └── prb.xml
│   │   │   └── project1.xml
│   │   ├── provenance/
│   │   │   └── project1.ttl
│   │   ├── results/
│   │   │   ├── project1-out.xml
│   │   │   └── project1_v1_simStatus.txt
│   │   ├── scripts/
│   │   │   ├── project1_script1.sh
│   │   │   ├── project1_v1_scriptStatus.txt
│   │   │   └── project1_v1_SHA1Key.txt
│   │   └── project1.xml
│   └── UniversalProvenance.ttl
├── templates/
│   ├── BaseTemplates/
│   │   └── BaseTemplateDefault.xml
│   ├── ParamsClassTemplateConfig/
│   │   ├── ConnectionsParamsClass/
│   │   ├── LayoutParamsClass/
│   │   ├── NeuronParamsClass/
│   │   ├── SynapsesParamsClass/
│   │   ├── AllParamsClasses.xml
│   │   └── AllParamsClasses.xsd
│   └── BaseTemplateConfig.xml
├── GraphittiWorkbench.jar
└── user.json

Secondary Location

~/.local/workbench/
├── records/
│   └── project1_20210706_184255/
│       ├── project1_script1.sh
│       ├── project1_v1_output.txt
│       ├── project1_v1_scriptStatus.txt
│       ├── project1_v1_SHA1Key.txt
│       └── project1_v1_simStatus.txt
├── simulators/
│   └── Graphitti/
│       ├── workbenchconfigfiles/
│       │   ├── NList/
│       │   │   ├── act.xml
│       │   │   ├── inh.xml
│       │   │   └── prb.xml
│       │   └── project1.xml
│       ├── results/
│       │   └── project1-out.xml
│       └── growth
└── temp/

stevecl5 commented 3 years ago

@stiber @king-shak tagging for feedback and discussion

stiber commented 3 years ago

OK, some comments:

stevecl5 commented 3 years ago

> I would like to split the "installation location" from the "user Workbench directory" (or maybe "workbench project dir"), and rename the "secondary location" to be the "simulation directory" (or maybe "simulation project dir").

"project directory" and "simulation directory" make sense to me conceptually, but that may change depending on what files end up in each location.

> The jar file should be in the installation location. Have we determined that there is no need for separate template files; that these are contained in the jar?

I'm still not sure if the separate template files are necessary or if we can accomplish the same thing by reading everything from the JAR resources. Depending on how difficult this is to change, I may put them in a "templates" folder for now as an intermediate step.

> There should be a ~/.workbenchrc (or similar name) with basic WB startup config information, like the most recent WB dir and the sim dirs for each machine.

I discovered that "rc" stands for "run commands" and is basically a bash script that configures the Unix environment for an application. Since we are really just storing info and not running any commands, I believe that a YAML file (or similar) would make more sense. Currently, workbench saves directory information in user.json (JSON being a subset of YAML).

~/.workbench could be a directory that contains this configuration file (e.g. user.json, config.yaml). This directory could also contain the workbench logs or other program data that is "hidden" from the user, allowing the user's "project directory" to just contain workbench projects.
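
As a rough illustration of the idea above, here is a minimal Python sketch of reading and writing such a configuration file. The key names `last_projects_dir` and `simulation_dirs` are hypothetical (the thread does not define the schema of user.json), and Workbench itself is a Java program, so this only shows the shape of the data.

```python
import copy
import json
from pathlib import Path

# Hypothetical schema: the issue does not specify what user.json contains.
DEFAULT_CONFIG = {"last_projects_dir": None, "simulation_dirs": {}}


def load_config(config_dir: Path) -> dict:
    """Read <config_dir>/user.json, falling back to defaults if it is missing."""
    config = copy.deepcopy(DEFAULT_CONFIG)
    config_file = config_dir / "user.json"
    if config_file.exists():
        with config_file.open(encoding="utf-8") as f:
            config.update(json.load(f))
    return config


def save_config(config_dir: Path, config: dict) -> None:
    """Write the config as indented JSON so it stays easy to edit by hand."""
    config_dir.mkdir(parents=True, exist_ok=True)
    with (config_dir / "user.json").open("w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, sort_keys=True)
```

Keeping one entry per machine under `simulation_dirs` would cover the "sim dirs for each machine" requirement without needing a separate file per host.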

> In the sim project dir, there needs to be a separate dir for each simulation that is done. Subdir is "simulations", not "simulators", I think.

At the time, I was thinking of this as a simulator repositories directory. However, it sounds like we want to leave the repo location totally up to the user, so this may not apply anymore.

> I'm not clear what the "records" dir is, or why that would be outside the corresponding simulation dir. Seems like we don't want the WB files for a sim disconnected from the corresponding sim artifacts.

The "records" dir was supposed to be the simulation dir. I was thinking of it like a "record" in a database (though I realize this is a poor name, in hindsight). The idea was to have one "record" for each simulation that took place on that machine.

I realize we could also just mirror the <project>/<simulation>/ structure from the project directory. However, I do have some concerns about mirroring "project" information in the "simulation" directory since project information could potentially change and cause the simulation directory to become out of sync. The simulator doesn't need to know anything about the project to run a simulation, so it may be worth avoiding this altogether and just storing information for each individual simulation (using some unique identifier to avoid collisions).
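
One way to get such a unique identifier is the name-plus-timestamp pattern already used in the earlier example (`project1_20210706_184255`). The sketch below assumes that pattern. The function names and the numeric suffix used on a collision are illustrative choices, not something decided in this thread.

```python
from datetime import datetime
from pathlib import Path


def simulation_dir_name(sim_name: str, started: datetime) -> str:
    """Build a name like 'project1_20210706_184255' from a name and start time."""
    return f"{sim_name}_{started:%Y%m%d_%H%M%S}"


def create_simulation_dir(root: Path, sim_name: str, started: datetime) -> Path:
    """Create a fresh directory under root; append a counter if the name is taken."""
    base = simulation_dir_name(sim_name, started)
    candidate = root / base
    suffix = 1
    while True:
        try:
            # exist_ok=False makes mkdir fail instead of reusing a directory.
            candidate.mkdir(parents=True, exist_ok=False)
            return candidate
        except FileExistsError:
            suffix += 1
            candidate = root / f"{base}_{suffix}"
```

Because the name depends only on the simulation itself, nothing in the simulation directory has to be renamed if the project is later reorganized.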

> Like the WB jar file, the simulator executable will be installed elsewhere. That's OK, right, since we capture its commit ID within a WB file?

The commit ID is captured with a git log command from inside the repo folder (at least this is how it works when building the executable). I'm not sure how this would work with a pre-built simulator executable, since workbench doesn't know how that executable was created (I believe there would not be a commit node). If this information was stored in the executable somehow, workbench could check the executable itself to get the commit info.
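
For reference, the repo-based capture described above can be sketched in a few lines of Python. This is not Workbench's actual code (Workbench is written in Java). The function name `read_commit_id` is made up for illustration, and the sketch assumes `git` is on the PATH.

```python
import subprocess
from pathlib import Path
from typing import Optional


def read_commit_id(repo_dir: Path) -> Optional[str]:
    """Return the HEAD commit hash of the repo at repo_dir, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "log", "-1", "--format=%H"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory is missing, or it is not a repository.
        return None
    return result.stdout.strip() or None
```

The `None` case is exactly the pre-built executable problem: with no repository to query, there is nothing for this approach to read.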

stiber commented 3 years ago

> I discovered that "rc" stands for "run commands" and is basically a bash script that configures the Unix environment for an application. Since we are really just storing info and not running any commands, I believe that a YAML file (or similar) would make more sense. Currently, workbench saves directory information in user.json (JSON being a subset of YAML).

Close. It isn't necessarily an sh/bash script; it's in whatever format the program in question uses. (For example, sh -- the Bourne shell -- was rapidly followed by csh -- the C shell -- and its .cshrc file is, of course, a csh script.)

> ~/.workbench could be a directory that contains this configuration file (e.g. user.json, config.yaml). This directory could also contain the workbench logs or other program data that is "hidden" from the user, allowing the user's "project directory" to just contain workbench projects.

I think this is fine, and builds in future expandability. It should be easy to read and modify by hand for advanced folks who want to tweak things, and should be easy to use from the Java program side.

> The commit ID is captured with a git log command from inside the repo folder (at least this is how it works when building the executable). I'm not sure how this would work with a pre-built simulator executable...

How about using the git API to grab (some or all of) the simulator commit history? And/or, as we discussed at our last meeting, seeing if there is a way we can embed the commit into the executable when it's built (perhaps with a shell/Python/whatever script run at that time that uses the git command to get it, then adds it as a preprocessor define for the makefile CMake generates), so we can just run the simulator with, say, a -v option to have it output its commit (and then quit, without actually running a simulation).

If this sounds like what we'd want, I will throw its implementation over the fence to the Graphitti subgroup.
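
To make the build-time idea concrete, here is a hedged Python sketch of the script half: it turns a commit hash into a preprocessor define and writes it to a header. The macro name `GIT_COMMIT_ID` and the header-file approach are assumptions for illustration. How Graphitti's CMake build would actually consume the value is left open in this thread.

```python
import re
from pathlib import Path


def commit_define(commit_id: str, macro: str = "GIT_COMMIT_ID") -> str:
    """Return a C preprocessor line defining macro as the commit hash string."""
    # Accept abbreviated hashes up to full SHA-1 (40) or SHA-256 (64) hex digits.
    if not re.fullmatch(r"[0-9a-f]{7,64}", commit_id):
        raise ValueError(f"not a git commit hash: {commit_id!r}")
    return f'#define {macro} "{commit_id}"\n'


def write_version_header(path: Path, commit_id: str) -> bool:
    """Write the header only if its content changed, to avoid needless rebuilds."""
    content = commit_define(commit_id)
    if path.exists() and path.read_text(encoding="utf-8") == content:
        return False
    path.write_text(content, encoding="utf-8")
    return True
```

The simulator could then print the macro's value when run with the proposed -v option, which Workbench would capture instead of running git itself.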

stevecl5 commented 3 years ago

Continued from issue #92.

Final directory structure:

Install Directory*

/usr/local/bin/Workbench/
├── lib/
└── BrainGridWorkbench.jar

* read-only, portable

Projects Directory

WorkbenchProjects/
├── Default/
│   └── Default.json
└── MyProject/
    ├── .artifacts/
    ├── testSim/
    │   ├── configfiles/
    │   │   ├── NList/
    │   │   │   ├── act.xml
    │   │   │   ├── inh.xml
    │   │   │   └── prb.xml
    │   │   └── testSim.xml
    │   ├── provenance/
    │   │   └── testSim.ttl
    │   └── script/
    │       ├── testSim_script.sh
    │       ├── testSim_scriptStatus.txt
    │       ├── testSim_SHA1Key.txt
    │       └── testSim_simStatus.txt
    ├── anotherSim/
    ├── MyProject.json
    └── MyProjectProvenance.ttl

Simulations Directory

WorkbenchSimulations/
├── testSim/
│   ├── configfiles/
│   │   ├── NList/
│   │   │   ├── act.xml
│   │   │   ├── inh.xml
│   │   │   └── prb.xml
│   │   └── testSim.xml
│   ├── results/
│   │   └── testSim-out.xml 
│   ├── testSim_script.sh
│   ├── testSim_cmdOutput.txt
│   ├── testSim_scriptStatus.txt
│   ├── testSim_SHA1Key.txt
│   └── testSim_simStatus.txt
└── anotherSim/

Workbench Directory

.workbench/
├── BrainGridRepo/
├── logs/
│   ├── provOverhead.txt
│   ├── WD-log.0
│   └── WD-WorkbenchManager-log.0
└── user.json
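
As an illustration of the projects-directory layout above, the following Python sketch creates the empty folder skeleton for one simulation inside a project. The function name is hypothetical, and Workbench would do the equivalent in Java. Only directories shown in the tree are created.

```python
from pathlib import Path

# Subdirectories of one simulation inside the projects directory, per the layout above.
PROJECT_SIM_SUBDIRS = ("configfiles/NList", "provenance", "script")


def create_project_simulation(projects_dir: Path, project: str, sim: str) -> Path:
    """Create <projects_dir>/<project>/<sim>/ with its standard subdirectories."""
    sim_dir = projects_dir / project / sim
    for sub in PROJECT_SIM_SUBDIRS:
        (sim_dir / sub).mkdir(parents=True, exist_ok=True)
    return sim_dir
```

For example, `create_project_simulation(Path("WorkbenchProjects"), "MyProject", "testSim")` would produce the `testSim/` folders shown in the tree, leaving files such as `testSim.xml` and `testSim.ttl` to be written by the steps that own them.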