Basic Querying Supported, Running Apps Supported

stevengeeky commented 6 years ago

Can query profiles, datatypes, projects, datasets, and apps
Can upload/download tasks with a new command structure
Can run apps given a set of inputs (multi-input is supported)
Can bulk download product.json

Example: Running An App

To run an app, I need 1.) the id of the app to run, 2.) a project to archive the output to and 3.) inputs with the necessary datatypes for a given app

Say I want to output a dataset with type neuro/life

First I query the project list to find my project:

./bl.js project query --admin stevengeeky

Which returns 1 result:

Id: 5afc2c8de68fc50028e90820
Name: Test Project
Admins: stevengeeky
Members: stevengeeky, soichih
Guests:
Access: Access: private
Description: test

Next I query the apps list to find an app to run that outputs type neuro/life: ./bl.js app query --output-datatype neuro/life

Which returns

Id: 5ac013f8029f78002be2c486
Name: LiFE
Type: (track: neuro/track, dtiinit: neuro/dtiinit) -> (fe: neuro/life)
Description: LiFE (Linear Fasicle Evaluation) predicts the measured diffusion signal using the orientation of the fascicles present in a connectome. LiFE uses the difference between the measured and predicted diffusion signals to measure prediction error. The connectome model prediction error is used to compute two metrics to evaluate the evidence supporting properties of the connectome.

(Returned 1 result)

So this app takes in two inputs, one with type neuro/track, and another with neuro/dtiinit. I make two queries to find the datasets that I want: ./bl.js dataset query --datatype neuro/track and ./bl.js dataset query --datatype neuro/dtiinit

$ ./bl.js dataset query --datatype neuro/track

...

Id: 5ac2be0b029f78002be2c49f
Project: Test Public Project
Admins: soichih
Members: soichih
Guests:
Subject: 28853
Session: 1
Datatype: neuro/track<sd_prob>
Description: sd_prob from MRtrix Tracking
Create Date: 4/2/2018, 7:34:35 PM (2 months ago)
Storage: jetstream
Status: stored

$ ./bl.js dataset query --datatype neuro/dtiinit

...

Id: 5ac2bdec029f78002be2c49a
Project: Test Public Project
Admins: soichih
Members: soichih
Guests:
Subject: 28853
Session: 1
Datatype: neuro/dtiinit
Description: output from dtiInit
Create Date: 4/2/2018, 7:34:04 PM (2 months ago)
Storage: jetstream
Status: stored

So now I have everything, time to run the app:

./bl.js app run --id 5ac013f8029f78002be2c486 --input 'track:5ac2be0b029f78002be2c49f' --input 'dtiinit:5ac2bdec029f78002be2c49a' --project 5afc2c8de68fc50028e90820 --config '{ "num_iterations": 500, "life_discretization": 360 }'

found app input key 'track'
found app input key 'dtiinit'
Data Staging Task Created (5b0be982973ee7002adaf731)
LiFE task for app 'LiFE' has been created.
To monitor the app as it runs, please execute
bl app wait --id 5b0be982973ee7002adaf733

As stated, you can monitor the app with the generated command:

$ ./bl.js app wait --id 5b0be982973ee7002adaf733

SERVICE: brain-life/app-life
STATUS: Service started
(running since 1 minute ago)

And the process page looks as you would expect:

process page

I also implemented an includable JavaScript API, which can be used to write scripts that can easily replicate the above process (example shown in the README. Additionally, I supplied an example of running batch processes across a list of datasets using a shell script to run a pRF application)

francopestilli commented 6 years ago

@soichih @stevengeeky where would a user get the hashes for inputs and outputs? ./bl.js app run --app 5ac013f8029f78002be2c486 --inputs '5ac2be06029f78002be2c49e, 5ac2bdec029f78002be2c49a' --project 5afc2c8de68fc50028e90820

address bar?

soichih commented 6 years ago

@francopestilli Yes, you can get it from the URL, or use the CLI query commands to query for app / datatypes, datasets, etc.. and use the ID found there.

@stevengeeky We should make sure that, when a user uploads a dataset, it would output the dataset ID in an easily parsable way so that a user can then take that ID and plug it into the app run CLI.

francopestilli commented 6 years ago

@soichih thanks. Do hashes have stems and suffixes? or are they random? (Random is my guess)

stevengeeky commented 6 years ago

It's random, just a primary key @francopestilli

francopestilli commented 6 years ago

@stevengeeky OK great, thanks. I am trying to get a sense of the workflow for the user here. It is my understanding that the users will need to hand pick a series of hashes that are a bit human-unfriendly. I think it is fine, but perhaps in the future, we might want to consider mechanisms to create a more human-friendly mapping between random primary key hashes and perhaps apps, projects data types etc. Say for example:

./bl.js app run --app 'app-life' --inputs 'dwi, track' --project 'myProject' --subject 'thisSubject'

BL should then find the hashes.

stevengeeky commented 6 years ago

Actually, you can already do that

francopestilli commented 6 years ago

nice!

We have a winner.

stevengeeky commented 6 years ago

Restructured the command line argument format to look like bl <object> <method> [arguments] (So for example, bl project query --admin stevengeeky --search 'Test Project')
Reformatted code to meet Soichi's preferences (no multiline strings, always have parentheses with else)
Moved app process monitoring to bl app monitor --id task_id which is actually just a general task monitoring service
Replaced the interactive prompt with a command line object 'prompt' for setting config parameters when submitting an app

I also want to add more query options for specialized searches

soichih commented 6 years ago

TODOs.

[login]

Give user capability to request long lasting JWT token - maybe give user number of days? This would require changes to the auth service. We should also show when the token will expire after a successful login by parsing the jwt token and pulling out exp attribute.

[project-query]

Search result seems to be matching items that shouldn't be matching like "O3D"
On README, add instruction on how to parse the Id via command line lilke. ./bl.js project query --search o3d | grep Id | cut -d" " -f2 | tail -1 5afedb19251f5200274d9ca5
On README, we should probably move project query on top of dataset query - as user would most likely want to query dataset using some project ID.
add -id for searching by id. don't search id in --search

[dataset-query]

remove chalk
Instead of "(Returned 100 results)", display "(2234 total datasets - showing first 100. To view next 100, run "bl dataset query --skip 100") Also add --limit 1000?
add -id for searching by id. don't search id in --search

[dataset-download]

Maybe show dataset detail that's being downloaded?
Maybe show download progress info using request-progress module?
Maybe validate the --datatype to make sure user is using datatype that actually exists?
Make --datatype query exactly the dataset with the datatype name. Remove regex queries.
Update the query structure to $ ./bl.js dataset query --datatype "neuro/anat/t1" --datatype_tag "!acpc_aligned"
move the downloadDataset() to bl-dataset-download.js
Make dataset ID as a defult parameter (?) so that user can do "bl dataset dowload 112312132"
Maybe allow user to select download path - instead of defaulting to the dataset ID? Like "bl dataset download 123123123121 track"

[dataset-upload]

Make --directory the default param so that user can do "bl dataset upload --project 123 --datatype neuro/anat/t1w myinputdir"
Make --desc to work as well as --description or -d
Update to accept multiple --tag (or -t) instead of --tags with comma seperated strings

[all]

Make --project and -p work. --description and -d work, so on.. .option('-p, --peppers', 'Add peppers')
For overwrite console output (like "Brain life... " thing), make sure to check for interactive TTY. For non-terminal TTY, don't show progress. (could use process.stdout.isTTY to check)
un-refactor query() and simplify

[app-query]

Maybe update to --input-datatype "neuro/anat/t1" instead of of --input-type to be consistent.
Should check datatype
add -id for searching by id. don't search id in --search

[app-run]

We need to allow mapping of app's input IDs to the dataset IDs.
config could receive both regular config parameters and input datasets IDs. For dataset IDs, we use it to stage them but update the provided config to point to the actual dataset paths.
Remove --input and just use --config for now.
Validate for valid project, valid app ID, etc..
Make sure dataset IDs provides are accessible by the user (and not removed, and status is stored)
datatype tag checking should check tags for inclusivity - so it should allow dataset with "soichi" to be used by app for input datatype that doesn't have any tags set.
check to make sure instance creation was successful. it could return "{ message: 'not member of the group you have specified' }". If you see this, ask user to re-login.
Don't need to wait for staging to fnish. You can submit both staging and app at the same time (just put deps on the app to depend on staging).

[app-monitor]

For non TTY, just sit and wait, or.. display status every N minutes specified by the user as a regular console (not overwriting log)
I think we should call it app "wait"

soichih commented 6 years ago

TODO. We should have a capability to query datasets and download all product.json.

francopestilli commented 6 years ago

@soichih @stevengeeky yes. Ideally, we would like to be able to generate an output file (say CSV) by collecting values from X,Y, and Z variables from N subjects in a project.

stevengeeky commented 6 years ago

[login]

✓ Give user capability to request long lasting JWT token - maybe give user number of days? This would require changes to the auth service. ✓ We should also show when the token will expire after a successful login by parsing the jwt token and pulling out exp attribute.

[project-query]

✓ Search result seems to be matching items that shouldn't be matching like "O3D" ✓ On README, add instruction on how to parse the Id via command line lilke. ✓ On README, we should probably move project query on top of dataset query - as user would most likely want to query dataset using some project ID. ✓ add -id for searching by id. don't search id in --search

[dataset-query]

✓ remove chalk ✓ Instead of "(Returned 100 results)", display "(2234 total datasets - showing first 100. To view next 100, run "bl dataset query --skip 100") Also add --limit 1000? ✓ add -id for searching by id. don't search id in --search

[dataset-download]

✓ Maybe show dataset detail (id) that's being downloaded? ✓ Maybe show download progress info using request-progress module? ✓ Maybe validate the --datatype to make sure user is using datatype that actually exists? ✓ Make --datatype query exactly the dataset with the datatype name. Remove regex queries. ✓ Update the query structure to $ ./bl.js dataset query --datatype "neuro/anat/t1" --datatype_tag "!acpc_aligned" ✓ move the downloadDataset() to bl-dataset-download.js ✓ Make dataset ID as a defult parameter (?) so that user can do "bl dataset dowload 112312132" ✓ Maybe allow user to select download path - instead of defaulting to the dataset ID? Like "bl dataset download 123123123121 track"

[dataset-upload]

✓ Make --directory the default param so that user can do "bl dataset upload --project 123 --datatype neuro/anat/t1w myinputdir" ✓ Make --desc to work as well as --description or -d ✓ Update to accept multiple --tag (or -t) instead of --tags with comma seperated strings

[all]

✓ Make --project and -p work. --description and -d work, so on.. .option('-p, --peppers', 'Add peppers') ✓ For overwrite console output (like "Brain life... " thing), make sure to check for interactive TTY. For non-terminal TTY, don't show progress. ✓ un-refactor query() and simplify

[app-query]

✓ Maybe update to --input-datatype "neuro/anat/t1" instead of of --input-type to be consistent. ✓ Should check datatype ✓ add -id for searching by id. don't search id in --search

[app-run]

✓ We need to allow mapping of app's input IDs to the dataset IDs. ✓ Validate for valid project, valid app ID, etc.. ✓ Make sure dataset IDs provides are accessible by the user (and not removed, and status is stored) ✓ datatype tag checking should check tags for inclusivity ✓ check to make sure instance creation was successful. it could return "{ message: 'not member of the group you have specified' }". If you see this, ask user to re-login. ✓ Don't need to wait for staging to fnish. You can submit both staging and app at the same time (just put deps on the app to depend on staging).

[app-monitor]

✓ For non TTY, just sit and wait, or.. display status every N minutes specified by the user as a regular console (not overwriting log) ✓ I think we should call it app "wait"

stevengeeky commented 6 years ago

I will also add bulk product.json downloading. But the main todolist is finished (I also made a pull request on the auth service)

stevengeeky commented 6 years ago

✓ (implemented) We should have a capability to query datasets and download all product.json

francopestilli commented 6 years ago

@stevengeeky @soichih thanks! This is looking great. I would like to discuss how we could implement an "average subject for each project"

brainlife / cli

Basic Querying Supported, Running Apps Supported #1

Example: Running An App