CONP-PCNO / conp-portal

:bar_chart: The CONP data portal
https://portal.conp.ca/
MIT License

Pipeline execution on CONP datasets #63

Open glatard opened 5 years ago

glatard commented 5 years ago

We should streamline the processing of CONP datasets with CONP pipelines, possibly by reviving https://github.com/CONP-PCNO/conp-pipeline

paiva commented 5 years ago

I agree with this suggestion. How would you like to proceed?

glatard commented 5 years ago

This has two aspects:

  1. Following a discussion with @shots47s: from the portal, the frontend should be developed to launch a specific pipeline on a specific dataset using CBRAIN's REST API. The new CBRAIN GUI will soon provide widgets to facilitate that.
  2. From the command line, this is already possible using Boutiques+DataLad. I don't think we need to add anything specific there.
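
To illustrate point 2, here is a minimal sketch of what a command-line run could look like with DataLad + Boutiques. The dataset URL, Zenodo descriptor ID, and invocation file below are hypothetical placeholders; the commands are built as argument lists rather than executed.

```python
# Sketch: processing a CONP dataset from the command line with DataLad + Boutiques.
# The dataset URL, Zenodo ID, and invocation file are hypothetical placeholders.

def build_commands(dataset_url, zenodo_id, invocation_file):
    """Return the shell commands (as argument lists) for a DataLad+Boutiques run."""
    return [
        # 1. Install the dataset and fetch its annexed file content.
        ["datalad", "install", "-r", dataset_url],
        ["datalad", "get", "-r", "."],
        # 2. Launch the pipeline with Boutiques ("bosh"), pointing at the
        #    descriptor published on Zenodo and a local invocation file.
        ["bosh", "exec", "launch", f"zenodo.{zenodo_id}", invocation_file],
    ]

commands = build_commands(
    "https://github.com/conp-datasets/some-dataset",  # hypothetical
    "1482743",                                        # hypothetical Zenodo ID
    "invocation.json",
)
for cmd in commands:
    print(" ".join(cmd))
```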
shots47s commented 5 years ago

I think we should do the CBRAIN execution in two stages:

1) Redirect to CBRAIN as a first pass (i.e., send over the information about which dataset and pipeline are to be executed) and let users run it from CBRAIN.
2) Then move to providing modular UI components from our new interface and run the jobs through the CBRAIN API. It may not be necessary then to code up an actual connection to the API, because the React components will have that baked in.
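
As a sketch of stage 2, the portal (or a React component behind it) would submit a task through CBRAIN's REST API. The payload shape below follows CBRAIN's task-submission API as I understand it, but the endpoint, key names, and IDs should be treated as assumptions to verify against the CBRAIN API docs.

```python
# Sketch: building a CBRAIN task-submission payload (stage 2 of the plan).
# The payload shape and all IDs are assumptions / hypothetical placeholders.

def build_task_payload(tool_config_id, file_ids, params):
    """Build the JSON body for a CBRAIN task submission."""
    return {
        "cbrain_task": {
            # Identifies the pipeline version + execution server in CBRAIN.
            "tool_config_id": tool_config_id,
            # Input files (CBRAIN userfile IDs) plus pipeline parameters.
            "params": {"interface_userfile_ids": file_ids, **params},
        }
    }

payload = build_task_payload(721, ["101", "102"], {"smoothing": 6})
# A client would then POST this to the CBRAIN /tasks endpoint,
# authenticated with an API token obtained from the /session endpoint.
```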

cmadjar commented 4 years ago

@glatard should this issue be closed?

cmadjar commented 4 years ago

Actually, I will close it. Feel free to reopen it if you think there is still work left on this issue.

glatard commented 4 years ago

On the CBRAIN front it would be useful to have a tighter integration than just redirecting to the login page. We should check with the CBRAIN team if point 2. in @shots47s' list above would be doable.

natacha-beck commented 4 years ago

We should discuss the new interface in the coming weeks; I will bring point 2 to this discussion.

cmadjar commented 4 years ago

Discussed briefly at the CONP dev call of September 30th, 2020.

We will split this issue into smaller tasks at the next CONP dev call (October 7th).

@glatard should we invite people from the CBRAIN team to the next CONP dev call to discuss the plan? If so, who should be invited?

glatard commented 4 years ago

Here are a few possible actions regarding this issue, organized into the four Goals summarized below. All goals can be worked on in parallel, except Goal 3, which depends on Goals 1 and 2.

[Screenshot from 2020-10-07: summary of the four goals]

Goal 1: Run CONP pipelines in CBRAIN

Tasks

  1. Make sure that all CONP pipelines that are available in CBRAIN appear as such in the CONP portal.
  2. When a user clicks on the CBRAIN button for a CONP pipeline, redirect to the pipeline launch page instead of the generic CBRAIN login.

How

Point 2 most likely requires storing a CBRAIN tool config id for each pipeline, preferably as a config file also available on GitHub for easier updates. This design would also solve point 1, as a pipeline would be assumed to be installed in CBRAIN if and only if it has a valid tool config id. When registering config ids, one should make sure that they correspond to exactly the same pipeline (Boutiques descriptor) as the one registered in CONP.
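
A minimal sketch of that mechanism, assuming a hypothetical JSON config file in the portal repo keyed by pipeline name (the file name, pipeline names, tool config ids, and CBRAIN URL pattern are all placeholders, not the actual values):

```python
import json

# Hypothetical contents of a config file such as cbrain_tool_configs.json,
# mapping CONP pipeline names to CBRAIN tool config ids.
CONFIG = json.loads("""
{
  "fsl-bet": {"tool_config_id": 721},
  "freesurfer-recon-all": {"tool_config_id": 1045}
}
""")

def cbrain_launch_url(pipeline_name, base="https://portal.cbrain.mcgill.ca"):
    """Return the CBRAIN launch URL for a pipeline, or None if not installed.

    Per the design above, a pipeline counts as installed in CBRAIN if and
    only if it has a valid tool_config_id. The URL pattern is an assumption.
    """
    entry = CONFIG.get(pipeline_name)
    if entry is None or not isinstance(entry.get("tool_config_id"), int):
        return None
    return f"{base}/tasks/new?tool_config_id={entry['tool_config_id']}"
```

The portal frontend could then show the CBRAIN button only when `cbrain_launch_url` returns a URL, which covers point 1 for free.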

Who

CONP developers (@cmadjar, @mandana-mazaheri), liaise with @natacha-beck to get tool config ids.

Goal 2: Process CONP datasets in CBRAIN

Task

How

The ideal solution would be to use CBRAIN's DataLad data provider. Otherwise, install and download the datasets on a server (suggestion: Beluga, to facilitate processing), and register this location as a regular CBRAIN data provider. Make sure that simple pipelines (Diagnostics) can be run on the files. In any case, new datasets should be created automatically (either create a new data provider or register new files to an existing data provider).

The CBRAIN data provider id should be stored using a mechanism similar to the one used to store CBRAIN tool config ids (see previous point). Suggestion: JSON file available in the portal config on GitHub.
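
A sketch of that JSON file and a loader that validates it, with hypothetical file contents (the dataset names and provider ids are placeholders):

```python
import json

# Hypothetical contents of a datasets -> CBRAIN data provider id mapping,
# stored as a JSON file in the portal config on GitHub.
RAW = """
{
  "projects/dataset-a": {"data_provider_id": 42},
  "projects/dataset-b": {"data_provider_id": 57}
}
"""

def load_provider_ids(raw):
    """Parse and validate the dataset -> data provider id mapping."""
    mapping = json.loads(raw)
    for dataset, entry in mapping.items():
        if not isinstance(entry.get("data_provider_id"), int):
            raise ValueError(f"missing or invalid data_provider_id for {dataset}")
    return {d: e["data_provider_id"] for d, e in mapping.items()}
```

Validating at load time would catch a dataset that was registered in the portal but never assigned a CBRAIN data provider.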

Who

This is on the CBRAIN roadmap. We need to make sure that the CBRAIN DataLad provider works as expected. Liaise with CONP developers for DataLad expertise.

Notes

Something specific has to be done for datasets that require authentication. The CBRAIN team will manually configure permissions.

Goal 3: Process CONP datasets in CBRAIN using CONP pipelines

Tasks

How

Needs discussion; it might be a bit tricky, as fine-grained file selection within the dataset might be necessary.
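
To make the file-selection concern concrete, here is a small sketch: filter a dataset's file listing with glob patterns, then fetch only the matching files with `datalad get`. The file paths and patterns are hypothetical, and how the portal would obtain the listing is left open.

```python
from fnmatch import fnmatch

# Sketch of fine-grained file selection before submitting a CBRAIN task.
# Paths and patterns are hypothetical examples.

def select_files(paths, patterns):
    """Keep only the dataset files matching at least one glob pattern."""
    return [p for p in paths if any(fnmatch(p, pat) for pat in patterns)]

def datalad_get_command(selected):
    """Build the 'datalad get' command fetching only the selected files."""
    return ["datalad", "get"] + selected

files = ["sub-01/anat/T1w.nii.gz", "sub-01/func/bold.nii.gz", "README.md"]
anat_only = select_files(files, ["*/anat/*"])
```

Only the selected files would then need to be present on (or registered with) the CBRAIN data provider, instead of the whole dataset.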

Who

CONP portal developers: @liamocn, @xlecours

Goal 4: Analytics on pipeline execution

Task

How

Who

@mandana-mazaheri for the provenance dashboard, liaise with @nbeck for provenance upload from CBRAIN.

cmadjar commented 4 years ago
cmadjar commented 3 years ago

ooooops, closed the wrong issue.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 5 months with no activity. Remove stale label or comment or this will be closed in 3 months.

github-actions[bot] commented 2 years ago

This issue was closed because it has been stalled for 3 months with no activity.