mairin commented 6 years ago

Some notes from our meeting a couple of weeks ago that need some fleshing out:

Notes

User groups

Clinicians work at a feed level.

Researchers work at a project level. Are projects personal or not?

Pipeline/tree (What to call this?)

A feed can be a tree, not just a linear pipeline. The feed starts with data, and all subsequent operations are applied to the data. The feed can be serial, or it can be branched.

There is value in putting together a pipeline/tree feed, running it against an image, then taking that ordered tree of plugins and rerunning it on a different data set. A pipeline may consist of just a single step, but it could still be valuable to save, share, and rerun it, e.g. to share the config parameters with other researchers or to apply them to other data sets.

Project sharing

This is a long-term idea. The current version of ChRIS is permissive but productive. People don't seem to do complex pipeline analysis; we suspect researchers are happy with workflows that are just one operation.

User Interface Objects

These are the main object types we want to think about long-term for the ChRIS UI. They may or may not necessarily each have a user-friendly name below:

Plugin

A containerized set of image processing software, akin to say a filter in Photoshop or the like, that can be applied to data to manipulate it. Plugins can be chained together to produce a tree / pipeline that outputs manipulated versions of the data.

Node

A specific point within the feed pipeline / tree. Basically a specific plugin at a specific place within the feed tree.
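
To make the node / feed relationship concrete, here is a minimal sketch of a feed tree in which each node is one plugin at one position. The `Node` class, its methods, and the plugin names are all hypothetical and only illustrate the idea; they are not the ChRIS data model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """One plugin placed at one position in a feed tree (hypothetical model)."""
    plugin: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, plugin: str) -> "Node":
        """Attach a new node that consumes this node's output."""
        child = Node(plugin, parent=self)
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Node"]:
        """Yield this node, then every descendant, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def path(self) -> List[str]:
        """Plugin names from the root of the feed down to this node."""
        names = []
        node: Optional["Node"] = self
        while node is not None:
            names.append(node.plugin)
            node = node.parent
        return names[::-1]


# Plugin names are made up for illustration.
root = Node("pacs_pull")
convert = root.add("convert")
segment = convert.add("segment")   # first branch
report = convert.add("report")     # second branch off the same node

print([n.plugin for n in root.walk()])
print(segment.path())
```

The same plugin can appear at several nodes, which is why a node is identified by its position in the tree rather than by the plugin name alone.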

Feed

A feed is a specific run of a set of plugins in a tree-based pipeline. This may or may not be the best term to use for user-facing front ends. (But maybe it's fine.)

Data

Data is the actual data that the feeds operate on / manipulate. (Do we need to go more granular on this?)

Plugin pipeline

We need a better name for this object. The plugin pipeline is one or more plugins chained together plus a set of config for those plugins. (E.g., plugin A with X = .5 and Y = .99 + plugin B with X = .001 Y = .99 and Z = .4 could be an example pipeline.)

Unsure if the specific computing power configuration (e.g. how many / which processing resources in the cloud etc. are going to run the job) is part of the config here, but I suspect not.
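
As a rough illustration of "plugins chained together plus a set of config", the example above (plugin A with X and Y, plugin B with X, Y and Z) could be captured as plain data that can be saved and shared. The `Step` and `PluginPipeline` classes and the JSON layout are assumptions made for this sketch; compute resources are deliberately left out, in line with the suspicion above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    """One plugin plus the parameter values it should run with."""
    plugin: str
    params: Dict[str, float]


@dataclass(frozen=True)
class PluginPipeline:
    """Hypothetical saved pipeline: an ordered list of configured steps."""
    name: str
    steps: List[Step]

    def to_json(self) -> str:
        """Serialize to JSON so the pipeline can be stored or shared."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "PluginPipeline":
        """Rebuild a pipeline from the JSON produced by to_json()."""
        raw = json.loads(text)
        return PluginPipeline(raw["name"], [Step(**s) for s in raw["steps"]])


example = PluginPipeline(
    "example",
    [
        Step("A", {"X": 0.5, "Y": 0.99}),
        Step("B", {"X": 0.001, "Y": 0.99, "Z": 0.4}),
    ],
)

restored = PluginPipeline.from_json(example.to_json())
print(restored == example)
```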

Project

(Doesn't exist right now) A composition of pipelines + data and maybe ACLs and metadata (notes, chats, labels, etc.) for collaboration.

Other concepts

ChRIS store

An app that helps manage the plugins that may be made available to ChRIS users to include in their feeds. The cube admin can pull plugins from the ChRIS store to register them to his ChRIS deployment and make them available for users.

(Will there be just one ChRIS store or can there be multiple?)

Feed plugins

These are ChRIS plugins of a specific type that create a feed. They expect only one positional parameter: where data is output. They can do processing, but have no input data. The PACS plugin is a feed plugin. You always have to start a feed with a feed plugin.

Data plugin

These are ChRIS plugins that need two positional parameters - the plugin node that comes before it, and the plugin node that follows after it. DS == data synthesis == data synthesis plugin == data plugin
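
The rule that a feed always starts with a feed plugin can be written as a small check. This is only a sketch: the `"feed"` / `"data"` type labels and the tuple layout are invented here, and the second rule (a feed plugin cannot follow another node) is an inference from "have no input data" rather than something confirmed above.

```python
from typing import List, Optional, Tuple

# (plugin name, plugin type, index of the parent node, or None for the root)
NodeSpec = Tuple[str, str, Optional[int]]


def validate_feed(nodes: List[NodeSpec]) -> List[str]:
    """Return a list of problems; an empty list means the feed looks valid."""
    errors = []
    for name, kind, parent in nodes:
        if parent is None and kind != "feed":
            errors.append(f"{name}: a feed must start with a feed plugin")
        if parent is not None and kind == "feed":
            # Assumption: no input data means it cannot sit below another node.
            errors.append(f"{name}: a feed plugin cannot follow another node")
    return errors


good = [("pacs_pull", "feed", None), ("convert", "data", 0)]
bad = [("convert", "data", None)]

print(validate_feed(good))
print(validate_feed(bad))
```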

I think a TODO for this ticket would be:

[ ] fact-check the above
[ ] map some of these concepts out visually
[ ] brainstorm on some better user-facing terms for those that are maybe too confusing to expose to users. E.g., 'plugin pipeline' needs a better name.

I believe fleshing some of this out will help make the UI more intuitive, because some of these concepts / objects are a bit muddy right now.

mairin commented 6 years ago

Cross-Organization Collaboration

Thinking through the rough model above, some considerations especially in focusing on cross-organizational collaboration using ChRIS:

Plugins should probably have specific / particular namespaces. We could end up in an ecosystem where there are multiple upstreams, multiple plugin stores, multiple deployments / sites, and multiple versions of plugins. An example: Flatpaks follow a specific naming scheme (http://docs.flatpak.org/en/latest/conventions.html), e.g., "org.gnome.Platform/x86_64/3.24".
Pipelines / configs should also probably have namespaces and dependency metadata: what plugins do they require / depend on, who created them / where did they come from, what instance are they (what was the origin instance), what site are they deployed on (what site did they come from?)
If a ChRIS admin doesn't have a required plugin installed that an end user needs, how is that handled? (A request system for ChRIS users to request them?)
Data sharing across PACS?
Federation across ChRIS instances?
Are there any intra-site collaboration considerations we need to make?
- From Ellen's talk with Jim Whitehurst at the Summit: likely. She mentioned bringing together different teams to evaluate a case - e.g. MR, genetic tests, vital monitors, optical scans, electronic health records.
- If intra-org collab across depts / disciplines is important, and we intend at some point to provide all or some of these diverse types of data (eg output of vitals monitoring, genetic test results, etc) we should be designing for diverse types of data now.
- Just a guess, but it seems likely clinicians want to focus cross-discipline on a single case, whereas researchers are more focused across cases looking for patterns and may (??) be anchored in one or a few disciplines / data types rather than across all (???!)
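
To illustrate the namespacing and dependency points in the list above, here is a sketch that parses a Flatpak-style `name/arch/version` reference and reports which plugins a pipeline needs that a deployment lacks. Using this three-part format for ChRIS plugins is purely an assumption, as are the `PluginRef` type, the `missing_plugins` helper, and the `org.example.*` names.

```python
from typing import Iterable, List, NamedTuple, Set


class PluginRef(NamedTuple):
    """Hypothetical namespaced plugin reference, modelled on Flatpak refs."""
    name: str      # reverse-DNS style identifier
    arch: str
    version: str

    @classmethod
    def parse(cls, text: str) -> "PluginRef":
        """Split 'name/arch/version' into its three parts."""
        parts = text.split("/")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected name/arch/version, got {text!r}")
        return cls(*parts)


def missing_plugins(required: Iterable[str], installed: Set[PluginRef]) -> List[PluginRef]:
    """Return the refs a pipeline requires that this deployment does not have."""
    return [ref for ref in map(PluginRef.parse, required) if ref not in installed]


installed = {PluginRef.parse("org.example.Convert/x86_64/1.0")}
required = [
    "org.example.Convert/x86_64/1.0",
    "org.example.Segment/x86_64/2.1",
]

missing = missing_plugins(required, installed)
print([ref.name for ref in missing])
```

A non-empty result is the point where something like a request system for users could kick in, since the pipeline cannot run on that deployment as-is.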

Other Q's

Does one plugin have one set of configs or can a plugin have multiple configs that work with it? Only one at a time or can they be fragments that get run together?

mairin commented 6 years ago

Templates vs. Instances

I think one confusion caused by the current terms is whether something is a definition / template / concept / (playbook?) vs. an instance / specific run / play / etc.

I suspect that the UI should handle these using different terms / in different areas.

A straightforward way to actualize this would be to use a standard gallery UX pattern for displaying the templates, in a separate bucket from another gallery displaying actual instances / runs.

Data vs Analysis

Another split / clear delineation we should reinforce through the ChRIS UI is data vs analysis. This is similar to how we split out the "Library" tab for retrieving images from the PACS vs having the "Viewer" tab for doing the analysis.

mairin commented 6 years ago

this whiteboard sucks but a bit of a map of my thinking at the moment

[Images: two whiteboard photos, 20180514_154247 and 20180514_154145]

jbernal0019 commented 6 years ago

Hey @mairin Just wanted to make a quick comment on the pipeline concept.

At first I thought a pipeline definition was strictly an "ordered linear collection of chained plugin instances, each of them run with specific parameter values". But according to this:

https://en.wikipedia.org/wiki/Pipeline_(software)

It seems that a "one-directional tree of chained plugin instances" like the one in a feed can also be considered a pipeline. So I think we just need to define the mechanism to create and store "tree-based pipeline prototypes" that can be reused (rerun) over and over by users. Each time a pipeline prototype is run it will generate a new feed with the actual data generated by the instantiated pipeline prototype. Note that a linear pipeline is just a special case of a tree-based pipeline.
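
A minimal sketch of that prototype / feed split, assuming a prototype is just a tree of plugin names and that every run produces a fresh feed. `PrototypeNode`, `Feed`, `instantiate`, and the plugin and data names are hypothetical and not part of the ChRIS backend.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PrototypeNode:
    """A reusable template node: a plugin plus the nodes that consume its output."""
    plugin: str
    children: Tuple["PrototypeNode", ...] = ()


@dataclass
class Feed:
    """One concrete run of a prototype against one data set."""
    feed_id: int
    data: str
    # (plugin, parent plugin) pairs in instantiation order; the root has parent None.
    nodes: List[Tuple[str, Optional[str]]] = field(default_factory=list)


_feed_ids = itertools.count(1)


def instantiate(prototype: PrototypeNode, data: str) -> Feed:
    """Walk the prototype tree and create a brand-new feed for this data."""
    feed = Feed(next(_feed_ids), data)

    def visit(node: PrototypeNode, parent: Optional[str]) -> None:
        feed.nodes.append((node.plugin, parent))
        for child in node.children:
            visit(child, node.plugin)

    visit(prototype, None)
    return feed


# A linear pipeline is a tree in which every node has at most one child.
linear = PrototypeNode("fetch", (PrototypeNode("convert", (PrototypeNode("segment"),)),))
# A branched pipeline: "convert" feeds two independent plugins.
branched = PrototypeNode(
    "fetch",
    (PrototypeNode("convert", (PrototypeNode("segment"), PrototypeNode("report"))),),
)

feed_a = instantiate(branched, "study-001")
feed_b = instantiate(branched, "study-002")
print(feed_a.nodes)
print(feed_a.feed_id != feed_b.feed_id)
```

The prototype is immutable and never holds data, while each feed carries its own id and data reference, so rerunning the same prototype does not touch earlier feeds.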

I've been focused on thinking about the necessary infrastructure and technology needed by the ChRIS platform, but I think that it is in this area where ChRIS can really contribute new meaningful innovation.

For instance I think that we could support pipelines created with the Common Workflow Language

https://www.commonwl.org/

in addition to our own way of defining pipelines. And we can always extrapolate concepts from the CWL.

mairin commented 6 years ago

@jbernal0019 ah ok, cool, so pipelines don't have to be linear. that helps simplify things in terms of the ui taxonomy i'm trying to hash out here! i was worried a tree based one would require a more specialized term.

CWL sounds like a good idea! I haven't heard of it before so I'm reading up on it now.

Here is the metadata-specific stuff for CWL, it seems to touch on namespacing too: https://www.commonwl.org/user_guide/17-metadata/

Also the requirements / dependencies section - https://www.commonwl.org/user_guide/20-software-requirements/

NicolasRannou commented 6 years ago

Hi Mo that is really good, I think you are getting a very good sense of what we are aiming for! I need to re-read it a couple more times and will send more questions/comments!

danmcp commented 6 years ago

@mairin Great doc. I have other questions now:

How do I know if a feed is done? This seems especially relevant when you have multiple people sharing a pipeline.
Is a plugin pipeline predefined or a dynamic series of steps/choices?

NicolasRannou commented 6 years ago

@danmcp I think a feed is done when all the plugins of the pipeline have completed. Multiple people can share a pipeline, but a pipeline is just a "template". When a user adds "data" to the "pipeline", it creates a feed specific to that user.

Not exactly sure what you mean by dynamic. I think the pipeline is predefined, but not necessarily linear. After pluginA completes, it will run pluginB. If pluginB generates data X, then it runs pluginC. Otherwise it runs pluginD, etc. @rudolphpienaar may have a better idea!
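
A rough sketch of that behaviour: a predefined but conditional pipeline, plus a "feed is done" check. It assumes each plugin can be modelled as a callable returning labels for what it produced. The `run_feed` and `feed_is_done` helpers and the `"finished"` status string are hypothetical.

```python
from typing import Callable, Dict, List, Set

Plugin = Callable[[], Set[str]]  # returns labels describing what it produced


def run_feed(plugins: Dict[str, Plugin]) -> List[str]:
    """Run pluginA, then pluginB, then pluginC if B produced "X", else pluginD."""
    executed: List[str] = []

    def run(name: str) -> Set[str]:
        executed.append(name)
        return plugins[name]()

    run("pluginA")
    produced = run("pluginB")
    run("pluginC" if "X" in produced else "pluginD")
    return executed


def feed_is_done(statuses: Dict[str, str]) -> bool:
    """A feed counts as done once every scheduled plugin has finished."""
    return bool(statuses) and all(s == "finished" for s in statuses.values())


plugins: Dict[str, Plugin] = {
    "pluginA": lambda: set(),
    "pluginB": lambda: {"X"},
    "pluginC": lambda: set(),
    "pluginD": lambda: set(),
}

executed = run_feed(plugins)
print(executed)
print(feed_is_done({name: "finished" for name in executed}))
```

Only the plugins on the branch actually taken are scheduled, so the untaken branch (pluginD here) does not keep the feed from being reported as done.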

FNNDSC / cube-design

ChRIS UI: High-level conceptual design for ChRIS UI #7
