biolab / orange3-single-cell

🍊🔬 Orange add-on for gene expression of single cell data
https://singlecell.biolab.si/

[[SUGGESTION]]: Workflows and apply "megawidget" #221

Open covingto opened 6 years ago

covingto commented 6 years ago

One issue with Orange, and with any kind of interactive data analysis, is that data exploration tends to generate many different analyses. Each one needs its own widgets on the canvas, so the canvas fills up with widgets that belong to different sets of sub-analyses or ideas.

The first suggestion is to make sub-canvas widgets like the ones one might see in TensorBoard. From the main canvas these would look just like normal widgets with inputs and outputs. Double-clicking on one would open another canvas that behaves just like the main canvas, except that it would have a designated location for the input and output channels that go into and out of the sub-canvas.

Suppose, for example, that I had a workflow that merged datasets, then ran a complicated gene selection and filtering workflow, then generated plots showing which genes are expressed in the cell populations. This workflow could be organized so that the data merging steps sit in one sub-canvas widget, which emits to the gene selection and filtering sub-canvas widget, which in turn emits back to the main canvas, where I can do the interesting part: differential expression.
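As a rough sketch of the sub-canvas idea, and not of anything in Orange's actual API, a sub-canvas can be thought of as a named chain of steps that looks like a single step from the outside. The `SubCanvas` class, the step functions, and the `expression` field below are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Table = List[Row]
Step = Callable[[Table], Table]


class SubCanvas:
    """Hypothetical composite node: one input, one output, many steps inside."""

    def __init__(self, name: str, steps: Iterable[Step]):
        self.name = name
        self.steps = list(steps)

    def __call__(self, table: Table) -> Table:
        # Feed the input through each inner step in order.
        for step in self.steps:
            table = step(table)
        return table


def drop_missing(table: Table) -> Table:
    # Illustrative filtering step: discard rows without a measurement.
    return [row for row in table if row["expression"] is not None]


def keep_high_expression(table: Table, threshold: float = 1.0) -> Table:
    # Illustrative selection step: keep rows at or above the threshold.
    return [row for row in table if row["expression"] >= threshold]


# From the "main canvas" this is a single node.
filtering = SubCanvas("gene selection and filtering",
                      [drop_missing, keep_high_expression])

cells = [
    {"gene": "A", "expression": 2.5},
    {"gene": "B", "expression": None},
    {"gene": "C", "expression": 0.3},
]
selected = filtering(cells)
print(selected)
```

Because a `SubCanvas` is itself callable, it can be used as a step inside another `SubCanvas`, which is what makes the nesting possible.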

The second suggestion is to take the same sub-canvas concept and wrap it into an apply megawidget. Remember that the sub-canvas would have nodes for input and output. Suppose that the task is to get a list of genes differentially expressed by treatment (a metadata column in my data) across the clusters of cells defined by the clustering widget. I would connect the clustering widget to the apply megawidget.

The apply megawidget would contain a canvas just like the sub-canvas described before, an interface for how to apply, and an interface for how to aggregate. There would be a sub-canvas for a "template". In that sub-canvas I would set up my workflow, which would act on a data table as expected and eventually generate a list of differentially expressed genes. That list, or set of tables, would be passed to the aggregation node. For this task I would select the metadata column corresponding to the cluster ID in the apply interface, and in the aggregation interface I would indicate that I want the ID of the selection to be preserved in the resulting table.

On running, the apply megawidget would use the template that I made to generate N sub-canvases (each visible as a tab on the widget), one for each level of the cluster column. Each sub-canvas would be individually accessible in case I needed to adjust settings in its widgets or look at something more closely. The apply megawidget would then take the tables generated by the workflows and merge them into one aggregated table that includes the cluster column.
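The apply megawidget described above is essentially a split-apply-combine operation. A minimal sketch in plain Python, assuming the data is a list of row dictionaries, might look like the following. The function names, the column names (`cluster`, `treatment`, `gene`, `expression`), and the toy "mean difference" template are assumptions for illustration; the template is only a stand-in for a real differential expression workflow, and none of this is Orange's API.

```python
from collections import defaultdict


def apply_by_group(table, group_column, template):
    """Split rows by group_column, run template on each group, and merge
    the results while preserving the group ID in every output row."""
    groups = defaultdict(list)
    for row in table:
        groups[row[group_column]].append(row)

    aggregated = []
    for level in sorted(groups):
        # One "sub-canvas" run per level of the grouping column.
        for result_row in template(groups[level]):
            aggregated.append({group_column: level, **result_row})
    return aggregated


def mean_difference_template(rows):
    """Toy template: per gene, mean(treated) minus mean(control)."""
    by_gene = defaultdict(lambda: defaultdict(list))
    for row in rows:
        by_gene[row["gene"]][row["treatment"]].append(row["expression"])

    results = []
    for gene in sorted(by_gene):
        treated = by_gene[gene]["treated"]
        control = by_gene[gene]["control"]
        if treated and control:
            diff = sum(treated) / len(treated) - sum(control) / len(control)
            results.append({"gene": gene, "mean_difference": diff})
    return results


cells = [
    {"cluster": 1, "treatment": "treated", "gene": "A", "expression": 4.0},
    {"cluster": 1, "treatment": "control", "gene": "A", "expression": 1.0},
    {"cluster": 2, "treatment": "treated", "gene": "A", "expression": 2.0},
    {"cluster": 2, "treatment": "control", "gene": "A", "expression": 2.0},
]

table = apply_by_group(cells, "cluster", mean_difference_template)
for row in table:
    print(row)
```

Here `group_column` plays the role of the apply interface, and prepending the group level to each result row plays the role of the aggregation interface that preserves the ID of the selection.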

You can imagine this being further nested in case I had other columns of data to subset on.
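One way to picture the nesting, again only as a sketch under assumed names, is a recursive version of the apply step that takes a list of columns and subsets on each in turn. The `batch` column and the `count_cells` template are hypothetical.

```python
from collections import defaultdict


def nested_apply(table, group_columns, template):
    """Recursively subset on each column in group_columns, then run template
    on the innermost groups and tag results with every group level."""
    if not group_columns:
        return template(table)

    first, rest = group_columns[0], group_columns[1:]
    groups = defaultdict(list)
    for row in table:
        groups[row[first]].append(row)

    combined = []
    for level in sorted(groups):
        # Each level gets its own nested apply over the remaining columns.
        for result in nested_apply(groups[level], rest, template):
            combined.append({first: level, **result})
    return combined


def count_cells(rows):
    # Trivial template used only to show the shape of the output.
    return [{"n_cells": len(rows)}]


cells = [
    {"cluster": 1, "batch": "x"},
    {"cluster": 1, "batch": "y"},
    {"cluster": 1, "batch": "y"},
    {"cluster": 2, "batch": "x"},
]

summary = nested_apply(cells, ["cluster", "batch"], count_cells)
for row in summary:
    print(row)
```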

With these tools you can make very complex workflows more organized, and if you can do something for one cohort or group you can do it for any group, without having to add more widgets every time someone has a new grouping scheme.

Asuranceturix commented 5 years ago

Yes, please. This is my main complaint about Orange too: we want to be able to reuse parts of workflows.