KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

UI/UX #32

Closed evanroyrees closed 2 years ago

evanroyrees commented 4 years ago

1.1: General Layout

References:

1.2: Results and Progress

1.3: Bin Exploration

1.4: Bin Summary

Edits:

  1. separated 2d / 3d plot into two aims
  2. Separated embedded link selection (on tooltip hover?)

2020-05-10:

2020-05-13:

evanroyrees commented 4 years ago

Reasoning to move away from Dash framework:

I've noticed there is significant optimization that we can implement if we do not have to perform complete figure updates when exploring a figure (e.g. a scatter plot of contigs). The optimal solution (for now, it seems) is to relayout just the changed aspects of the figure. Unfortunately, this relayout functionality does not appear to be available in Dash, although I believe it is used in the react.js code that Dash wraps.
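To make the "relayout just the changed aspects" idea concrete, here is a minimal sketch in plain Python. It is not Dash or Plotly API: `layout_patch` is a hypothetical helper, and the layout dictionaries are made-up examples. It only shows how a small patch of changed values (keyed by dotted paths such as `xaxis.range`) could be computed instead of resending the whole figure.

```python
from typing import Any, Dict

# Sentinel so that a key missing from the old layout is distinguishable from None.
_MISSING = object()


def layout_patch(old: Dict[str, Any], new: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Return only the values in `new` that differ from `old`, keyed by dotted path.

    Hypothetical helper for illustration. Keys that were removed in `new`
    are not reported; only added or changed values are.
    """
    patch: Dict[str, Any] = {}
    for key, new_value in new.items():
        path = f"{prefix}{key}"
        old_value = old.get(key, _MISSING)
        if isinstance(new_value, dict) and isinstance(old_value, dict):
            # Recurse so that only the changed leaves end up in the patch.
            patch.update(layout_patch(old_value, new_value, prefix=path + "."))
        elif old_value is _MISSING or new_value != old_value:
            patch[path] = new_value
    return patch


# Made-up layouts: the user zoomed on the x axis, nothing else changed.
old_layout = {"title": "Contigs", "xaxis": {"title": "x_1", "range": [-10, 10]}, "showlegend": True}
new_layout = {"title": "Contigs", "xaxis": {"title": "x_1", "range": [-2, 4]}, "showlegend": True}

patch = layout_patch(old_layout, new_layout)
print(patch)
```

With a patch like this, only the zoomed axis range would need to be sent to the browser, rather than every contig in the scatter plot.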

evanroyrees commented 4 years ago

See also #31. Not sure how much overlap each have. We may be able to consolidate these issues.

jason-c-kwan commented 4 years ago

I'm really sad that the 3D plot has been crossed out. Even if we can't select from it, just being able to rotate and look where things are on the two nucleotide dimensions and coverage is extremely useful. What I generally use it for is separating two bins that have different coverages but exactly overlap in 2D nucleotide space.

chanana commented 4 years ago

I've decided to put 3d plots on a separate tab. I agree, it's not an unreasonable ask but will likely require WebGL to implement for more than 10,000 data points based on my naive research.

chanana commented 4 years ago

See also #31. Not sure how much overlap each have. We may be able to consolidate these issues.

@WiscEvan I think we need to trim down the wants/needs of the UX to make it (at least seem) more tractable. Any ideas on classification of the checklist? For example, could we sort them into MVP vs nice-to-have? And then further based on priority? Or is there another grouping you can suggest?

jason-c-kwan commented 4 years ago

@chanana I think if you have enough information to sort the tasks according to how difficult they are and how much time they will take, then that will help us prioritize. I for one have no idea what is hard and what is easy in this part (having never done graphical interfaces).

chanana commented 4 years ago

Input Needed for MVP

I've copied the original comment here and annotated it with symbols indicating status:

  ✅ has been done
  ⭕ needs to be done
  ❌ won't be done
  ❓ can't be done since I need an explanation of what it means

@kaw97, @Sidduppal, @WiscEvan, and @jason-c-kwan - I need you to edit this comment in the following way. For each aim, decide if it's required to make the minimum viable product (website) and change the symbol appropriately. If there is a reason for the change, such as "I don't think this will help with world peace", then put that reason under the aim in code format. For the aims marked with a ❓, I'd like more information on what it means in order to implement it. Again, add an explanation using code format below the aim or make a separate comment if it requires more than two lines of explanation.

For example:

❌ make radii toggle
    Doesn't cure cancer
❓ Legend toggle
    Toggle the legend on/off. Were you really unable to grasp the concept of a toggle?

Aims

General

❌ Server status (up, down, etc?)
❌ Jobs in progress, completed
❌ "See example" button with example results from paper
❌ Logo?
❌ Dashboard style sidebar

Results and Progress

❓ Autometa version
    You could probably retrieve this from GH releases. This corresponds to those via semantic versioning
❓ parameters
    This is corresponding to Autometa input parameters and is only relevant when we have a job submission system in place
❌ Stage/Total runtime
❌ Visualize annotation stages (possibly DAG structure?)
❌ Visualize clustering progression (Generate traces at each stage, iterative slices along an alluvial plot?)
❓ visualize DBSCAN output
    I think the idea here was to visualize each clustering stage
❓ visualize ML refinement
    visualize each clustering stage similar to DBSCAN
❓ visualize paired-end refinement
    These were all in the dropdown where you could color the clusters by the clustering algorithm that was performed. Safe to leave this out for now

The above three things are just different columns in the table, so would basically involve swapping the color (or axis) to look at a different column. @WiscEvan Does this still apply to the output? It was the case for Autometa 1.0.

Bin Exploration

✅ Interactive 2D scatter plot
✅ Colors according to a category
✅ Color category can be selected by user from a drop down.
✅ Show tooltips on hover
✅ Toggle tooltips
✅ link radii of circles to 3rd dimension
✅ make radii toggle
⭕ Interactive 3D-scatterplot
    At least be able to plot on embedded x/y as well as coverage z axis. Minimal interactivity would be to rotate and zoom, as well as change coloring of points (i.e. via the binning columns or phylum etc.) Selection would be a far stretch goal since it sounds hard
❌ Linked selections in both kinds of plots (on hover)
    Yeah this is probably hard but if you can change colors of points in the 3D plot in the above task then it should be possible
❌ 2D/3D-scatterplot toggle on different tabs
    Having both side by side or above/below would be nice. I don't think you need to toggle between
⭕ Link selections to update selected bins and bin metrics according to bin selections
    This is one of the main parts that makes this tool so useful. This uses marker gene information to compute cluster completeness and purity interactively
⭕ Contig cluster updating from linked selections (perhaps making a new column in the table)
    From the previous point, we would like to assign a new column after we have a selection that improves the bin quality.
⭕ Updated binning results table download
✅ Ability to change axes for 2D-scatterplot
❌ Ability to change axes for 3D-scatterplot
    Yeah it's fine not to have this
❓ Ability to change marker set
    This is a future goal after some of the later milestones have been accomplished
    Yeah I think we need to think about how to implement this carefully before we ask Shaurya to visualize it - this is much more complicated when each taxa has its own marker set but we also allow the user to do their own bin polishing.
❌ Resubmit updated results button
    I'm actually not sure what this is about
❌ Gene set selection for completeness calculation [need gene set for implementation]
❓ Alphabetically sort legend
    I think this is in regard to taxa?
    Yeah I think whenever we have a legend showing it is nice for the user to have it alphabetical
❓ Legend toggle
    Whether to display the annotations corresponding to the color overlay
    I imagine this would be a sort of hide/show thing for the color legend?
❓ Contigs (2D/3D) image trace, autometa_run.html on export
    Generates a static report after autometa completion
    Yeah I don't know what this is
❓ Selecting cluster column dropdown for any column in table
    Basically be aware that 'cluster' could be on an arbitrary named column, don't assume they will have specific names
❌ Circle clusters (convex hull trace?)
❌ Silhouette score (This may be more for the über curious who wish to look at clustering behavior)
❓ Multi-selection binning and append column to save selection of manually selected bins
    This is in relation to Cluster updating above <**Updated binning results table download**>
    perhaps all the cluster editing that can be done would alter some data structure in memory and then a "save" button would output a table with a new column? i.e. I don't think this should ultimately be that complicated
⭕ Alluvial plot of taxon assignments from selected contigs
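As a rough illustration of the "bin metrics according to bin selections" aim, the sketch below recomputes completeness and purity for a set of selected contigs from marker gene annotations. The definitions used here are an assumption for illustration (completeness as the percentage of expected markers found, purity as the percentage of found markers that occur exactly once), and `bin_metrics`, the contig names, and the marker names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def bin_metrics(
    selected_contigs: Iterable[str],
    contig_markers: Dict[str, List[str]],
    n_expected_markers: int,
) -> Tuple[float, float]:
    """Return (completeness, purity) in percent for the selected contigs.

    Assumed definitions (not taken from the thread):
      completeness = unique markers found / expected markers * 100
      purity       = markers found exactly once / unique markers found * 100
    """
    counts = Counter(
        marker
        for contig in selected_contigs
        for marker in contig_markers.get(contig, [])
    )
    n_present = len(counts)
    if n_present == 0 or n_expected_markers <= 0:
        return 0.0, 0.0
    n_single_copy = sum(1 for count in counts.values() if count == 1)
    completeness = n_present / n_expected_markers * 100
    purity = n_single_copy / n_present * 100
    return completeness, purity


# Made-up annotations: marker_B occurs on two contigs, which lowers purity.
markers = {
    "contig_1": ["marker_A", "marker_B"],
    "contig_2": ["marker_B"],
    "contig_3": ["marker_C"],
}

all_three = bin_metrics(["contig_1", "contig_2", "contig_3"], markers, n_expected_markers=4)
refined = bin_metrics(["contig_1", "contig_3"], markers, n_expected_markers=4)
print(all_three, refined)
```

In this toy case, dropping `contig_2` from the selection keeps completeness the same and raises purity, which is the kind of feedback a linked selection could show while refining a bin.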

Bin Summary

Also remember that Evan's tool dynamically calculates completeness/purity for selections, which is really useful

❓ Completeness/Purity
❓ Bin assigned taxon
❓ Alluvial plot of taxon breakdown for bin
❓ Assembly stats
❓ On standby/idle rotate bin selection for alluvial plot
❓ number of sequences classified (number and percentage of total)
❓ number of sequences unclustered (number and percentage of total)

I think most of the sections in Bin Summary could also be generated as a static report when Autometa finishes. These could be located below the exploratory tool. Perhaps better would be to place these in their own view from Django.

Yes this stuff is/was generated with the separate script cluster_process.py in Autometa 1.0. I think that is what is being referred to here.

jason-c-kwan commented 4 years ago

Sorry I didn't finish editing the comment. I've got to supervise my kid.

jason-c-kwan commented 4 years ago

OK, I'm done now.

evanroyrees commented 4 years ago

❓ visualize DBSCAN output I think the idea here was to visualize each clustering stage ❓ visualize ML refinement visualize each clustering stage similar to DBSCAN ❓ visualize paired-end refinement These were all in the dropdown where you could color the clusters by the clustering algorithm that was performed. Safe to leave this out for now The above three things are just different columns in the table, so would basically involve swapping the color (or axis) to look at a different column @WiscEvan Does this still apply to the output? It was the case for Autometa 1.0.

There is a discussion on the output format in issue #80. For now, we will be appending columns in the same fashion as v1.
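A minimal sketch of what appending a column could look like for a manual selection, using only the standard library. The column names (`contig`, `cluster`, `refined_cluster`), the bin labels, and both helper functions are hypothetical and chosen only for illustration; the real output format is whatever is settled on in the linked discussion.

```python
import csv
import io
from typing import Dict, Iterable, List


def append_cluster_column(
    rows: List[Dict[str, str]],
    selection: Iterable[str],
    new_column: str,
    new_label: str,
    source_column: str = "cluster",
) -> List[Dict[str, str]]:
    """Copy `rows`, adding `new_column` that relabels only the selected contigs."""
    selected = set(selection)
    updated = []
    for row in rows:
        new_row = dict(row)  # leave the original rows untouched
        if row["contig"] in selected:
            new_row[new_column] = new_label
        else:
            new_row[new_column] = row.get(source_column, "")
        updated.append(new_row)
    return updated


def to_tsv(rows: List[Dict[str, str]]) -> str:
    """Serialize rows as tab-separated text, e.g. for a download button."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up binning table with an existing cluster column.
table = [
    {"contig": "contig_1", "cluster": "bin_1"},
    {"contig": "contig_2", "cluster": "bin_1"},
    {"contig": "contig_3", "cluster": "bin_2"},
]

updated = append_cluster_column(table, {"contig_2"}, "refined_cluster", "bin_1a")
tsv_text = to_tsv(updated)
print(tsv_text)
```

Because the earlier column is kept and a new one is added beside it, a plot could color by either column without assuming a fixed column name.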

evanroyrees commented 4 years ago

I think most of the sections in Bin Summary could also be generated as a static report when Autometa finishes. These could be located below the exploratory tool. Perhaps better would be to place these in their own view from Django. Yes this stuff is/was generated with the separate script cluster_process.py in Autometa 1.0. I think that is what is being referred to here.

Yes, cluster_process.py will need to be updated as well or a different script will need to be written to generate this summary/report.

evanroyrees commented 4 years ago

Thought this may be a nice progress visualization when we have the back-end connected to the front-end. Similar to GNPS.

[image: autometa dag]

The commands to generate this are in https://github.com/KwanLab/Autometa/issues/71#issuecomment-665700874

evanroyrees commented 4 years ago

I think most of the sections in Bin Summary could also be generated as a static report when Autometa finishes. These could be located below the exploratory tool. Perhaps better would be to place these in their own view from Django. Yes this stuff is/was generated with the separate script cluster_process.py in Autometa 1.0. I think that is what is being referred to here.

Yes, cluster_process.py will need to be updated as well or a different script will need to be written to generate this summary/report.

This is now being accounted for in #99

evanroyrees commented 4 years ago

uBin interface

[image: uBin interface]

evanroyrees commented 4 years ago

Interface example of tool: ICoVeR

[image: ICoVeR interface]

evanroyrees commented 4 years ago

Nice interactive tool for exploration of embeddings by tensorflow. The tool is called projector - link

chanana commented 4 years ago

Nice interactive tool for exploration of embeddings by tensorflow. The tool is called projector - link

Thanks. Am aware. It uses typescript/webgl. Tried to re-purpose it for our use, failed successfully.

evanroyrees commented 2 years ago

Discussion has been moved to https://github.com/WiscEvan/Automappa/discussions/20