Closed ambrosejcarr closed 4 years ago
This issue is indebted to work done by @sidneybell, @liaprins and @bkmartin, as well as user feedback from @angela and @bruceAranow
Presently, in cellxgene, scientists have few affordances for managing genes. The application, to date, has been primarily focused on exploring cells. This has also determined the application’s internal data model.
This feature represents, at a high level, the addition of what we are calling ‘gene sets’ to the right sidebar. Gene sets are a list of lists: each gene set has a user-generated name and contains a list of genes.
Users must be able to act on both gene sets (adding a set, deleting a set, coloring by the set as a whole, duplicating sets) and individual genes within sets (adding, removing and reordering genes).
Users can presently add genes to the right sidebar, but they are not persisted — on hard reset of the browser, the genes disappear and must be re-added. This solution implies a persistent workspace for genes, which are saved locally as CSV, or to some database solution in the cloud.
Gene sets will provide a more powerful affordance for building up a workspace, but will also generate untenably long lists given the heights of the histograms in the present gene UI. Thus individual genes within a gene set must have a ‘collapsed’ state, with a mini histogram (as appears on categories when coloring by a gene) or some metric which describes whether or not the gene is expressed at all by the current world of cells. The collapsed state will also include the option to color by individual genes.
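One candidate for the "is this gene expressed at all" metric is the fraction of currently selected cells with nonzero expression. The sketch below assumes that definition; the function name and the threshold parameter are hypothetical, and the issue does not commit to any particular metric.

```python
def fraction_expressing(values, threshold=0.0):
    """Fraction of cells whose expression exceeds `threshold`.

    `values` holds one expression value per cell in the current selection.
    Both the metric and the default threshold are assumptions for illustration.
    """
    if not values:
        return 0.0
    expressing = sum(1 for v in values if v > threshold)
    return expressing / len(values)
```

A single number like this would fit in a collapsed row where a mini histogram does not.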
Users will have CSVs of genes that they work with, or could generate them from notebooks. Cellxgene should support import of some kind to ease this path in a hosted environment. Locally, the CSV can be both edited directly from cellxgene and loaded up / referenced in a notebook, so this is less of a problem.
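As a rough illustration of the CSV round trip, here is a minimal sketch. The two-column layout (gene_set_name, gene_symbol) and the function names are assumptions made for this example; the issue does not specify a file format.

```python
import csv
import io


def parse_gene_sets_csv(text):
    """Parse rows of (gene_set_name, gene_symbol) into {name: [genes]}.

    The column names are a hypothetical layout, not a cellxgene format.
    Repeated genes within a set are dropped so each list stays non-redundant.
    """
    gene_sets = {}
    for row in csv.DictReader(io.StringIO(text)):
        name = (row.get("gene_set_name") or "").strip()
        gene = (row.get("gene_symbol") or "").strip()
        if not name or not gene:
            continue  # skip incomplete rows
        genes = gene_sets.setdefault(name, [])
        if gene not in genes:
            genes.append(gene)
    return gene_sets


def dump_gene_sets_csv(gene_sets):
    """Serialize {name: [genes]} back to the same two-column CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gene_set_name", "gene_symbol"])
    for name, genes in gene_sets.items():
        for gene in genes:
            writer.writerow([name, gene])
    return out.getvalue()
```

One row per (set, gene) pair keeps the file easy to edit by hand and to load in a notebook. Note that an empty gene set would not survive this round trip.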
Proposed staging
Users are notified when gene(s) do not import, including suggestions for common import failures
Users can plot a gene set by x or y.
Users can brush over the gene set to select a range of expression values
Users can expand a gene set to see and interact with its members
Users can expand a gene, see large histogram with axes, and brush to select range
Users can access gene sets they created in previous sessions (gene sets persist)
REQ: Gene lists are non-redundant, non-mutually exclusive, and contain only features from var.
REQ: Gene lists can have variable length.
REQ: List names are strings that follow the same conventions as category names.
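The three REQ lines above can be read as checks over the whole collection. The sketch below is one interpretation: the function name is hypothetical, var_names stands in for the feature index of var, and the name check is only a placeholder because the category-name conventions are not spelled out here.

```python
def validate_gene_sets(gene_sets, var_names):
    """Return a list of problems; an empty list means the gene sets are valid.

    gene_sets: {set_name: [gene, ...]}
    var_names: iterable of feature names present in var
    """
    known = set(var_names)
    problems = []
    for name, genes in gene_sets.items():
        # Placeholder for "same conventions as category names" (assumption).
        if not isinstance(name, str) or not name.strip():
            problems.append(f"invalid set name: {name!r}")
        seen = set()
        for gene in genes:
            # Non-redundant: a gene appears at most once within a set.
            if gene in seen:
                problems.append(f"{name}: duplicate gene {gene}")
            seen.add(gene)
            # Only features from var are allowed.
            if gene not in known:
                problems.append(f"{name}: {gene} not in var")
    return problems
```

Nothing here checks across sets, since the lists are non-mutually exclusive and the same gene may appear in several of them. Lists of any length pass, including empty ones.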
Use cases.
Publication use case
Workspace use case
What does the "create new gene set" flow look like?
I think you mentioned before that the + and ... might be hidden until mouseover -- I think I prefer that; this looks a bit busy relative to the left sidebar.
Comment from Jonah:
My curiosity is about how/if we can precompute some standard gene sets (whether they be cells or pathways) that can be quickly referenced or imported. For example, many people want to see where myc signaling is active or Wnt ligands target genes etc. Heard him say that sharing is a future feature and this admittedly falls in the intermediate area.
Great demo during sprint review @colinmegill! A few questions for you or @ambrosejcarr:
If a user wants to add a single gene (eg the current workflow in the right sidebar), is the expected workflow now to create a gene set with one gene in it?
You mentioned a user can reorder genes within a gene set, can the user reorder the genesets? This might be more of a heatmap use case (thinking back to our convo with the Krasnow lab yesterday) and something we would tackle then.
If we expect ~100 lists of ~100 genes each, I think a search function for genesets and genes would be helpful, what do you think?
Downloading a gene set seems like a less common use case but I could imagine it (ie sharing a geneset with collaborators, importing into other tools, etc.). Has this come up in discussions at all?
Great questions @signechambers1. For this one:
- Downloading a gene set seems like a less common use case but I could imagine it (ie sharing a geneset with collaborators, importing into other tools, etc.). Has this come up in discussions at all?
Your intuition is good. The explanation for why it's missing is procedural. I only requested Colin mock up the first two use cases so we don't get too far ahead of the implementation team. If you check the second comment in this issue, Import & export are the third, followed by linking to differential expression, and finally on-platform sharing.
@colinmegill I believe this issue closes #1538, #1539, #1541, #571, and #852. Do you agree?
I think #923 relates to both the question Signe and I asked about what happens to "add genes" and how a user creates a new gene set, and can be closed when those questions are answered. I know you cited that the work was optional for the publication use case, but it probably needs to be implemented for the workspace use case, right?
How do you propose to address multi-brushing of gene sets (#1584)? Track which genes have been brushed and disable re-brushing with some kind of visual cue to let the user know? I think the publication use case will need to address this issue.
Do you intend for users to be able to re-order genes within gene sets, and gene sets within the workspace, in one of these use cases, or to reserve that for later? Where should I put that requirement? When that requirement is accounted for, I believe #1069 can be closed.
@signechambers1 great questions!
Yes, that's correct. All genes have to be in a gene set to make the sidebar skim-able and collapse-able. Feedback from users was that we should optimize for many large sets, which does slightly de-optimize for 'quick look', though it's not much slower.
Hadn't considered re-ordering genesets! I assume that'd be useful. I expect users will want to break them into sections and give them headings, as well. In the case of Tabula Muris and the mocks above:
It would mean headings and sets something like:
Fat: immune_nk, immune_b, mesenchymal_progenitor
Heart and Aorta: ...
A search filter (type atrpd and get sets with Aorta or APOD) that would reduce the gene sets that are visible; for an example, see Spotify's search within playlist, though their filter is not as forgiving to errors as I'd like for this given how complicated the names are. Create new gene set button but above the first set; Spotify is also a nice model for that.
Yes, scientists want both ingress and egress via csv drop in and see set / csv download.
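For the error-tolerant filter over gene set names mentioned above, here is a minimal sketch using Python's standard difflib. The function name, the tokenizing on underscores, and the 0.6 cutoff are all assumptions; this is not tuned to reproduce any specific example from the thread.

```python
import re
from difflib import SequenceMatcher


def filter_gene_sets(names, query, cutoff=0.6):
    """Return the names that loosely match `query`, preserving input order.

    A name matches if the query is a substring of it, or if the query is
    similar enough to the whole name or to any underscore/space-separated
    token. The cutoff is an arbitrary starting point.
    """
    q = query.strip().lower()
    if not q:
        return list(names)  # empty query shows everything
    matches = []
    for name in names:
        lowered = name.lower()
        tokens = [lowered] + [t for t in re.split(r"[_\s]+", lowered) if t]
        if q in lowered or any(
            SequenceMatcher(None, q, t).ratio() >= cutoff for t in tokens
        ):
            matches.append(name)
    return matches
```

Matching on tokens as well as the whole name lets a short misspelled query such as imune still find immune_nk and immune_b. How forgiving the filter feels depends on the cutoff.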
A note: I don't think the data structure should encode the display, so I would still propose we persist gene sets as lists of lists even if we have a heading. The client side could sort them by heading or alphabetically by looking at a heading attribute on the geneset, but I would rather that attribute exist on the geneset.
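To make the "heading lives on the geneset, grouping happens at display time" idea concrete, here is a small sketch. It is written in Python for illustration even though the real grouping would run client side. The GeneSet class, its field names, and the example set names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class GeneSet:
    name: str
    genes: list = field(default_factory=list)
    heading: str = ""  # display hint only; persisted shape stays a flat list


def group_by_heading(gene_sets):
    """Build {heading: [set names]} for display without changing storage."""
    ordered = sorted(gene_sets, key=lambda gs: (gs.heading, gs.name))
    return {
        heading: [gs.name for gs in group]
        for heading, group in groupby(ordered, key=lambda gs: gs.heading)
    }
```

The persisted collection remains a flat list of gene sets. The grouping is derived on demand, so changing how the sidebar sorts or sections the sets needs no data migration.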
@ambrosejcarr re: what issues are closed, yes, all of those, except #1541 and #1584, which will need to be addressed separately.
Creating a gene set:
Adding a gene will also occur in a modal, triggered by the plus button on a gene set:
It will need to encompass adding genes singly and in bulk, with good error handling when adding hundreds of genes with multiple types of errors.
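One way to handle bulk adds with several kinds of failure is to collect errors by type instead of stopping at the first one. The sketch below assumes two error categories (unknown and duplicate); the function name and the categories are illustrative, not the planned implementation.

```python
def add_genes(gene_set, candidates, var_names):
    """Add valid candidates to `gene_set` in place and bucket the rest.

    Returns (added, errors), where errors maps a category to the genes that
    failed for that reason. The categories are assumptions for illustration.
    """
    known = set(var_names)
    present = set(gene_set)
    added = []
    errors = {"unknown": [], "duplicate": []}
    for raw in candidates:
        gene = raw.strip()
        if not gene:
            continue  # ignore blank entries from pasted lists
        if gene not in known:
            errors["unknown"].append(gene)
        elif gene in present:
            errors["duplicate"].append(gene)
        else:
            gene_set.append(gene)
            present.add(gene)
            added.append(gene)
    return added, errors
```

Grouping the failures lets the modal report a count per reason (for example, "3 genes not found in this dataset") rather than one message per gene.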
Closing out design portion from Sprint 1, Colin has linked final figma design above.
Decisions made: The following features are out of scope for initial staging:
Open product questions to answer in a future sprint:
Appetite: 10
Requirements:
Existing issues:
Design work: