biolab / orange3-single-cell

🍊🔬 Orange add-on for gene expression of single cell data
https://singlecell.biolab.si/

Gene selection methods #348

Closed pavlin-policar closed 4 years ago

pavlin-policar commented 5 years ago

Most papers on single-cell data analysis use more sophisticated gene selection methods than what scOrange has to offer. From what I can tell, there are basically two different intuitions for gene selection:

  1. Genes exhibiting high variance w.r.t. their mean expression. Seurat and scanpy both offer highly variable gene (HVG) selection.
  2. Dropout-based gene selection methods, which pick genes containing more zero values than expected based on their mean expression. The original implementation can be found here and my slightly prettified implementation here. This is also the method that I've had the most success with. Original paper: M3Drop: dropout-based feature selection for scRNASeq, with an improved version here.
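To make the two intuitions concrete, here is a minimal NumPy sketch of one per-gene score for each. It is not the Seurat, scanpy, or M3Drop procedure: the HVG score is a plain log dispersion (variance over mean), and the dropout score swaps in a simple Poisson baseline, where a gene with mean expression `m` is expected to have a zero fraction of `exp(-m)`. The function names `hvg_scores` and `dropout_scores` are hypothetical, and `counts` is assumed to be a cells-by-genes array of raw counts.

```python
import numpy as np


def hvg_scores(counts):
    """Log dispersion (variance / mean) per gene; rows are cells, columns are genes."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0)
    # Genes with zero mean or zero variance get the lowest possible score.
    scores = np.full(mean.shape, -np.inf)
    usable = (mean > 0) & (var > 0)
    scores[usable] = np.log(var[usable] / mean[usable])
    return scores


def dropout_scores(counts):
    """Observed zero fraction minus the Poisson expectation exp(-mean), per gene."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    zero_fraction = (counts == 0).mean(axis=0)
    # Assumption: a Poisson baseline stands in for the fitted curve of the real methods.
    return zero_fraction - np.exp(-mean)


# Toy data: gene 0 is on in half the cells and off in the rest, gene 1 is constant.
toy_counts = np.array([[0, 1], [0, 1], [10, 1], [10, 1]])
toy_hvg = hvg_scores(toy_counts)
toy_dropout = dropout_scores(toy_counts)
```

In both cases a higher score means the gene is a better candidate, so the bimodal gene 0 should outrank the constant gene 1 under either score.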

Both of these methods could be incorporated into a visual interface because the genes are selected based on a curve fitted to a graph.

The typical graph output for HVG is

[image]

and for dropout-based feature selection

[image]

The left HVG graph has a clear line that could be moved up and down, and the dropout method has a curve that could be moved left and right with the mouse. Additionally, both methods have the option of selecting a specified number N of genes. HVG can also select a recommended number of genes; the dropout-based methods probably don't have this.
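The two selection modes could look roughly like this in code, assuming each gene has already been reduced to a single score (for the dropout method, something like its signed distance from the curve). Dragging the line or curve would then correspond to changing `threshold`, and asking for N genes to `select_top_n`. Both function names are hypothetical and only meant as a sketch.

```python
import numpy as np


def select_above_threshold(scores, threshold):
    """Boolean mask of genes whose score is strictly above the threshold."""
    return np.asarray(scores) > threshold


def select_top_n(scores, n):
    """Boolean mask of the n genes with the highest scores."""
    scores = np.asarray(scores)
    n = max(0, min(n, scores.size))  # clamp n to the valid range
    mask = np.zeros(scores.size, dtype=bool)
    if n > 0:
        # argsort is ascending, so the last n indices hold the top scores.
        mask[np.argsort(scores)[-n:]] = True
    return mask


example_scores = np.array([0.1, 2.0, -1.0, 0.7])
by_threshold = select_above_threshold(example_scores, 0.5)
by_top_n = select_top_n(example_scores, 2)
```

With these example scores both calls pick the same two genes, which is the link between the modes: a top-N request is equivalent to some position of the movable line.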

It might also be cool to have an input signal with marker genes, so we can check whether or not they were selected:

[image]
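The marker gene check itself would be simple. A rough sketch, assuming the widget receives the marker genes as a list of names and knows the names of the selected genes (the function name and the gene names in the example are only illustrative):

```python
def marker_selection_report(selected_genes, marker_genes):
    """Split the marker genes into those that were selected and those that were not."""
    selected = set(selected_genes)
    found = [gene for gene in marker_genes if gene in selected]
    missing = [gene for gene in marker_genes if gene not in selected]
    return found, missing


found_markers, missing_markers = marker_selection_report(
    selected_genes=["CD3E", "MS4A1", "LYZ"],
    marker_genes=["CD3E", "NKG7", "LYZ"],
)
```

The `found` genes could be highlighted on the plot, and the `missing` ones flagged so the user knows to move the line or curve.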