Most papers on single-cell data analysis use more sophisticated gene selection methods than what scOrange has to offer. From what I can tell, there are basically two different intuitions for gene selection:
1. Highly variable genes (HVG): genes exhibiting high variance w.r.t. their mean expression. Seurat and scanpy both offer this.
2. Dropout-based gene selection: genes containing more zero values than expected based on their mean expression. The original implementation can be found here and my slightly prettified implementation here. This is also the method that I've found the most success with. Original paper: M3Drop: dropout-based feature selection for scRNASeq, and an improved version here.
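To make the two intuitions concrete, here is a minimal NumPy sketch. It is not the Seurat, scanpy, or M3Drop procedure: the function names are hypothetical, the HVG score is just the residual around a straight-line fit of log-variance against log-mean, and the dropout score assumes a plain Poisson null (expected zero fraction `exp(-mean)`) instead of the curve fitted in the M3Drop paper. `counts` is assumed to be a cells x genes array of raw counts.

```python
import numpy as np


def hvg_scores(counts):
    """Per-gene excess variance: residual of log-variance around a
    linear trend in log-mean (a simplification, not Seurat/scanpy)."""
    mean = counts.mean(axis=0)
    var = counts.var(axis=0)
    keep = (mean > 0) & (var > 0)  # logs are undefined otherwise
    log_mean = np.log(mean[keep])
    log_var = np.log(var[keep])
    slope, intercept = np.polyfit(log_mean, log_var, deg=1)
    scores = np.full(counts.shape[1], -np.inf)  # constant genes rank last
    scores[keep] = log_var - (slope * log_mean + intercept)
    return scores


def dropout_scores(counts):
    """Per-gene excess zeros: observed zero fraction minus the fraction
    expected under a Poisson model with the same mean (an assumption)."""
    mean = counts.mean(axis=0)
    observed_zero = (counts == 0).mean(axis=0)
    expected_zero = np.exp(-mean)  # Poisson P(X = 0)
    return observed_zero - expected_zero
```

In both cases a higher score means the gene sits further above the fitted or expected curve, which is what the plots for these methods visualise.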
Both of these methods could be incorporated into a visual interface because the genes are selected based on a curve fitted to a graph.
The typical graph output for HVG is [image] and [image] for dropout-based feature selection.
The left HVG graph has a clear line that could be moved up and down accordingly, and the dropout method has a curve that could be moved left and right with the mouse. Additionally, both methods have the option of selecting a specified number N of genes. HVG can also select a recommended number of genes; the dropout-based methods probably don't have this.
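Both selection modes described above boil down to simple operations on a per-gene score, i.e. the gene's distance from the fitted line or curve. A rough sketch, with hypothetical function names and assuming higher scores mean "more interesting":

```python
import numpy as np


def select_by_threshold(scores, threshold):
    """Indices of genes whose score lies above the cutoff. Dragging the
    line or curve in the plot would amount to changing `threshold`."""
    return np.flatnonzero(np.asarray(scores) > threshold)


def select_top_n(scores, n):
    """Indices of the n highest-scoring genes, returned in gene order."""
    order = np.argsort(np.asarray(scores))[::-1]  # best score first
    return np.sort(order[:n])
```

A widget could keep the two in sync: choosing N would move the line to the score of the N-th best gene, and dragging the line would update the displayed gene count.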
It might also be cool to have an input signal with marker genes, so we can check whether or not they were selected.
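The check itself would be tiny. A sketch, assuming genes are identified by name and using a hypothetical helper (the gene names in the usage example are only illustrative):

```python
def check_markers(selected_genes, marker_genes):
    """Map each marker gene to True if it was selected, else False."""
    selected = set(selected_genes)
    return {gene: gene in selected for gene in marker_genes}


# Hypothetical usage with made-up selection results.
status = check_markers(["CD3E", "MS4A1", "NKG7"], ["CD3E", "CD14"])
```

Markers that come back `False` could then be highlighted in the plot so it is obvious how far the line would need to move to include them.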