YeoLab / flotilla

Reproducible machine learning analysis of gene expression and alternative splicing data
http://yeolab.github.io/flotilla/docs
BSD 3-Clause "New" or "Revised" License

abstract out machine learning visualization, metadata-based subsetting, and splicing compute #315

Open olgabot opened 9 years ago

olgabot commented 9 years ago

As of now, flotilla is a monstrous beast of code that tries to do everything and thus does nothing well. I see three major areas of functionality in flotilla, and propose splitting them into separate, well-defined packages.

Machine learning visualization

By this, I mean mostly the NMF and PCA visualizations. The ability to calculate and plot PCA in Python is lacking compared to R.

I'm not a fan of the exact R syntax, but the fact that you don't need to iterate over the different axes yourself is huge. I'd like to model the interface after seaborn's FacetGrid: http://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html
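A rough sketch of the idea, not flotilla's current API: compute PCA once, then reshape the scores into "tidy" records, one per sample and component pair, so that a faceting layer can draw every panel without the caller looping over axes. The function names `pca_scores` and `tidy_component_pairs` and the toy data are hypothetical, and NumPy's SVD stands in for whatever decomposition the package would actually use.

```python
import numpy as np


def pca_scores(data, n_components=3):
    """Project samples (rows) onto the top principal components via SVD."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)  # center each feature (column)
    # Rows of vt are the principal axes, ordered by decreasing variance
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


def tidy_component_pairs(scores, sample_ids):
    """Flatten scores into one record per (sample, component pair)."""
    n_components = scores.shape[1]
    records = []
    for i in range(n_components):
        for j in range(i + 1, n_components):
            for sample, row in zip(sample_ids, scores):
                records.append({
                    "sample": sample,
                    "x_component": f"PC{i + 1}",
                    "y_component": f"PC{j + 1}",
                    "x": float(row[i]),
                    "y": float(row[j]),
                })
    return records


# Hypothetical toy data: 10 samples x 5 features
rng = np.random.default_rng(0)
expression = rng.normal(size=(10, 5))
sample_ids = [f"sample{k}" for k in range(10)]

scores, explained = pca_scores(expression, n_components=3)
records = tidy_component_pairs(scores, sample_ids)
```

Loaded into a pandas DataFrame, records like these could be handed to `seaborn.FacetGrid(df, row="y_component", col="x_component")` and drawn with `.map(plt.scatter, "x", "y")`, which is the "no manual iteration over axes" behavior described above.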

Metadata-based subsetting/IPython widgets

Another major, often-touted feature of flotilla is the ability to use IPython's JavaScript widgets to look at different subsets of the data. I think this could eventually be tied into the ML visualization, but I'm not sure at this point.
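To make the subsetting part concrete, here is a minimal sketch that assumes metadata is a mapping of sample ID to attributes. The names `subset_samples` and `subset_data` and the toy values are hypothetical and are not taken from flotilla.

```python
def subset_samples(metadata, **criteria):
    """Return sample IDs whose metadata match every key=value criterion."""
    return [
        sample
        for sample, attributes in metadata.items()
        if all(attributes.get(key) == value for key, value in criteria.items())
    ]


def subset_data(data, sample_ids):
    """Keep only the entries of `data` (sample -> feature values) in sample_ids."""
    return {sample: data[sample] for sample in sample_ids if sample in data}


# Hypothetical toy metadata and expression values
metadata = {
    "s1": {"phenotype": "neuron", "batch": 1},
    "s2": {"phenotype": "iPSC", "batch": 1},
    "s3": {"phenotype": "neuron", "batch": 2},
}
expression = {"s1": [1.0, 2.0], "s2": [0.5, 0.1], "s3": [3.0, 4.0]}

neurons = subset_samples(metadata, phenotype="neuron")
neuron_expression = subset_data(expression, neurons)
```

Because the selection is a plain function of keyword arguments, something like `ipywidgets.interact` could wrap it with a dropdown per metadata column, which would keep the widget layer separate from the subsetting logic.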

Splicing computation

By this, I mean the splicing-specific modality calculations and NMF space reductions, which are only applied to the splicing data. I want these to be a separate package that someone can use from the command line, without having to use Python directly. For example, someone could do the modality calculations on the command line, then do visualization or further analysis in R or Matlab. I don't want to lock anyone into Python.
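A sketch of what such a command-line entry point might look like, using only the standard library. Everything here is an assumption for illustration: the input layout (a CSV with event IDs in the first column and per-sample PSI values in the rest), the modality labels, and especially the threshold rule in `classify_modality`, which is a placeholder heuristic and not flotilla's actual modality estimation.

```python
import argparse
import csv
import sys


def classify_modality(psi_values, low=0.2, high=0.8, min_fraction=0.7):
    """Label one event by where its PSI values fall (placeholder heuristic)."""
    values = [v for v in psi_values if v is not None]
    if not values:
        return "ambiguous"
    n = len(values)
    frac_low = sum(v <= low for v in values) / n
    frac_high = sum(v >= high for v in values) / n
    frac_mid = 1.0 - frac_low - frac_high
    if frac_low >= min_fraction:
        return "excluded"
    if frac_high >= min_fraction:
        return "included"
    if frac_mid >= min_fraction:
        return "middle"
    if frac_low + frac_high >= min_fraction:
        return "bimodal"
    return "ambiguous"


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Assign a modality to each splicing event")
    parser.add_argument(
        "psi_csv", help="CSV: first column is the event ID, the rest are PSI")
    parser.add_argument(
        "-o", "--output", default="-", help="output CSV path, or - for stdout")
    args = parser.parse_args(argv)

    with open(args.psi_csv, newline="") as infile:
        reader = csv.reader(infile)
        next(reader, None)  # skip the header row of sample IDs
        # Empty cells are treated as missing values
        rows = [(row[0], [float(x) if x else None for x in row[1:]])
                for row in reader if row]

    out = sys.stdout if args.output == "-" else open(args.output, "w", newline="")
    try:
        writer = csv.writer(out)
        writer.writerow(["event_id", "modality"])
        for event_id, psi in rows:
            writer.writerow([event_id, classify_modality(psi)])
    finally:
        if out is not sys.stdout:
            out.close()
```

Exposing `main` through a console-script entry point or an `if __name__ == "__main__":` guard would give a command that reads and writes plain CSV, so the output could be picked up from R or Matlab without touching Python.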