As of now, flotilla is a monstrous beast of code that tries to do everything and thus does nothing well. I see three major partitions of what flotilla does, and propose to fragment these into separate, well-defined packages.
Machine learning visualization
By this, I mean mostly the NMF and PCA visualizations. Compared to R, Python's support for calculating and plotting PCA is very lacking.
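As a rough illustration of the kind of API meant here, the sketch below computes PCA with plain NumPy (an SVD of the column-centered matrix) and returns one long-format row per sample with a group label attached. A faceting tool can then split the plot by that label in one call. The function names `pca_scores` and `tidy_scores` and the `group` field are hypothetical, not part of flotilla.

```python
import numpy as np


def pca_scores(data, n_components=2):
    """Project samples (rows) onto their top principal components.

    Minimal PCA via SVD of the column-centered matrix (hypothetical helper).
    Returns the component scores and the fraction of variance each explains.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def tidy_scores(sample_ids, groups, scores):
    """One dict per sample: ID, metadata group, and pc_1..pc_k coordinates."""
    rows = []
    for sample_id, group, row in zip(sample_ids, groups, scores):
        record = {"sample_id": sample_id, "group": group}
        record.update({f"pc_{i + 1}": float(v) for i, v in enumerate(row)})
        rows.append(record)
    return rows
```

The rows could then be wrapped in a pandas DataFrame and handed to seaborn, e.g. `sns.FacetGrid(df, col="group").map(plt.scatter, "pc_1", "pc_2")`, which draws one panel per group without an explicit loop over axes.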
I'm not a fan of R's exact syntax, but not having to iterate over the different axes yourself is huge. I'd like to model this after seaborn's FacetGrid: http://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html
Metadata-based subsetting/IPython widgets
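Whatever the front end, metadata-based subsetting comes down to selecting the samples whose metadata match some criteria. The following is a minimal, hypothetical sketch in plain Python; `subset_samples` and the example metadata fields are illustrative assumptions, not flotilla's API. A widget callback (for instance one registered with ipywidgets' `interact`) could call it with the currently selected values and redraw the plot for the returned samples.

```python
def _matches(value, wanted):
    """A criterion is either one allowed value or a collection of them."""
    if isinstance(wanted, (list, tuple, set, frozenset)):
        return value in wanted
    return value == wanted


def subset_samples(metadata, **criteria):
    """Return sample IDs (in input order) whose metadata match every criterion.

    metadata: mapping of sample ID -> mapping of field name -> value.
    With no criteria, every sample is returned.
    """
    return [
        sample_id
        for sample_id, fields in metadata.items()
        if all(_matches(fields.get(key), wanted) for key, wanted in criteria.items())
    ]


# Hypothetical metadata, for illustration only.
example_metadata = {
    "s1": {"phenotype": "control", "batch": 1},
    "s2": {"phenotype": "treated", "batch": 1},
    "s3": {"phenotype": "treated", "batch": 2},
}
```

Here `subset_samples(example_metadata, phenotype="treated", batch=2)` selects only `"s3"`, and passing a list such as `batch=[1, 2]` allows several values for one field.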
Another frequently touted feature of flotilla is the ability to use IPython's JavaScript widgets to look at different subsets of the data. I think this could eventually be tied into the ML visualization, but I'm not sure at this point.
Splicing computation
By this, I mean the splicing-specific modality calculations and NMF space reductions, which are only applied to the splicing data. I want these to be a separate package that someone can use from the command line without having to use Python directly. For example, someone could do the modality calculations on the command line, then do visualization or further analysis in R or Matlab. I don't necessarily want to lock anyone into Python.