ctlab / fgsea

Fast Gene Set Enrichment Analysis

Use outside of R #121

Open maxkfranz opened 1 year ago

maxkfranz commented 1 year ago

Hi, everyone in @ctlab. This is @maxkfranz from @cytoscape (U. Toronto). I hope this message finds you well.

FGSEA is a great algorithm, and it would be useful to have it in contexts other than R. R is great for experimenting with one-off scripts, but for things beyond that it would be great to have access to FGSEA outside of R.

Examples:

(1) If you’re building a user-facing app with high performance, reliability, and good UX, you probably want to use something other than R. Being able to use the C++ code without any dependencies on R headers would be enough for this use case, if this is possible today.

(2) You may want to use Python instead of R, because Python has a lot of useful tech (e.g. ML). Many people would also say that Python is a better general-purpose programming language.

Would you comment on the feasibility of using FGSEA without R? Can the C++ code be used completely without R? Are there plans to support FGSEA in Python?

Best,

Max

assaron commented 1 year ago

Hi,

The core of FGSEA is written in C++, so I don't see any critical obstacles to using it independently of the R environment. It's not trivial, but it's quite feasible. However, we don't have internal demand for it, as we mostly use R for data analysis, and thus it's hard for us to dedicate resources to making FGSEA more portable. That said, we'd be happy to collaborate with anyone who is willing to create and maintain, say, a Python fork of FGSEA.

Best, Alexey

maxkfranz commented 1 year ago

Thanks, Alexey. You mentioned that this wouldn't be trivial. Would you provide a brief summary -- maybe a bullet list -- of what the main tasks would be to carry this out, given your knowledge of the codebase?

We already know what would be generally involved in exposing something from C++ to Python, but it would be very useful if you would provide an outline of what would be needed to use FGSEA in C++ without R -- just the C++ part.

assaron commented 1 year ago

I think these are the main points that have to be addressed:

1) What's the best (or at least a suitable) way to maintain two interfaces, e.g. R and Python? Should the C++ code be made into a separate library (dynamic or maybe header-only), or can the same code just be copied from the main R repository into the Python version?

2) The C++ code should be separated into clean C++ and Rcpp-to-clean wrappers. This should be relatively easy, but there may be some performance concerns.

3) Some logic in R should be reimplemented: filtering of the sizes, the actual calculation of p-values from the sampling data for the fgseaSimple part, balancing fgseaSimple and fgseaMultilevel calls, parallelization, etc.

4) Plots are a separate issue, and they probably have to be reimplemented from scratch, as currently they're highly R/ggplot2-centric.
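To make point 3 more concrete, here is a rough pure-Python sketch of the kind of logic that would need reimplementing outside R: size filtering and an empirical p-value from permutation samples. It is only an illustration under stated assumptions. The function names and parameters (`filter_by_size`, `enrichment_score`, `permutation_pvalue`, `min_size`, `max_size`, `n_perm`) are hypothetical and not taken from the fgsea codebase, the enrichment score is the classic GSEA-style weighted running sum, and the add-one p-value estimate is a generic choice that may differ from what fgseaSimple actually computes.

```python
import random


def filter_by_size(gene_sets, ranked_genes, min_size=1, max_size=None):
    """Restrict each set to genes present in the ranking, then drop
    sets whose remaining size falls outside [min_size, max_size].
    (Hypothetical helper; parameter names are assumptions.)"""
    universe = set(ranked_genes)
    kept = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in universe]
        too_big = max_size is not None and len(present) > max_size
        if len(present) >= min_size and not too_big:
            kept[name] = present
    return kept


def enrichment_score(stats, hit_indices):
    """GSEA-style weighted running-sum score (weight exponent 1).

    stats: gene-level statistics sorted in decreasing order.
    hit_indices: positions in `stats` that belong to the gene set.
    Returns the running-sum value with the largest absolute deviation.
    """
    hits = set(hit_indices)
    n, k = len(stats), len(hits)
    if k == 0 or k == n:
        raise ValueError("gene set must be a non-empty proper subset")
    hit_weight = sum(abs(stats[i]) for i in hits)
    miss_step = 1.0 / (n - k)
    running, best = 0.0, 0.0
    for i in range(n):
        if i in hits:
            running += abs(stats[i]) / hit_weight
        else:
            running -= miss_step
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(stats, hit_indices, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of the same size.

    Only permutations whose score has the same sign as the observed
    score are counted; one is added to numerator and denominator so
    the estimate is never exactly zero (an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = enrichment_score(stats, hit_indices)
    k = len(hit_indices)
    same_sign, as_extreme = 0, 0
    for _ in range(n_perm):
        es = enrichment_score(stats, rng.sample(range(len(stats)), k))
        if (es >= 0) == (observed >= 0):
            same_sign += 1
            if abs(es) >= abs(observed):
                as_extreme += 1
    return observed, (as_extreme + 1) / (same_sign + 1)
```

This naive version recomputes the score from scratch for every permutation, so it would be far slower than the C++ core. It is meant only to show the shape of the surrounding logic, not to replace the algorithm itself.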

maxkfranz commented 1 year ago

Great, thanks for the pointers, @assaron