`motifSimilarity` (former `clusterPWMs`) features

mbstadler commented 5 years ago

Currently, clusterPWMs accepts only motifs from a file and requires an output file name (outfile argument).

What about extending clusterPWMs to:

- Have multiple methods (along the lines of findMotifHits), e.g. one for motifs in a file and another for motifs as a PFMatrixList.
- Set the default to outfile = NULL and, in that case, capture the HOMER output in a temporary file that is deleted after parsing. Providing a value for outfile would still work as it does currently.
- Make use of the -cpu flag to speed up computations.
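The `outfile = NULL` behaviour proposed above could look roughly like the following. This is only a minimal sketch in base R: `with_optional_outfile()` and `fake_tool()` are hypothetical names made up for illustration, and `fake_tool()` merely stands in for the external call; neither is part of monaLisa or HOMER.

```r
# Hypothetical helper illustrating the proposed `outfile = NULL` default.
# `run_tool` is an assumed stand-in for the call to the external program.
with_optional_outfile <- function(run_tool, outfile = NULL) {
    use_temp <- is.null(outfile)
    if (use_temp) {
        outfile <- tempfile(fileext = ".txt")
        # delete the temporary file when the function exits, even on error
        on.exit(unlink(outfile), add = TRUE)
    }
    run_tool(outfile)            # the tool writes its results to `outfile`
    res <- readLines(outfile)    # "parsing" is reduced to reading lines here
    list(result = res, outfile = if (use_temp) NULL else outfile)
}

# Stand-in for the external tool: writes two lines to the given path.
fake_tool <- function(path) writeLines(c("motifA", "motifB"), path)

# Default: output goes to a temporary file that is removed afterwards.
tmp_run <- with_optional_outfile(fake_tool)

# Explicit outfile: the file is kept, as in the current behaviour.
kept <- file.path(tempdir(), "similarity_output.txt")
kept_run <- with_optional_outfile(fake_tool, outfile = kept)
```

Registering the cleanup with `on.exit()` means the temporary file is also removed if parsing fails, while a user-supplied `outfile` is never touched.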

Finally, what about:

- Porting compareMotifs.pl to R (maybe C++) so that it runs fast enough to become a default when creating a SummarizedExperiment. The information seems very useful and might be worth having always available.

mbstadler commented 5 years ago

I have made an R implementation of the motif comparison used in compareMotifs.pl. It calculated the all-pairs similarity matrix for a set of 579 vertebrate factors in JASPAR2018 in less than 4 minutes (pure R implementation, single CPU). The results are identical to those of compareMotifs.pl.

I have added naive parallelization using mclapply as well, which brings this down to ~25s on 30 cores.
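For illustration, naive parallelization of an all-pairs computation with `mclapply` might be sketched as below. This is not the implementation referred to above: `pair_score()` is a placeholder (a plain Pearson correlation between equally sized matrices, without the offsets or reverse complements a real motif comparison would need), and `all_pairs_similarity()` and the toy motifs are likewise assumptions made for this example.

```r
library(parallel)

# Placeholder pairwise score: Pearson correlation of two equally sized
# matrices. This is NOT the compareMotifs.pl scoring; it only gives the
# parallel loop something to compute.
pair_score <- function(a, b) cor(as.vector(a), as.vector(b))

# Hypothetical all-pairs driver: one task per motif i, each task scores
# motif i against motifs i..n (upper triangle incl. diagonal).
all_pairs_similarity <- function(motifs, ncores = 1L) {
    n <- length(motifs)
    rows <- mclapply(seq_len(n), function(i) {
        vapply(i:n, function(j) pair_score(motifs[[i]], motifs[[j]]),
               numeric(1))
    }, mc.cores = ncores)
    sim <- matrix(NA_real_, n, n,
                  dimnames = list(names(motifs), names(motifs)))
    # fill both triangles from the per-motif results
    for (i in seq_len(n)) {
        sim[i, i:n] <- rows[[i]]
        sim[i:n, i] <- rows[[i]]
    }
    sim
}

# Toy input: five random 4 x 6 matrices with columns normalized to sum to 1.
set.seed(1)
toy <- lapply(1:5, function(k) {
    m <- matrix(runif(24), nrow = 4,
                dimnames = list(c("A", "C", "G", "T"), NULL))
    sweep(m, 2, colSums(m), "/")
})
names(toy) <- paste0("m", 1:5)

sim <- all_pairs_similarity(toy, ncores = 1L)
```

Because the tasks shrink as `i` grows, the work is unevenly distributed across cores, which is one reason such a scheme counts as naive. Note that `mclapply` relies on forking, so `ncores > 1` is not supported on Windows.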

Is it worth considering this as a replacement for compareMotifs.pl?

mbstadler commented 3 years ago

The function has been renamed to motifSimilarity.

fmicompbio / monaLisa #23