Open mbstadler opened 5 years ago
I have made an R implementation of the motif comparison used in `compareMotifs.pl`. It calculated the all-pairs similarity matrix for a set of 579 vertebrate factors in JASPAR2018 in less than 4 minutes (pure R implementation, single CPU). Results are identical.
I have added naive parallelization using `mclapply` as well, which brings this down to ~25s on 30 cores.
Is it worth considering that instead of `compareMotifs.pl`?
The function has been renamed `motifSimilarity`.
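For readers unfamiliar with the approach, here is a minimal sketch of an all-pairs motif comparison parallelized over rows with `mclapply`. It is not the code of `motifSimilarity`. The helpers `pairScore`, `revComp`, `motifScore` and `allPairs` are hypothetical names, and the score (maximum Pearson correlation over all offsets and both orientations, with a uniform 0.25 background as padding) is an assumption about the general technique rather than a verified port of `compareMotifs.pl`.

```r
library(parallel)

# Hypothetical helper: maximum Pearson correlation between two 4 x width
# probability matrices over all relative offsets. Positions where the motifs
# do not overlap are padded with a uniform background of 0.25 (assumption).
pairScore <- function(m1, m2) {
  w1 <- ncol(m1)
  w2 <- ncol(m2)
  best <- -Inf
  for (offset in seq(-(w2 - 1), w1 - 1)) {
    start <- min(0, offset)        # leftmost aligned position (0-based)
    end <- max(w1, offset + w2)    # one past the rightmost aligned position
    a <- matrix(0.25, 4, end - start)
    b <- matrix(0.25, 4, end - start)
    a[, seq_len(w1) - start] <- m1
    b[, seq_len(w2) + offset - start] <- m2
    best <- max(best, cor(as.vector(a), as.vector(b)))
  }
  best
}

# Reverse complement, assuming rows are ordered A, C, G, T.
revComp <- function(m) m[4:1, ncol(m):1, drop = FALSE]

# Best score over both orientations of the second motif.
motifScore <- function(m1, m2) {
  max(pairScore(m1, m2), pairScore(m1, revComp(m2)))
}

# All-pairs matrix; each row is computed by a separate mclapply task.
allPairs <- function(motifs, cores = 1L) {
  n <- length(motifs)
  rows <- mclapply(seq_len(n), function(i) {
    vapply(seq_len(n), function(j) motifScore(motifs[[i]], motifs[[j]]),
           numeric(1))
  }, mc.cores = cores)
  sim <- do.call(rbind, rows)
  dimnames(sim) <- list(names(motifs), names(motifs))
  sim
}

# Toy input: random column-normalized matrices, not real JASPAR motifs.
set.seed(1)
randomMotif <- function(w) {
  m <- matrix(runif(4 * w), 4, w)
  sweep(m, 2, colSums(m), "/")
}
motifs <- list(m1 = randomMotif(6), m2 = randomMotif(8), m3 = randomMotif(7))
sim <- allPairs(motifs)
```

Since the score is symmetric, a real implementation could compute only the upper triangle. Note that `mclapply` relies on forking, so `cores > 1` is not supported on Windows.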
Currently, `clusterPWMs` accepts only motifs from a file and requires an output file name (`outfile` argument).

What about extending `clusterPWMs` to:
- provide methods for different input types (as in `findMotifHits`), e.g. one for motifs in a file, and another for motifs as a `PFMatrixList`
- allow `outfile = NULL`, in which case the homer output is captured in a temporary file that is deleted after parsing. Providing a value for `outfile` will still work as it does currently
- use the `-cpu` flag to speed up computations

Finally, what about:
- porting `compareMotifs.pl` to R (maybe C++) so that it runs fast enough to make it a default when creating a `SummarizedExperiment` - the information seems very useful and might be worth having always available
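
To make the first two proposals concrete, here is a rough sketch of how input-type dispatch and an optional `outfile` could fit together. Everything in it is hypothetical: `clusterSketch` stands in for `clusterPWMs`, `runTool` stands in for the external homer call (it only writes dummy lines), and a plain named list stands in for a `PFMatrixList`.

```r
library(methods)

# Hypothetical stand-in for the external tool: writes one line per input
# line to `path`. The real call to homer is not shown here.
runTool <- function(motifFile, path) {
  writeLines(paste("result for", readLines(motifFile)), path)
}

setGeneric("clusterSketch", signature = "x",
           function(x, outfile = NULL) standardGeneric("clusterSketch"))

# Motifs given as a file name.
setMethod("clusterSketch", "character", function(x, outfile = NULL) {
  if (is.null(outfile)) {
    # No output file requested: use a temporary file and remove it on exit.
    outfile <- tempfile(fileext = ".txt")
    on.exit(unlink(outfile), add = TRUE)
  }
  runTool(x, outfile)
  readLines(outfile)  # parsed before the on.exit handler removes the file
})

# Motifs given in memory: dump them to a temporary file, then reuse the
# file-based method.
setMethod("clusterSketch", "list", function(x, outfile = NULL) {
  motifFile <- tempfile(fileext = ".motif")
  on.exit(unlink(motifFile), add = TRUE)
  writeLines(names(x), motifFile)
  clusterSketch(motifFile, outfile = outfile)
})

# Usage: the in-memory method, with and without a persistent output file.
res <- clusterSketch(list(a = 1, b = 2))
kept <- tempfile()
res2 <- clusterSketch(list(a = 1, b = 2), outfile = kept)
```

With this layout, existing calls that pass a file name and an `outfile` keep working unchanged, while the `outfile = NULL` default leaves no files behind.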