fmicompbio / monaLisa

binned motif enrichment analysis and visualisation
https://fmicompbio.github.io/monaLisa/
GNU General Public License v3.0

maximal sequence length for homer2 #4

Closed mbstadler closed 5 years ago

mbstadler commented 5 years ago

homer2 segfaults when it tries to read in a sequence longer than 1 million bases (see #define MOTIF2_BUFFER 10000100 in cpp/Motif2.h, and also char* curSeq = new char[1000000]; in SequenceArray::parseFasta2SeqAndGroupFiles in cpp/Motif2.cpp).

lisa::findMotifHits(..., method = "homer2") should check whether any sequences exceed this length, and either throw an error or tile each such sequence and map the hits from the tiles back to the original coordinates.

mbstadler commented 5 years ago

done.

For the moment, findMotifHits(..., method = "homer2") throws an error if any sequence is 1 million bases or longer, and reports the names of the problematic sequences.
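For illustration only, the kind of length check described above could look roughly like the following. This is a language-agnostic sketch written in Python, not the package's actual R code; the function name `check_sequence_lengths`, the `max_len` parameter, and the error message are all hypothetical.

```python
# Sketch of a pre-flight length check (hypothetical names, not monaLisa's API).
# Sequences are passed as a mapping from sequence name to sequence string.

MAX_LEN = 1_000_000  # assumed limit: sequences of this length or longer are rejected


def check_sequence_lengths(seqs: dict[str, str], max_len: int = MAX_LEN) -> None:
    """Raise ValueError naming every sequence with length >= max_len."""
    too_long = [name for name, seq in seqs.items() if len(seq) >= max_len]
    if too_long:
        raise ValueError(
            f"sequences of {max_len} bases or longer are not supported: "
            + ", ".join(too_long)
        )
```

Collecting all offending names before raising lets the caller fix every problematic sequence in one pass instead of discovering them one error at a time.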

A more convenient solution, although it is not clear whether it is really needed, would be to tile the sequence and later reassemble the hits (currently not implemented).
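The tiling idea is not implemented, but a minimal sketch of how it might work is given here in Python (not the package's R/C++ code). It assumes fixed-length hits of `motif_len` bases and tiles that overlap by `motif_len - 1` bases, so that every hit lies completely inside at least one tile. All names (`tiles`, `scan_tiled`, `exact_scan`) are hypothetical, and the exact-match scanner is only a stand-in for a real motif scanner.

```python
from typing import Callable, Iterator

Hit = tuple[int, int]  # (start, end), 0-based, end exclusive


def tiles(seq_len: int, tile_len: int, overlap: int) -> Iterator[tuple[int, int]]:
    """Yield (start, end) of tiles covering [0, seq_len) with the given overlap."""
    if not 0 <= overlap < tile_len:
        raise ValueError("overlap must satisfy 0 <= overlap < tile_len")
    step = tile_len - overlap
    start = 0
    while True:
        end = min(start + tile_len, seq_len)
        yield start, end
        if end >= seq_len:
            break
        start += step


def scan_tiled(seq: str, scan: Callable[[str], list[Hit]],
               motif_len: int, tile_len: int) -> list[Hit]:
    """Scan each tile separately and shift hits back to full-sequence coordinates."""
    hits: set[Hit] = set()  # a set removes hits found twice in overlapping regions
    for start, end in tiles(len(seq), tile_len, overlap=motif_len - 1):
        for h_start, h_end in scan(seq[start:end]):
            hits.add((start + h_start, start + h_end))
    return sorted(hits)


def exact_scan(motif: str) -> Callable[[str], list[Hit]]:
    """Toy scanner: report every exact occurrence of `motif` (stand-in only)."""
    def scan(s: str) -> list[Hit]:
        found = []
        i = s.find(motif)
        while i != -1:
            found.append((i, i + len(motif)))
            i = s.find(motif, i + 1)
        return found
    return scan
```

With an overlap of `motif_len - 1`, a hit that straddles a tile boundary is still fully contained in the next tile, and hits reported by two tiles collapse to one entry after shifting to full-sequence coordinates. A real implementation would also need to carry strand and score along with each hit.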