giesselmann / STRique

Nanopore raw signal repeat detection pipeline
MIT License

runtime is several hours #5

Closed: rainwala closed this issue 5 years ago

rainwala commented 5 years ago

Hello, I am running STRique on a nanopore experiment with 20000 reads on a workstation with 64 threads, and the runtime is several hours. The STRique GitHub page says the runtime should be a few minutes.

I was wondering, is this because you are running STRique on experiments with just a few hundred reads that you got from a Cas9/Cas12 enrichment experiment? Our experiment used PCR enrichment, so we have several thousand reads.

giesselmann commented 5 years ago

We ran STRique on both plasmid and enrichment data. The runtime depends on multiple factors: the overall read length (which affects the initial flanking sequence detection) and the repeat length (which affects the actual repeat detection). Both the signal alignment and the HMM are computationally quite expensive. The runtimes in the README are indeed only for the few test/example reads.

rainwala commented 5 years ago

Thank you. Our reads are pretty long, which might explain the runtime. I also ran it with a 1000 bp flanking sequence in the configuration file. If I reduce that, will it reduce the runtime?

giesselmann commented 5 years ago

Yes, a 1 kb flank is very long; 150 to 250 bp is sufficient.
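For anyone who wants to shorten the flanks in an existing configuration without editing it by hand, here is a minimal sketch. It assumes the configuration is a tab-separated file with a header row containing `prefix` and `suffix` columns that hold the flanking sequences; check your own file, since that layout is an assumption here. The helper `trim_flanks` and the demo locus are hypothetical and not part of STRique. The sketch keeps the bases adjacent to the repeat: the end of the prefix and the start of the suffix.

```python
import csv
import io


def trim_flanks(config_text, flank_len=200):
    """Shorten the prefix/suffix columns of a tab-separated config to flank_len bases.

    Assumes a header row with 'prefix' and 'suffix' columns (an assumption
    about the config layout, not a documented guarantee).
    """
    if flank_len <= 0:
        raise ValueError("flank_len must be positive")
    reader = csv.DictReader(io.StringIO(config_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Keep only the bases directly next to the repeat on each side.
        row["prefix"] = row["prefix"][-flank_len:]
        row["suffix"] = row["suffix"][:flank_len]
        writer.writerow(row)
    return out.getvalue()


# Made-up demo locus with 1000-base flanks (not real sequence data).
demo = (
    "chr\tbegin\tend\tname\trepeat\tprefix\tsuffix\n"
    "chr9\t100\t200\tdemo\tGGCCCC\t" + "A" * 999 + "C" + "\t" + "G" + "T" * 999 + "\n"
)
trimmed = trim_flanks(demo, flank_len=200)
```

To use it on a real file, read the file contents into a string, pass it to `trim_flanks`, and write the result to a new file so the original configuration stays untouched.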

rainwala commented 5 years ago

Thank you.