LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers
https://looselab.github.io/readfish/
GNU General Public License v3.0

[INFO] Using Pyguppy for Read-Until: Does the compute time overhead to sequence multiple chunks scale linearly or sub-linearly? #191

Closed: harisankarsadasivan closed this issue 2 years ago

harisankarsadasivan commented 2 years ago

If my read has 4000 samples and I invoke the pyguppy caller on chunks of size 1000, does that mean the first call on 1000 samples would take the most time? How long would the subsequent calls on 2000, 3000, and 4000 samples take? Does the compute time scale linearly or sub-linearly? I would assume sub-linear. Or would it be the same as a single call with a chunk size of 4000? I would like to understand this scaling factor approximately. Please point me to any relevant code or numbers.

mattloose commented 2 years ago

We haven't benchmarked the times for this to a degree where I could comment. Like you, I presume it would be sub-linear, but by how much I don't know.

It is a trivial exercise to benchmark this using the client and some reads.
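As a starting point, here is a minimal sketch of such a benchmark. It times repeated calls on growing prefixes of a read's signal (1000, 2000, 3000, 4000 samples) so the per-call timings can be compared against a single 4000-sample call. Note the assumptions: `dummy_caller` is a stand-in for the real basecall round-trip through the Guppy client (ont-pyguppy-client-lib), which you would substitute in; the chunk sizes and signal length mirror the numbers in the question.

```python
import time
import numpy as np

def benchmark_cumulative_calls(caller, signal, chunk_size):
    """Time calls on growing prefixes of ``signal``.

    ``caller`` is any function taking a 1-D signal array; in a real
    benchmark it would wrap a round-trip through the Guppy client
    (send read, poll for the completed basecall).
    Returns a list of (prefix_length, elapsed_seconds) tuples.
    """
    results = []
    for end in range(chunk_size, len(signal) + 1, chunk_size):
        start = time.perf_counter()
        caller(signal[:end])
        results.append((end, time.perf_counter() - start))
    return results

# Dummy stand-in for the basecaller: deliberately O(n) work, so the
# timings here only demonstrate the harness, not real Guppy behaviour.
def dummy_caller(sig):
    return float(np.sum(np.abs(sig)))

signal = np.random.default_rng(0).standard_normal(4000).astype(np.float32)
timings = benchmark_cumulative_calls(dummy_caller, signal, chunk_size=1000)
for n_samples, elapsed in timings:
    print(f"{n_samples} samples: {elapsed:.6f} s")
```

With the real client substituted for `dummy_caller`, comparing the sum of the four incremental calls against one call on the full 4000 samples would answer the linear-vs-sub-linear question directly for your hardware and model.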