ekg / seqwish

alignment to variation graph inducer
MIT License

High memory usage when using '-r 1' #53

Open jmonlong opened 4 years ago

jmonlong commented 4 years ago

I understand the -r parameter may be experimental and often not what we want. I just want to report that it used a lot of memory in my test.

I was testing this on a dataset containing chr20 assemblies for 4 samples plus hg38 (~60 Mb FASTA each). I aligned them with minimap2 (asm20 preset) and filtered the PAF file with fpa drop -l 10000.

seqwish -r 1 took 24 minutes using 16 cores and peaked at 150 GB of memory. seqwish -k 256 -l 256 took 5 minutes and used at most 1 GB of memory.

I'm using -k and -l now, so I'm not blocked by this issue; I just wanted to report the large difference in memory usage in case it helps.

ekg commented 4 years ago

Thanks for the report. I thought about this and I am pretty sure I know what's happening. The fact that -r 1 triggers it is key.

-k, -l and -r will cause RAM-backed memory usage to go up, because the looping bits of sequence need to be cached in memory. This can result in a very large memory footprint. It might be possible to use a disk-backed approach here instead, if we decide it's needed.
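To make the "disk-backed approach" idea concrete, here is a minimal Python sketch. It is a hypothetical illustration only, not seqwish's actual code: the class `DiskBackedSequenceCache` and its methods are invented for this example. Sequence fragments are appended to a temporary file, and only a small (offset, length) index stays in RAM, so resident memory grows with the number of cached fragments rather than with their total length.

```python
import os
import tempfile


class DiskBackedSequenceCache:
    """Hypothetical disk-backed store for sequence fragments.

    Fragment bytes live in an anonymous temporary file; RAM only holds
    one (offset, length) pair per fragment. Illustrative sketch, not
    seqwish's implementation.
    """

    def __init__(self):
        # Binary read/write temp file, deleted automatically when closed.
        self._fh = tempfile.TemporaryFile()
        # In-memory index: fragment id -> (byte offset, byte length).
        self._index = []

    def add(self, seq):
        """Append a fragment to the file and return its integer id."""
        data = seq.encode("ascii")
        self._fh.seek(0, os.SEEK_END)  # always append at the end
        offset = self._fh.tell()
        self._fh.write(data)
        self._index.append((offset, len(data)))
        return len(self._index) - 1

    def get(self, frag_id):
        """Read a fragment back from disk by id."""
        offset, length = self._index[frag_id]
        self._fh.seek(offset)
        return self._fh.read(length).decode("ascii")

    def __len__(self):
        return len(self._index)

    def close(self):
        self._fh.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


if __name__ == "__main__":
    with DiskBackedSequenceCache() as cache:
        a = cache.add("ACGTACGT")
        b = cache.add("TTGCA")
        print(cache.get(b), cache.get(a), len(cache))
```

The trade-off is a seek and read per lookup instead of a RAM access, so a design like this would likely be slower when fragments are fetched often, in exchange for a much smaller memory footprint.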

Reply sent by email to Jean Monlong's report of Wed, Jun 10, 2020, 08:19.