Open jmonlong opened 4 years ago
Thanks for the report. I thought about this and I am pretty sure I know what's happening. The fact that -r 1 triggers it is key.
-k, -l and -r will cause in-RAM memory usage to go up, because the looping bits of sequence need to be cached. This can result in a very large memory footprint. It might be possible to use a disk-backed approach here, if we decide it's needed.
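To illustrate what "disk-backed" could mean here, below is a minimal Python sketch of a cache that keeps fragments in a temporary file and reads them back through a memory map instead of holding them all in RAM. This is not how seqwish is implemented. The class name `DiskBackedCache` and its methods are hypothetical and only show the general idea.

```python
import mmap
import tempfile


class DiskBackedCache:
    """Hypothetical append-only store: fragments live in a temp file, not in RAM."""

    def __init__(self):
        self._file = tempfile.TemporaryFile()  # binary read/write, removed on close
        self._index = []                       # (offset, length) for each fragment
        self._size = 0                         # bytes written so far

    def add(self, fragment: bytes) -> int:
        """Append a fragment to the file and return its key."""
        self._file.seek(self._size)
        self._file.write(fragment)
        self._index.append((self._size, len(fragment)))
        self._size += len(fragment)
        return len(self._index) - 1

    def get(self, key: int) -> bytes:
        """Read one fragment back through a read-only memory map."""
        offset, length = self._index[key]
        if length == 0:
            return b""
        self._file.flush()  # make buffered writes visible to the mapping
        with mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]

    def close(self):
        self._file.close()


cache = DiskBackedCache()
first = cache.add(b"ACGT")
second = cache.add(b"TTGACA")
print(cache.get(first), cache.get(second))
cache.close()
```

Only the small offset index stays in memory. The operating system pages fragment data in and out as needed, trading some speed for a lower peak footprint.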
On Wed, Jun 10, 2020, 08:19 Jean Monlong notifications@github.com wrote:
I understand the -r parameter may be experimental and often not what we want. I just want to report that it used a lot of memory when I used it.
I was testing this on a dataset containing chr20 assemblies for 4 samples + hg38 (~60Mb FASTA each). I aligned with minimap2 asm20 and used fpa with drop -l 10000 to filter the PAF file.
seqwish -r 1 took 24 mins using 16 cores and peaked at 150 GB of memory. seqwish -k 256 -l 256 took 5 mins and peaked at 1 GB of memory.
I'm using -k and -l now so I'm not blocked by this issue, just wanted to report the big difference in memory usage in case it helps.