CMU-SAFARI / BLEND

BLEND is a mechanism that can efficiently find fuzzy seed matches between sequences to significantly improve the performance and accuracy, while reducing the memory usage, of two important applications: 1) finding overlapping reads and 2) read mapping. Described by Firtina et al. (published in NARGAB: https://doi.org/10.1093/nargab/lqad004).

Questions on running Blend on laptop #3

Closed bio-xy closed 1 year ago

bio-xy commented 1 year ago

Hi BLEND team, I am trying to run BLEND on my laptop to map the recently released ONT duplex reads. I cut a single fastq.gz into small chunks (each containing 20,000 reads) and ran the command below. After generating some .tmp files, the process was killed (I believe it exceeded the maximum memory of ~9 GB here).

blend -ax map-ont -t 6 --secondary=no -I 50M -a --split-prefix hg002 hg38.fa ont_small_chunk.fq.gz

I am not quite sure about "-I 50M"; I am just assuming BLEND will map reads against one part of the whole index at a time to save memory. Am I right? Any advice for running BLEND on a platform with constrained resources? Or maybe it should not be run this way. Thanks a lot!

Original fastq is here: https://human-pangenomics.s3.amazonaws.com/submissions/0CB931D5-AE0C-4187-8BD8-B3A9C9BFDADE--UCSC_HG002_R1041_Duplex_Dorado/Dorado_v0.1.1/stereo_duplex/11_15_22_R1041_Duplex_HG002_1_Dorado_v0.1.1_400bps_sup_stereo_duplex_pass.fastq.gz
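
(For reference, one way to produce 20,000-read chunks from a gzipped FASTQ is with zcat and GNU split; this is just a sketch, and the chunk prefix below is a placeholder.)

# Each FASTQ record is 4 lines, so 20,000 reads = 80,000 lines per chunk.
zcat 11_15_22_R1041_Duplex_HG002_1_Dorado_v0.1.1_400bps_sup_stereo_duplex_pass.fastq.gz \
  | split -l 80000 -d --additional-suffix=.fq - ont_small_chunk_
gzip ont_small_chunk_*.fq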

canfirtina commented 1 year ago

Hi @bio-xy. I was able to reproduce your observation. Your process was probably killed due to a memory issue, as the peak memory usage is around 9.3GB when running BLEND as you described.

I have two suggestions to reduce the peak memory usage. First, you can further decrease -I (e.g., -I 30M). Second, you can also reduce the mini-batch size (the number of bases loaded into memory as a batch for mapping) with -K (e.g., -K 100M). The following uses around 6GB:

blend -ax map-ont -t 6 --secondary=no -I 30M -K 100M -a --split-prefix hg002 hg38.fa ont_small_chunk.fq.gz

One unrelated suggestion: if you are cutting the entire fastq.gz into smaller chunks only to reduce the memory usage, you do not have to do that. You can simply set -K to an appropriate value, which ensures that BLEND processes a limited number of bases at a time (similar to minimap2).
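
For example, a single run over the original (unsplit) fastq.gz from your link could look like the following (an illustrative sketch; the exact -I/-K values and the output file name are placeholders):

blend -ax map-ont -t 6 --secondary=no -I 30M -K 100M --split-prefix hg002 hg38.fa 11_15_22_R1041_Duplex_HG002_1_Dorado_v0.1.1_400bps_sup_stereo_duplex_pass.fastq.gz > hg002_duplex.sam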

bio-xy commented 1 year ago

The job (with the above settings) still got killed when it starts processing chr2... (screenshot attached)

I am not sure if there is any way to make this more robust other than cutting the input into smaller chunks...

canfirtina commented 1 year ago

Unfortunately, I could not reproduce this. Is there any other process that may be taking a large amount of memory, such that the available memory becomes much smaller than 9GB when running BLEND? Here is the /usr/bin/time -vpo output I get when using your inputs and settings on a server with an AMD EPYC 7742 processor and 1TB of main memory (Maximum resident set size shows the peak memory in KB):

Command being timed: "blend -ax map-ont -t 6 --secondary=no -I 50M -a --split-prefix hg002 hg38.fa ont_small_chunk.fq.gz"
        User time (seconds): 135116.26
        System time (seconds): 808.26
        Percent of CPU this job got: 590%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:23:37
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 10037040
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 1350500587
        Voluntary context switches: 85887
        Involuntary context switches: 165369
        Swaps: 0
        File system inputs: 9008088
        File system outputs: 5271464
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

The following is the time output when using the settings I suggested above:

Command being timed: "blend -ax map-ont -t 6 --secondary=no -I 30M -K 100M -a --split-prefix hg2002 hg38.fa ont_small_chunk.fq.gz"
        User time (seconds): 135620.18
        System time (seconds): 1035.77
        Percent of CPU this job got: 587%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:27:30
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 7627396
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 1548545951
        Voluntary context switches: 88899
        Involuntary context switches: 165942
        Swaps: 0
        File system inputs: 2503752
        File system outputs: 5347544
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

The first run uses around 10GB and the second run uses around 7.6GB. If you need to run BLEND on even more constrained resources, I would suggest decreasing -I and/or -K further.
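
For instance, a further-reduced configuration might look like the following (illustrative values only; I have not measured the peak memory for these exact settings):

blend -ax map-ont -t 6 --secondary=no -I 20M -K 50M --split-prefix hg002 hg38.fa ont_small_chunk.fq.gz > ont_small_chunk.sam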

bio-xy commented 1 year ago

I ran BLEND in WSL, but I am not sure if any background process may be interfering with it. Anyway, thanks for the info. I think I'd better get a bigger machine for this...
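
(As a side note, WSL2 caps the memory visible to Linux at a fraction of the host's RAM by default, which could explain part of the gap. A quick sketch for checking and, if needed, raising the cap; the 12GB value is only an example.)

# Inside WSL: check how much memory the VM actually sees.
free -h

# On the Windows side, %UserProfile%\.wslconfig can raise the limit, e.g.:
#   [wsl2]
#   memory=12GB
# Then restart the VM from Windows with: wsl --shutdown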

canfirtina commented 1 year ago

Sure. By the way, I am using the "alignment ready" version of hg38 from the hg38 analysis set (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/analysisSet/) without the non-canonical contigs. Although the non-canonical contigs are relatively small compared to the canonical ones, if you are including them in your analysis, your memory usage may be slightly larger. I hope you can run your analysis. Closing this issue now, but feel free to re-open it in case you hit a similar problem.
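
(For reference, a sketch of fetching the analysis-set hg38 and keeping only the canonical contigs; the exact file name under the UCSC URL and the use of samtools here are assumptions.)

wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/analysisSet/hg38.analysisSet.fa.gz
gunzip hg38.analysisSet.fa.gz
samtools faidx hg38.analysisSet.fa
# Keep only chr1-22, chrX, chrY, chrM in the reference used for mapping.
samtools faidx hg38.analysisSet.fa $(seq -f "chr%g" 1 22) chrX chrY chrM > hg38.canonical.fa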