ethan-baldwin opened this issue 9 months ago
Hey Ethan! Just came across this post because I am doing a similar project with sunflower. Did you end up figuring this out? I'm wondering if you could increase the bloom filter setting (-f) to 38 or 39 to reduce memory usage, or perhaps reduce the maximum k-mer occurrence threshold (--max-kocc)? Maybe you could even split your inputs and run hifiasm on two halves of your data? I'm new to this program, so I'm not confident in any of this and am just brainstorming! Let me know if you have had any luck. I'm worried I will face the same issue when I get my Omni-C data back, since my trio-binning run is also using about 400 GB of memory!
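For what it's worth, here is a rough sketch of how those options would be passed; the thread count, flag values, and read file names below are placeholders for illustration, not tested recommendations:

hifiasm -o asm.hic -t 32 -f 38 --max-kocc 1000 \
--h1 omnic_R1.fastq.gz --h2 omnic_R2.fastq.gz \
hifi_reads.fastq.gz

As I understand the help text, -f sets the number of bits for the bloom filter (default 37) and --max-kocc caps the k-mer occurrence used during overlap detection, but please double-check hifiasm --help before relying on either.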
Actually, another solution is to use fewer CPUs; that may also help reduce memory usage. We plan to release a new version that uses less memory soon.
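Concretely, this just means rerunning in the same directory with the same -o prefix as before but a smaller -t; hifiasm should then pick up the existing bin files and skip the already-completed error-correction step. A sketch with a placeholder prefix and placeholder read file names:

hifiasm -o asm.hic -t 8 \
--h1 omnic_R1.fastq.gz --h2 omnic_R2.fastq.gz \
hifi_reads.fastq.gz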
Thanks for the helpful replies! I tried with fewer CPUs (reduced from 64 to 8) and got past the stage where I normally ran out of memory, but now I am getting a seg fault. hifiasm.hifiasm_27625222.txt
Hi @ethan-baldwin, I am wondering if you can share the bin files with me? Then I could do a very quick test to fix this issue. This should be a bug, and fixing it will be very helpful for us.
I would love to, but the bin files add up to ~300 GB. What is the best way for me to share them with you?
Thank you so much @ethan-baldwin! Could you please show me a screenshot listing the bin files? Some of them are not necessary for debugging.
This issue is likely a small bug in the latest version of hifiasm, which has also been reported several times by other users. It would be very helpful if I could get the data and do a quick test to fix it. For now, there is another option: running an old version with the current bin files (see: https://github.com/chhylp123/hifiasm/issues/613).
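If you try the older version, the steps would be roughly as follows; the tag below is only an example, so please check issue #613 for the exact version to use, and then rerun your original command with this binary in the same directory so it picks up the existing bin files:

git clone https://github.com/chhylp123/hifiasm
cd hifiasm
git checkout 0.19.5   # example tag only; see issue #613 for the suggested version
make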
Here is the directory:
Do you want a screenshot of part of the bin files like this?
When I have time I will try installing an older version of hifiasm.
@ethan-baldwin Could you please share sarracenia.hic.ec.bin, sarracenia.hic.ovlp.source.bin, sarracenia.hic.ovlp.reverse.bin, and sarracenia.hic.hic.lk.bin with me? It would also be great if you could share the command line and the hifiasm version you were using. Thank you so much for your great help!
@chhylp123 I sent you an email to discuss how to transfer these large files. Here is the command:
hifiasm -o sarracenia.hic -t 8 \
--h1 ../reads/KXRJ_OmniC_NA_NA_TGAGCTAG_Sarracenia_baldwin_OmniC-Sarracenia_baldwin_OmniC_I1371_L1_R1.fastq.bz2 \
--h2 ../reads/KXRJ_OmniC_NA_NA_TGAGCTAG_Sarracenia_baldwin_OmniC-Sarracenia_baldwin_OmniC_I1371_L1_R2.fastq.bz2 \
../reads/m84053_231129_210740_s2.hifi_reads.bc2012.fastq.gz \
../reads/m84053_231129_213847_s3.hifi_reads.bc2012.fastq.gz \
../reads/m84053_231129_220953_s4.hifi_reads.bc2012.fastq.gz \
../reads/m84053_231204_221754_s3.hifi_reads.bc2012.fastq.gz
And the hifiasm version is 0.19.6
@chhylp123 If you are still interested in troubleshooting this issue, I can share these files with you via Globus, unless you have another file-sharing solution. Thanks!
I am assembling a ~3.5 Gb genome with ~100x HiFi reads. I want to compare the phasing results between Hi-C (Omni-C in this case) and trio-binning. The trio-binning run completes using 400 GB of memory; however, Hi-C phasing on the same error-corrected reads runs out of memory even when I give it 950 GB (almost the maximum available on my university's cluster). There is 170 GB of Omni-C data.
Here is the log: