ShujiaHuang / basevar

This is the official development repository for BaseVar, which calls variants from large-scale ultra low-pass (<1.0x) WGS data, especially NIPT data
https://www.cell.com/cell-genomics/fulltext/S2666-979X(24)00288-X
GNU General Public License v3.0

Basevar needs a lot of memory to run successfully #14

Open thanhtruc44 opened 3 months ago

thanhtruc44 commented 3 months ago

Dear author,

I have used Basevar on 5 BAM files, each approximately 3GB in size, but it requires a substantial amount of memory to run successfully, specifically up to 200GB. I would like to know if I am operating it correctly. The command I used is as follows:

basevar basetype -R "reference/GCA_000003025.6_Sscrofa11.1_genomic.fa" \
    -I SRR14775051_Capture.dedup.bam -I SRR14775062_Capture.dedup.bam \
    -I SRR14775073_Capture.dedup.bam -I SRR14775074_Capture.dedup.bam \
    -I SRR14775085_Capture.dedup.bam --batch-count 50 --filename-has-samplename \
    --output-vcf "pig.vcf.gz" --output-cvg "pig.cvg.tsv.gz" --nCPU 32

ShujiaHuang commented 1 month ago

Hi, thanks for the report. I have fixed these issues by rewriting the entire BaseVar codebase in C++. I recommend using the C++ version directly: https://github.com/ShujiaHuang/basevar

In the updated version, each thread (-t/--thread) typically needs only 3 GB to 4 GB of memory when -B (--batch-count) is set to 200. For your scenario, I expect the memory requirement may be less than 10 GB.
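
As a rough aid for picking a thread count, the per-thread figure quoted above can be turned into a back-of-the-envelope estimate. The sketch below is not part of BaseVar. It assumes that memory scales linearly with the number of threads, which the reply does not state, and the function names are hypothetical. With only a handful of input files, as in the question above, actual usage may be well below this estimate.

```python
# Per-thread figures quoted in the reply above (at --batch-count 200).
PER_THREAD_GB_LOW = 3.0
PER_THREAD_GB_HIGH = 4.0


def estimate_memory_gb(threads: int) -> tuple[float, float]:
    """Return a (low, high) estimate in GB.

    ASSUMPTION: memory grows linearly with the thread count.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return threads * PER_THREAD_GB_LOW, threads * PER_THREAD_GB_HIGH


def max_threads_for_budget(budget_gb: float) -> int:
    """Largest thread count whose pessimistic estimate fits in budget_gb (0 if none)."""
    return int(budget_gb // PER_THREAD_GB_HIGH)


if __name__ == "__main__":
    low, high = estimate_memory_gb(2)
    print(f"2 threads: roughly {low:.0f} to {high:.0f} GB")
    print("threads that fit in 10 GB:", max_threads_for_budget(10))
```

Under these assumptions, a 10 GB budget would allow about two threads at the pessimistic 4 GB per thread.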