czbiohub-sf / MIDAS

Metagenomic Intra-Species Diversity Analysis (MIDAS)
MIT License

MIDAS2 `merge_snps` Command Stuck with No Progress and Incomplete Output #145

Open yejunbin opened 1 month ago

yejunbin commented 1 month ago

Description:

I encountered an issue when running the midas2 merge_snps command. The process has been running for several days without noticeable progress. The log file seems to show repeated "start" and "finish" messages for various species, but many of the output folders for certain species are either empty or incomplete.
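To quantify which output folders are empty, the merge output directory can be scanned. This is a minimal sketch: `list_empty_dirs` is a hypothetical helper name, and it relies on the `-empty` test, which is a GNU/BSD `find` extension rather than part of POSIX.

```shell
# Hypothetical helper: print every empty directory below the given root.
# Relies on the -empty test, a GNU/BSD find extension.
list_empty_dirs() {
  find "$1" -mindepth 1 -type d -empty | sort
}

# Example (directory name taken from the merge_snps command in this report):
# list_empty_dirs midas2_merge
```

Comparing this list between two points in time shows whether any species folder is still being filled.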

Steps to Reproduce:

  1. First, I ran the following script on each of the 200 samples to generate the per-sample SNP information:

   midas2 run_species \
     --sample_name ${sample} \
     -1 cleandata/${sample}.R1.fq.gz \
     -2 cleandata/${sample}.R2.fq.gz \
     --midasdb_name uhgg \
     --midasdb_dir /Disk2/database/midas2/uhgg/ \
     --num_cores 16 \
     midas2

   midas2 run_snps \
     --sample_name ${sample} \
     -1 cleandata/${sample}.R1.fq.gz \
     -2 cleandata/${sample}.R2.fq.gz \
     --midasdb_name uhgg \
     --midasdb_dir /Disk2/database/midas2/uhgg/ \
     --num_cores 16 \
     midas2

  2. Then, I built the sample list and ran the following midas2 merge_snps command:

   echo -e "sample_name\tmidas_outdir" >> midas2_sample_list.txt

   ls midas2/ | while read line; do 
       if [ -f midas2/${line}/snps/snps_summary.tsv ]; then 
           echo -e "$line\tmidas2" >> midas2_sample_list.txt
       fi 
   done

   midas2 merge_snps \
     --samples_list midas2_sample_list.txt \
     --midasdb_name uhgg \
     --midasdb_dir /Disk2/database/midas2/uhgg/ \
     --num_cores 120 \
     --chunk_size 200000 --robust_chunk \
     --sample_counts 10 \
     midas2_merge
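Because the list-building commands above append to midas2_sample_list.txt with `>>`, running them more than once would leave repeated header and sample rows in the file. A minimal sketch of a check for that, assuming the tab-separated layout shown above; `duplicate_samples` is a hypothetical helper name, not part of MIDAS2.

```shell
# Hypothetical helper: print every first-column value (including the header
# text) that occurs more than once in a tab-separated sample list.
# Each repeated value is printed once, at its second occurrence.
duplicate_samples() {
  awk -F'\t' 'seen[$1]++ == 1 { print $1 }' "$1"
}

# Example:
# duplicate_samples midas2_sample_list.txt
```

No output means every row is unique. Any printed name points to a list that was appended to more than once and should be regenerated from scratch.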

Observed Behavior:

Expected Behavior:

System Information:

Log File Excerpts:

Here are some excerpts from the log file for reference:

1727460035.2:      MIDAS2::species_worker::102298--2::start call_and_write_population_snps
1727460042.5:    MIDAS2::process::102538-1::finish snps_worker
1727460042.5:    MIDAS2::process::102538--1::start collect_chunks
1727460043.6:      MIDAS2::species_worker::100273--2::finish accumulate_samples
1727460043.6:      MIDAS2::species_worker::100273--2::start call_and_write_population_snps
...
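To see which steps are still pending, the log can be reduced to the entries that logged "start" without a matching "finish". This is a sketch under assumptions: it assumes every log line follows the `MIDAS2::<worker>::<id>::<start|finish> <step>` layout seen in the excerpt above and that a worker uses the same id for both messages. `unfinished_steps` is a hypothetical helper name.

```shell
# Hypothetical helper: report worker::id::step entries that logged "start"
# more often than "finish". Lines that do not match the assumed
# "MIDAS2::<worker>::<id>::<start|finish> <step>" layout are ignored.
unfinished_steps() {
  awk -F'::' '
    NF >= 4 {
      split($4, ev, " ")               # ev[1] = start|finish, ev[2] = step name
      key = $2 "::" $3 "::" ev[2]
      if (ev[1] == "start")  pending[key]++
      if (ev[1] == "finish") pending[key]--
    }
    END { for (k in pending) if (pending[k] > 0) print k }
  ' "$1" | sort
}

# Example (the log file name is a placeholder):
# unfinished_steps merge_snps.log
```

If the same entries stay in this list across checks made hours apart, those workers are likely stalled rather than slow.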

Request:

Could you please investigate this issue? Any guidance on how to resolve it would be greatly appreciated. I'm particularly concerned about the empty species folders and the long runtime without progress.

Thank you for your help!

zhaoc1 commented 1 month ago

Hi,

Thank you for providing the detailed log. The merge_snps process for 200 samples should not take 3 days. It seems like the issue might be related to memory limitations or CPU thrashing.

Could you confirm the total memory available on your machine? This task is memory-intensive, and if progress has stalled for 3 days, it’s possible the machine was overwhelmed. The call_and_write_population_snps step loads chunk pileups from all samples into memory to calculate population SNPs. The more cores you use, the more memory your system needs. For 200 samples, I recommend using a machine with at least 120 GB of memory and 16 cores (using --num_cores 16), while keeping the default chunk size. If your machine has more memory, you can try increasing to 32 cores.
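The sizing advice above can be turned into a rough starting point for --num_cores. The sketch below scales the "16 cores per 120 GB" figure linearly with available memory. That linear scaling is an assumption made for illustration, not a documented MIDAS2 guarantee, and `suggest_cores` is a hypothetical helper name.

```shell
# Hypothetical helper: scale the "16 cores per 120 GB" guidance linearly
# (an assumption), capped at the number of cores the machine has.
# Usage: suggest_cores MEM_GB MAX_CORES
suggest_cores() {
  mem_gb=$1
  max_cores=$2
  cores=$(( mem_gb * 16 / 120 ))
  [ "$cores" -gt "$max_cores" ] && cores=$max_cores
  [ "$cores" -lt 1 ] && cores=1
  echo "$cores"
}

# Example on Linux, reading total memory (reported in kB) from /proc/meminfo:
# mem_gb=$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo)
# suggest_cores "$mem_gb" "$(nproc)"
```

Treat the result as an upper bound to try first. If memory use still climbs toward the limit during call_and_write_population_snps, lower the value further.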

A few additional notes:

midas2 compute_chunks --chunk_type merge_snps --chunk_size 200000 --species all --midasdb_name $db_name --midasdb_dir $db_dir --debug --force -t ${num_cores}

This will calculate the chunk information accordingly.

Let me know if this works for you!

Best,
Chunyu