MIDAS2 `merge_snps` Command Stuck with No Progress and Incomplete Output

Description:

I encountered an issue when running the midas2 merge_snps command. The process has been running for several days without noticeable progress. The log file seems to show repeated "start" and "finish" messages for various species, but many of the output folders for certain species are either empty or incomplete.

Steps to Reproduce:

First, I generated per-sample SNP data for 200 samples with midas2, using the following script:

midas2 run_species \
                --sample_name ${sample} \
                -1 cleandata/${sample}.R1.fq.gz \
                -2 cleandata/${sample}.R2.fq.gz \
                --midasdb_name uhgg \
                --midasdb_dir /Disk2/database/midas2/uhgg/ \
                --num_cores 16 \
                midas2

midas2 run_snps \
                --sample_name ${sample} \
                -1 cleandata/${sample}.R1.fq.gz \
                -2 cleandata/${sample}.R2.fq.gz \
                --midasdb_name uhgg \
                --midasdb_dir /Disk2/database/midas2/uhgg/ \
                --num_cores 16 \
                midas2

Then I built the sample list and ran midas2 merge_snps:

# Write the header (overwrites any existing list).
echo -e "sample_name\tmidas_outdir" > midas2_sample_list.txt

# Add every sample that has a SNP summary from run_snps.
ls midas2/ | while read -r line; do
    if [ -f "midas2/${line}/snps/snps_summary.tsv" ]; then
        echo -e "${line}\tmidas2" >> midas2_sample_list.txt
    fi
done

midas2 merge_snps \
                --samples_list midas2_sample_list.txt \
                --midasdb_name uhgg \
                --midasdb_dir /Disk2/database/midas2/uhgg/ \
                --num_cores 120 \
                --chunk_size 200000 --robust_chunk \
                --sample_counts 10 \
                midas2_merge

Observed Behavior:

The process has been running for 3 days with no significant progress.
The log file shows repeated messages of "start" and "finish" for accumulate_samples and call_and_write_population_snps, as shown below:

1727460035.2:      MIDAS2::species_worker::102298--2::start call_and_write_population_snps
1727460042.5:    MIDAS2::process::102538-1::finish snps_worker
1727460042.5:    MIDAS2::process::102538--1::start collect_chunks
1727460043.6:      MIDAS2::species_worker::100273--2::finish accumulate_samples
1727460043.6:      MIDAS2::species_worker::100273--2::start call_and_write_population_snps
1727460043.9:    MIDAS2::process::102538--1::finish collect_chunks
1727460060.8:      MIDAS2::species_worker::101367--2::finish accumulate_samples
1727460060.8:      MIDAS2::species_worker::101367--2::start call_and_write_population_snps
...
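
Because the excerpt interleaves many workers, it is hard to tell by eye whether any step is truly stuck. One way to check is to pair up the "start" and "finish" lines per worker and step. The sketch below does that with awk. It assumes the `::`-separated layout seen in the excerpt and treats the third field (for example 102298--2) as an opaque worker key; `pending_steps` and the log path are hypothetical names, not part of MIDAS2.

```shell
# Print every species_worker step that has a "start" line but no later "finish" line.
# Assumes lines of the form:
#   <timestamp>:  MIDAS2::species_worker::<worker key>::<start|finish> <step name>
pending_steps() {
    awk -F'::' '
        $2 == "species_worker" {
            split($4, parts, " ")            # parts[1] = start|finish, parts[2] = step name
            key = $3 " " parts[2]
            if (parts[1] == "start")  pending[key] = 1
            if (parts[1] == "finish") delete pending[key]
        }
        END { for (k in pending) print k }
    ' "$1" | sort
}

# Example (hypothetical log path): pending_steps midas2_merge/log.txt
```

If the same keys stay pending across checks made hours apart, those workers are likely stalled rather than slow.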

Many species result directories in midas2_merge/snps/ are empty or contain only partial files. For example:

midas2_merge/snps/100078:
100078.snps_depth.tsv.lz4  100078.snps_freqs.tsv.lz4  100078.snps_info.tsv.lz4

midas2_merge/snps/100084: [empty]

midas2_merge/snps/100087: [empty]

midas2_merge/snps/100099:
100099.snps_depth.tsv.lz4  100099.snps_freqs.tsv.lz4  100099.snps_info.tsv.lz4
...
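
To see at a glance which species directories are still empty, a small helper like the following can be used. This is a minimal sketch that assumes the output layout shown in the listing above (one directory per species under midas2_merge/snps/); `list_empty_species` is a hypothetical name.

```shell
# List species directories directly under the given snps directory that contain no files.
list_empty_species() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -empty | sort
}

# Example: list_empty_species midas2_merge/snps
```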

Expected Behavior:

The merge_snps command should complete within a reasonable time frame and produce merged SNP files for all species without leaving empty or incomplete folders.

System Information:

MIDAS2 version: [MIDAS2]
Database: UHGG
Number of cores: 120
Chunk size: 200,000
Operating system: [ubuntu 22]

Request:

Could you please investigate this issue? Any guidance on how to resolve it would be greatly appreciated. I'm particularly concerned about the empty species folders and the long runtime without progress.

Thank you for your help!

Hi,

Thank you for providing the detailed log. The merge_snps process for 200 samples should not take 3 days. It seems like the issue might be related to memory limitations or CPU thrashing.

Could you confirm the total memory available on your machine? This task is memory-intensive, and if progress has stalled for 3 days, it’s possible the machine was overwhelmed. The call_and_write_population_snps step loads chunk pileups from all samples into memory to calculate population SNPs. The more cores you use, the more memory your system needs. For 200 samples, I recommend using a machine with at least 120 GB of memory and 16 cores (using --num_cores 16), while keeping the default chunk size. If your machine has more memory, you can try increasing to 32 cores.
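
A quick way to report the numbers asked about here on a Linux machine is sketched below. It reads total RAM from /proc/meminfo and counts logical CPUs. The function names are hypothetical, and the sketch only reports values; it does not decide whether they are sufficient.

```shell
# Total physical memory in GiB (rounded down), read from /proc/meminfo (value is in kB).
total_mem_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Number of logical CPUs available to this process.
logical_cpus() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

echo "memory: $(total_mem_gib) GiB, logical CPUs: $(logical_cpus)"

# To compare logical and physical cores, lscpu (if installed) shows
# "Thread(s) per core", "Core(s) per socket" and "Socket(s)".
```

Comparing the reported memory against the 120 GB suggested for 16 cores gives a first indication of whether 120 worker processes could have exhausted RAM.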

A few additional notes:

Are you using vCPUs or physical CPUs?
The --chunk_size 200000 isn’t the default chunk size. I recommend running:

midas2 compute_chunks \
                --chunk_type merge_snps \
                --chunk_size 200000 \
                --species all \
                --midasdb_name $db_name \
                --midasdb_dir $db_dir \
                --debug --force \
                -t ${num_cores}

This will calculate the chunk information accordingly.

The empty species folders are created by MIDAS during the preprocessing phase before multiprocessing begins, so this is expected behavior and not a bug.
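
Putting the recommendations above together, a rerun might look like the following sketch. It reuses only flags from the original command, lowers --num_cores to 16, and omits --chunk_size so the default applies. Whether --robust_chunk should be kept is not stated above, so leaving it out here is an assumption. The `run` wrapper and `merge_snps_rerun` are hypothetical helpers that let you preview the command before launching it.

```shell
# Hypothetical helper: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rerun of merge_snps with 16 cores and the default chunk size.
merge_snps_rerun() {
    run midas2 merge_snps \
        --samples_list midas2_sample_list.txt \
        --midasdb_name uhgg \
        --midasdb_dir /Disk2/database/midas2/uhgg/ \
        --num_cores 16 \
        --sample_counts 10 \
        midas2_merge
}

# Preview first:  DRY_RUN=1; merge_snps_rerun
# Then run:       unset DRY_RUN; merge_snps_rerun
```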

Let me know if this works for you!

Best,
Chunyu
