MikkelSchubert / paleomix

Pipelines and tools for the processing of ancient and modern HTS data.
https://paleomix.readthedocs.io/en/stable/
MIT License

BWA terminated by SIGKILL, PALEOMIX in BAM pipeline #54

Closed: Uriwolkow closed this issue 8 months ago

Uriwolkow commented 8 months ago

I recently tried running the BAM pipeline on PE reads. When I run it against a nuclear reference genome, each sample produces a long error and an empty BAM file, and the STDOUT and STDERR* files are almost all empty as well. The errors all look something like the one below: the first process is terminated by SIGKILL and the second is terminated by PALEOMIX:

PALEOMIX         = v1.3.8
Command          = '/powerapps/share/centos7/miniconda/miniconda3-2023/envs/paleomix_new_env/bin/paleomix bam run makefile_PE_allbig_bubals_nuclear_mito_21.01.24.YAML'
CWD              = '/scratch300/uriw1/Bubals'
PATH             = '/powerapps/share/centos7/miniconda/miniconda3-2023/envs/paleomix_new_env/bin:/powerapps/share/centos7/miniconda/miniconda3-2023/condabin:/powerapps/share/ExaML/examl:/powerapps/share/ExaML/parser:/powerapps/share/centos7/miniconda/miniconda3-2023/etc/profile.d/miniconda3-2023-environmentally/condabin:/powerapps/share/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local.cc/bin:/mathematica/vers/11.2'
Node             = alignment of './bubal_NM231/reads/Sample_NM231/Seq_NM231/Lane_2/V350180618_L02_NGS444_s2_Seq_NM231_x.fq.gz/reads.collapsed.gz' onto GCA_006408545.1_HBT_genomic using BWA samse
Threads          = 1
Input files      = ./bubal_NM231/bubal_genome_reference/Sample_NM231/Seq_NM231/Lane_2/V350180618_L02_NGS444_s2_Seq_NM231_x.fq.gz/collapsed.sai
                   ./bubal_NM231/reads/Sample_NM231/Seq_NM231/Lane_2/V350180618_L02_NGS444_s2_Seq_NM231_x.fq.gz/reads.collapsed.gz
                   /scratch300/uriw1/Bubals/Bubal_reference/ncbi_dataset/data/GCA_006408545.1/GCA_006408545.1_HBT_genomic.fasta
                   /scratch300/uriw1/Bubals/Bubal_reference/ncbi_dataset/data/GCA_006408545.1/GCA_006408545.1_HBT_genomic.fasta.amb
                   /scratch300/uriw1/Bubals/Bubal_reference/ncbi_dataset/data/GCA_006408545.1/GCA_006408545.1_HBT_genomic.fasta.ann
                   /scratch300/uriw1/Bubals/Bubal_reference/ncbi_dataset/data/GCA_006408545.1/GCA_006408545.1_HBT_genomic.fasta.bwt
                   /scratch300/uriw1/Bubals/Bubal_reference/ncbi_dataset/data/GCA_006408545.1/GCA_006408545.1_HBT_genomic.fasta.pac
                   /scratch300/uriw1/Bubals/Bubal_reference/ncbi_dataset/data/GCA_006408545.1/GCA_006408545.1_HBT_genomic.fasta.sa
Output files     = ./bubal_NM231/bubal_genome_reference/Sample_NM231/Seq_NM231/Lane_2/V350180618_L02_NGS444_s2_Seq_NM231_x.fq.gz/collapsed.bam
Auxiliary files  = /powerapps/share/centos7/miniconda/miniconda3-2023/envs/paleomix_new_env/lib/python3.7/site-packages/paleomix/main.py
Executables      = /powerapps/share/centos7/miniconda/miniconda3-2023/envs/paleomix_new_env/bin/python
                   bwa

Errors =
Parallel processes:
  Process 1:
    Command = bwa samse \
                  /scratch300/uriw1/Bubals/Bubal_reference/ncbi_dataset/data/GCA_006408545.1/GCA_006408545.1_HBT_genomic.fasta \
                  ./bubal_NM231/bubal_genome_reference/Sample_NM231/Seq_NM231/Lane_2/V350180618_L02_NGS444_s2_Seq_NM231_x.fq.gz/collapsed.sai \
                  ./bubal_NM231/reads/Sample_NM231/Seq_NM231/Lane_2/V350180618_L02_NGS444_s2_Seq_NM231_x.fq.gz/reads.collapsed.gz
    Status  = Terminated with signal SIGKILL
    STDOUT  = Piped to process 2
    STDERR* = '/scratch300/uriw1/Bubals/temp/e19d64cc-c3d4-4ded-89d5-3d8ada5945d5/pipe_bwa_139916491756176.stderr'
    CWD     = '/scratch300/uriw1/Bubals'

  Process 2:
    Command = /powerapps/share/centos7/miniconda/miniconda3-2023/envs/paleomix_new_env/bin/python \
                  /powerapps/share/centos7/miniconda/miniconda3-2023/envs/paleomix_new_env/lib/python3.7/site-packages/paleomix/main.py \
                  cleanup --fasta \
                  /scratch300/uriw1/Bubals/Bubal_reference/ncbi_dataset/data/GCA_006408545.1/GCA_006408545.1_HBT_genomic.fasta \
                  --temp-prefix \
                  /scratch300/uriw1/Bubals/temp/e19d64cc-c3d4-4ded-89d5-3d8ada5945d5/bam_cleanup \
                  --rg-id Seq_NM231 --rg SM:Sample_NM231 --rg LB:Seq_NM231 --rg PU:Lane_2 --rg \
                  PL:ILLUMINA --rg PG:bwa --rg \
                  'DS:/scratch300/uriw1/Bubals/Bubal_raw_data/V350180618_L02_NGS444_s2_Seq_NM231_[12].fq.gz' \
                  -q 25 -F 0x4
    Status  = Automatically terminated by PALEOMIX
    STDIN   = Piped from process 1
    STDOUT  = '/scratch300/uriw1/Bubals/temp/e19d64cc-c3d4-4ded-89d5-3d8ada5945d5/collapsed.bam'
    STDERR* = '/scratch300/uriw1/Bubals/temp/e19d64cc-c3d4-4ded-89d5-3d8ada5945d5/pipe_python_139916491756752.stderr'
    CWD     = '/scratch300/uriw1/Bubals'

I've also attached my makefile as a .txt:

makefile_PE_allbig_bubals_nuclear.txt

Maybe the issue is the wildcard token following the {Pair}*?

Best regards, Uri Wolkowski

MikkelSchubert commented 8 months ago

Hi Uri,

Your yaml file looks fine to me.

The problem is, as you point out, that the first process was terminated by SIGKILL, and as a consequence the pipeline automatically terminated the second process. That generally indicates that BWA was killed by an external process rather than failing due to a problem with the software (bwa) or with the pipeline.

I don't know the system you are running the pipeline on, but it's possible that the process got killed due to excessive memory usage, either by the task management system (Torque, PBS, Slurm), if you are using one, or by earlyoom/the kernel OOM killer.

If so, you may be able to find an explanation in the output of dmesg, in /var/log/messages, or in the task status or logs of your task management system.
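As a rough sketch of how you might check for this on a typical Linux system (the exact log locations, the availability of journalctl, and the use of Slurm are assumptions here, so adjust to your own setup):

    # Look for OOM-killer messages in the kernel ring buffer
    dmesg | grep -i -E 'out of memory|oom|killed process'

    # On systems using systemd, the kernel journal can be searched instead
    journalctl -k | grep -i 'oom'

    # Slurm users can also inspect the recorded state and peak memory of the job
    # (<jobid> is a placeholder for the actual job ID)
    sacct -j <jobid> --format=JobID,State,ExitCode,MaxRSS,ReqMem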

If that is the explanation, the BWA manual page states that

For short reads, the aln command uses ~3.2GB memory and the sampe command uses ~5.4GB.

If that is the cause, then you could decrease the maximum number of threads used by the pipeline (via --max-threads) to prevent too many BWA instances running at once, or you could try reserving more memory if you are using a task management system.
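As a minimal sketch, assuming a Slurm cluster (the thread count, the 32G figure, and the run_paleomix.sh wrapper script are only illustrations; size them to your reference genome and scheduler):

    # Limit the number of concurrent processes started by the pipeline
    paleomix bam run --max-threads 4 makefile_PE_allbig_bubals_nuclear_mito_21.01.24.YAML

    # Or request more memory per job from the scheduler, e.g. with Slurm
    sbatch --mem=32G --cpus-per-task=4 run_paleomix.sh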

Best regards, Mikkel

Uriwolkow commented 8 months ago

Thank you for the detailed response. It was indeed an issue with memory usage during the run. By running on a different server we achieved good results with no errors, and without changing the YAML file. Closing this issue as resolved.