DiltheyLab / HLA-LA

Fast HLA type inference from whole-genome data
GNU General Public License v3.0

execution not successful #22

Closed by jfass 5 years ago

jfass commented 5 years ago

Hi! I installed via conda, added a genome file via samtools idxstats (for hg19 with "chr" in the chromosome names, pulling reads from the MHC region on chr6, the chr6 haplotype contigs, and unmapped reads), and started jobs for three whole-exome BAM files. One is still running, but two have stopped with a mysterious error.

STDOUT:

HLA-LA.pl

Identified paths:
        samtools_bin: /home/ubuntu/HLA.LA/miniconda3/bin/samtools
        bwa_bin: /home/ubuntu/HLA.LA/miniconda3/bin/bwa
        java_bin: /home/ubuntu/HLA.LA/miniconda3/bin/java
        picard_sam2fastq_bin: /home/ubuntu/HLA.LA/miniconda3/bin/picard
        General working directory: /home/ubuntu/HLA.LA
        Sample-specific working directory: /home/ubuntu/HLA.LA/primary

Extract reads from 9 regions...
Extract unmapped reads...
Merging...
Indexing...
Extract FASTQ...
        /home/ubuntu/HLA.LA/miniconda3/bin/picard SamToFastq VALIDATION_STRINGENCY=LENIENT I=/home/ubuntu/HLA.LA/primary/extraction.bam F=/home/ubuntu/HLA.LA/primary/R_1.fastq F2=/home/ubuntu/HLA.LA/primary/R_2.fastq FU=/home/ubuntu/HLA.LA/primary/R_U.fastq 2>&1

Now executing:
../bin/HLA-LA --action HLA --maxThreads 5 --sampleID primary --outputDirectory /home/ubuntu/HLA.LA/primary --PRG_graph_dir /home/ubuntu/HLA.LA/miniconda3/opt/hla-la/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQU /home/ubuntu/HLA.LA/primary/R_U.fastq.splitLongReads --FASTQ1 /home/ubuntu/HLA.LA/primary/R_1.fastq --FASTQ2 /home/ubuntu/HLA.LA/primary/R_2.fastq --bwa_bin /home/ubuntu/HLA.LA/miniconda3/bin/bwa --samtools_bin /home/ubuntu/HLA.LA/miniconda3/bin/samtools --mapAgainstCompleteGenome 1 --longReads 0
Set maxThreads to 5
 [ Fri May 17 21:34:02 2019 ] Graph serialization existing and newer than graph file; read from /home/ubuntu/HLA.LA/miniconda3/opt/hla-la/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/serializedGRAPH
 [ Fri May 17 21:39:27 2019 ]   done.

STDERR:

[bam_translate] RG tag "20171025_1728510947" on read "K00277:41:HMCGGBBXX:1:1101:10318:13042" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "K00277:41:HMCGGBBXX:1:1101:10318:13042" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
 [ Fri May 17 21:39:58 2019 ] processBAM::processBAM(..): Start graph gap analysis.
 [ Fri May 17 21:40:01 2019 ] processBAM::processBAM(..) graph gap analysis: have 3494 graph gap stretches; criterion length >= 3
/home/ubuntu/HLA.LA/miniconda3/bin/bwa mem -t5 -M -a /home/ubuntu/HLA.LA/miniconda3/opt/hla-la/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/extendedReferenceGenome/extendedReferenceGenome.fa /home/ubuntu/HLA.LA/primary/R_1.fastq /home/ubuntu/HLA.LA/primary/R_2.fastq | /home/ubuntu/HLA.LA/miniconda3/bin/samtools view -Sb - > /home/ubuntu/HLA.LA/primary/remapped_with_a.bam.unsorted
terminate called after throwing an instance of 'std::runtime_error'
  what():  Command /home/ubuntu/HLA.LA/miniconda3/bin/bwa mem -t5 -M -a /home/ubuntu/HLA.LA/miniconda3/opt/hla-la/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/extendedReferenceGenome/extendedReferenceGenome.fa /home/ubuntu/HLA.LA/primary/R_1.fastq /home/ubuntu/HLA.LA/primary/R_2.fastq | /home/ubuntu/HLA.LA/miniconda3/bin/samtools view -Sb - > /home/ubuntu/HLA.LA/primary/remapped_with_a.bam.unsorted returned code -1
HLA-LA execution not successful. Command was ../bin/HLA-LA --action HLA --maxThreads 5 --sampleID primary --outputDirectory /home/ubuntu/HLA.LA/primary --PRG_graph_dir /home/ubuntu/HLA.LA/miniconda3/opt/hla-la/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQU /home/ubuntu/HLA.LA/primary/R_U.fastq.splitLongReads --FASTQ1 /home/ubuntu/HLA.LA/primary/R_1.fastq --FASTQ2 /home/ubuntu/HLA.LA/primary/R_2.fastq --bwa_bin /home/ubuntu/HLA.LA/miniconda3/bin/bwa --samtools_bin /home/ubuntu/HLA.LA/miniconda3/bin/samtools --mapAgainstCompleteGenome 1 --longReads 0

Any idea what this is about?

Thanks, ~Joe

jfass commented 5 years ago

Oh wait ...

Aha. I started all three at the same time, and this was my first run against hg19 ... so it looks like 'serializedGRAPH' and 'serializedGRAPH_preGapPathIndex' were created by one of the three processes (probably the one that's still running). The other two then saw new versions of those files, tried to use them while they were still being built by the first process, and failed.

I think. I'll close this issue, but please let me know if my understanding isn't correct.

Thanks, ~Joe
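
If that diagnosis is right, one workaround is to make sure only one job at a time can touch the graph directory until the serialized files exist. Below is a minimal sketch of that idea using an advisory file lock. The function name `run_exclusively` and the lock-file path are hypothetical and not part of HLA-LA. It is Unix-only because it relies on `fcntl`.

```python
import fcntl
import subprocess
from pathlib import Path


def run_exclusively(lock_path, command):
    """Run `command` while holding an exclusive advisory lock on `lock_path`.

    Any other process that calls this function with the same lock_path
    blocks until the lock is released, so the commands run one at a time.
    Returns the exit code of the command.
    """
    lock_file = Path(lock_path)
    lock_file.parent.mkdir(parents=True, exist_ok=True)
    # Append mode creates the file if needed without truncating it.
    with open(lock_file, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            return subprocess.run(command, check=False).returncode
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Each job would be launched as `run_exclusively("/some/shared/graph.lock", [...the usual HLA-LA.pl command...])`. This serializes whole jobs, which is slower than strictly needed. A simpler alternative is to run the first sample alone until the 'serializedGRAPH...' files are complete, then start the rest in parallel.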

Oh ... it would be nice to know when the 'serializedGRAPH...' files are built and how long building them takes.
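
In the meantime, the bracketed timestamps in the STDOUT log give a rough idea of how long each step takes. The sketch below parses two such log lines and returns the time between them. The helper names are made up for illustration, and the parsing assumes an English locale for the day and month names.

```python
import re
from datetime import datetime

# Matches timestamps such as "[ Fri May 17 21:34:02 2019 ]".
STAMP = re.compile(r"\[ (\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) \]")


def parse_stamp(line):
    """Extract the bracketed timestamp from one log line."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")


def elapsed_seconds(start_line, end_line):
    """Seconds between the timestamps on two log lines."""
    return (parse_stamp(end_line) - parse_stamp(start_line)).total_seconds()


# The two lines from the STDOUT above (first one shortened).
start = " [ Fri May 17 21:34:02 2019 ] Graph serialization existing and newer than graph file"
end = " [ Fri May 17 21:39:27 2019 ]   done."
print(elapsed_seconds(start, end))  # 325.0
```

Note that these two lines bracket the step that reads an existing 'serializedGRAPH' in one of the failed runs, so the roughly five and a half minutes here is not the build time itself.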