DiltheyLab / HLA-LA

Fast HLA type inference from whole-genome data
GNU General Public License v3.0
124 stars 42 forks source link

Can't start the test #5

Closed serge2016 closed 5 years ago

serge2016 commented 7 years ago

I have indexed the graph and now trying to run the test. But I can't:

root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA/bin# ./HLA-PRG-LA --bwa_bin $BWA --samtools_bin $SAMTOOLS --BAM /outputs
/NA12878.cram --graph ../graphs/PRG_MHC_GRCh38_withIMGT --sampleID NA12878_serge --maxThreads 8 --workingDir ./
HLA-PRG-LA: HLA-PRG-LA.cpp:1350: int main(int, char**): Assertion `Utilities::directoryExists(allGraphsOutputDir)' failed.
Aborted (core dumped)
serge2016 commented 7 years ago
        std::string allGraphsOutputDir = (Utilities::directoryExists("tmp/simulatedGraphs") ? "tmp/simulatedGraphs" : "../tmp/simulatedGraphs");
        std::string allGraphsWorkingDir = (Utilities::directoryExists("tmp/working") ? "tmp/working" : "../tmp/working");

I have to create dirs tmp/simulatedGraphs and tmp/working in the current dir to path through...

AlexanderDilthey commented 7 years ago

Please use Perl script wrapper to control the binary, this will automatically put together the right command line.

If the path to the CRAM is outputs/NA12878.cram, this here would be the right command line:

./HLA-PRG-LA.pl --BAM outputs/NA12878.cram --graph PRG_MHC_GRCh38_withIMGT --sampleID NA12878_serge --maxThreads 7

serge2016 commented 7 years ago
root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA# ./src/HLA-PRG-LA.pl --bwa_bin $BWA --samtools_bin /soft/samtools/bin/samtools --picard_sam2fastq_bin /soft/picard-tools-1.119/SamToFastq.jar --BAM /outputs/NA12878.cram --graph ../graphs/PRG_MHC_GRCh38_withIMGT --sampleID NA12878_serge --maxThreads 8 --workingDir ./
inferHLATypes.pl

Identified paths:
        samtools_bin: /soft/samtools/bin/samtools
        bwa_bin: /soft/bwa.kit/bwa
        java_bin: /usr/bin/java
        picard_sam2fastq_bin: /soft/picard-tools-1.119/SamToFastq.jar
        General working directory: /outputs/soft/HLA-PRG-LA
        Sample-specific working directory: /outputs/soft/HLA-PRG-LA/NA12878_serge

Extract reads from 534 regions...
Extract unmapped reads...
Merging...
[bam_translate] RG tag "NA12878_GRCH38_ALT_684" on read "ERR091574.7445971" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
Indexing...
Extract FASTQ...
        /usr/bin/java -Xmx10g -XX:-UseGCOverheadLimit -jar /soft/picard-tools-1.119/SamToFastq.jar VALIDATION_STRINGENCY=LENIENT I=/outputs/soft/HLA-PRG-LA/NA12878_serge/extraction.bam F=/outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq F2=/outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq 2>&1

Now executing:
../bin/HLA-PRG-LA --action HLA --maxThreads 8 --sampleID NA12878_serge --outputDirectory /outputs/soft/HLA-PRG-LA/NA12878_serge --PRG_graph_dir /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq --FASTQ2 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0
Set maxThreads to 8
 [ Tue Jun 20 18:58:30 2017 ] Graph serialization existing and newer than graph file; read from /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/serializedGRAPH
 [ Tue Jun 20 19:05:36 2017 ]   done.
 [ Tue Jun 20 19:06:18 2017 ] processBAM::processBAM(..): Start graph gap analysis.
 [ Tue Jun 20 19:06:26 2017 ] processBAM::processBAM(..) graph gap analysis: have 3494 graph gap stretches; criterion length >= 3
/soft/bwa.kit/bwa mem -t8 -M -a /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq | /soft/samtools/bin/samtools view -Sb - > /outputs/soft/HLA-PRG-LA/NA12878_serge/remapped_with_a.bam.unsorted
terminate called after throwing an instance of 'std::runtime_error'
  what():  Command /soft/bwa.kit/bwa mem -t8 -M -a /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq | /soft/samtools/bin/samtools view -Sb - > /outputs/soft/HLA-PRG-LA/NA12878_serge/remapped_with_a.bam.unsorted returned code -1
HLA-PRG-LA execution not successful. Command was ../bin/HLA-PRG-LA --action HLA --maxThreads 8 --sampleID NA12878_serge --outputDirectory /outputs/soft/HLA-PRG-LA/NA12878_serge --PRG_graph_dir /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq --FASTQ2 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0

It still fails..

AlexanderDilthey commented 7 years ago

Can you manually execute the command

/soft/bwa.kit/bwa mem -t8 -M -a /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq | /soft/samtools/bin/samtools view -Sb - > /outputs/soft/HLA-PRG-LA/NA12878_serge/remapped_with_a.bam.unsorted

? Do you get a specific error message?

serge2016 commented 7 years ago

Yes! It's ok!

root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA# /soft/bwa.kit/bwa mem -t8 -M -a /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq | /soft/samtools/bin/samtools view -Sb - > /outputs/soft/HLA-PRG-LA/NA12878_serge/remapped_with_a.bam.unsorted
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 792080 sequences (80000080 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (2, 4697, 0, 2)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (273, 318, 360)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (99, 534)
[M::mem_pestat] mean and std.dev: (315.85, 71.20)
[M::mem_pestat] low and high boundaries for proper pairs: (12, 621)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::process] read 792080 sequences (80000080 bp)...
[M::mem_process_seqs] Processed 792080 reads in 185.692 CPU sec, 65.757 real sec
[M::process] read 792080 sequences (80000080 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 99, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (257, 307, 337)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (97, 497)
[M::mem_pestat] mean and std.dev: (295.81, 70.35)
[M::mem_pestat] low and high boundaries for proper pairs: (14, 577)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 792080 reads in 164.732 CPU sec, 46.346 real sec
[M::process] read 384978 sequences (38882778 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 2, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 792080 reads in 171.780 CPU sec, 49.508 real sec
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 31, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (285, 328, 385)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (85, 585)
[M::mem_pestat] mean and std.dev: (329.97, 87.15)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 685)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 384978 reads in 90.872 CPU sec, 26.121 real sec
[main] Version: 0.7.13-r1126
[main] CMD: /soft/bwa.kit/bwa mem -t8 -M -a /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq
[main] Real time: 216.163 sec; CPU: 615.064 sec
AlexanderDilthey commented 7 years ago

Weird. If you now re-execute the following command, does it still fail?

../bin/HLA-PRG-LA --action HLA --maxThreads 8 --sampleID NA12878_serge --outputDirectory /outputs/soft/HLA-PRG-LA/NA12878_serge --PRG_graph_dir /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq --FASTQ2 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0

serge2016 commented 7 years ago

If I start the same original command, it fails even after bwa mem succeeded:

root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA# ./src/HLA-PRG-LA.pl --bwa_bin $BWA --samtools_bin /soft/samtools/bin/samtools --picard_sam2fastq_bin /soft/picard-tools-1.119/SamToFastq.jar --BAM /outputs/NA12878.cram --graph ../graphs/PRG_MHC_GRCh38_withIMGT --sampleID NA12878_serge --maxThreads 8 --workingDir ./
inferHLATypes.pl

Identified paths:
        samtools_bin: /soft/samtools/bin/samtools
        bwa_bin: /soft/bwa.kit/bwa
        java_bin: /usr/bin/java
        picard_sam2fastq_bin: /soft/picard-tools-1.119/SamToFastq.jar
        General working directory: /outputs/soft/HLA-PRG-LA
        Sample-specific working directory: /outputs/soft/HLA-PRG-LA/NA12878_serge

Extract reads from 534 regions...
Extract unmapped reads...
Merging...
[bam_translate] RG tag "NA12878_GRCH38_ALT_684" on read "ERR091574.7445971" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
Indexing...
Extract FASTQ...
        /usr/bin/java -Xmx10g -XX:-UseGCOverheadLimit -jar /soft/picard-tools-1.119/SamToFastq.jar VALIDATION_STRINGENCY=LENIENT I=/outputs/soft/HLA-PRG-LA/NA12878_serge/extraction.bam F=/outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq F2=/outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq 2>&1

Now executing:
../bin/HLA-PRG-LA --action HLA --maxThreads 8 --sampleID NA12878_serge --outputDirectory /outputs/soft/HLA-PRG-LA/NA12878_serge --PRG_graph_dir /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq --FASTQ2 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0
Set maxThreads to 8
 [ Wed Jun 21 04:57:39 2017 ] Graph serialization existing and newer than graph file; read from /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/serializedGRAPH
 [ Wed Jun 21 05:04:31 2017 ]   done.
 [ Wed Jun 21 05:04:59 2017 ] processBAM::processBAM(..): Start graph gap analysis.
 [ Wed Jun 21 05:05:08 2017 ] processBAM::processBAM(..) graph gap analysis: have 3494 graph gap stretches; criterion length >= 3
/soft/bwa.kit/bwa mem -t8 -M -a /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq | /soft/samtools/bin/samtools view -Sb - > /outputs/soft/HLA-PRG-LA/NA12878_serge/remapped_with_a.bam.unsorted
terminate called after throwing an instance of 'std::runtime_error'
  what():  Command /soft/bwa.kit/bwa mem -t8 -M -a /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq | /soft/samtools/bin/samtools view -Sb - > /outputs/soft/HLA-PRG-LA/NA12878_serge/remapped_with_a.bam.unsorted returned code -1
HLA-PRG-LA execution not successful. Command was ../bin/HLA-PRG-LA --action HLA --maxThreads 8 --sampleID NA12878_serge --outputDirectory /outputs/soft/HLA-PRG-LA/NA12878_serge --PRG_graph_dir /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq --FASTQ2 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0
root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA#

This utilizes up to 21 Gb of RAM and then suddenly exits:

root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA# bin/HLA-PRG-LA --action HLA --maxThreads 8 --sampleID NA12878_serge --outputDirectory /outputs/soft/HLA-PRG-LA/NA12878_serge --PRG_graph_dir /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq --FASTQ2 /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0
Set maxThreads to 8
 [ Wed Jun 21 05:06:46 2017 ] Graph serialization existing and newer than graph file; read from /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/serializedGRAPH
 [ Wed Jun 21 05:13:26 2017 ]   done.
 [ Wed Jun 21 05:13:59 2017 ] processBAM::processBAM(..): Start graph gap analysis.
 [ Wed Jun 21 05:14:08 2017 ] processBAM::processBAM(..) graph gap analysis: have 3494 graph gap stretches; criterion length >= 3
/soft/bwa.kit/bwa mem -t8 -M -a /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq | /soft/samtools/bin/samtools view -Sb - > /outputs/soft/HLA-PRG-LA/NA12878_serge/remapped_with_a.bam.unsorted
terminate called after throwing an instance of 'std::runtime_error'
  what():  Command /soft/bwa.kit/bwa mem -t8 -M -a /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq | /soft/samtools/bin/samtools view -Sb - > /outputs/soft/HLA-PRG-LA/NA12878_serge/remapped_with_a.bam.unsorted returned code -1
Aborted (core dumped)
root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA#

If I start the last command, it runs ok, as we tested earlier (again checked it):

root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA# /soft/bwa.kit/bwa mem -t8 -M -a /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq | /soft/samtools/bin/samtools view -Sb - > /outputs/soft/HLA-PRG-LA/NA12878_serge/remapped_with_a.bam.unsorted
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 792080 sequences (80000080 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (2, 4697, 0, 2)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (273, 318, 360)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (99, 534)
[M::mem_pestat] mean and std.dev: (315.85, 71.20)
[M::mem_pestat] low and high boundaries for proper pairs: (12, 621)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::process] read 792080 sequences (80000080 bp)...
[M::mem_process_seqs] Processed 792080 reads in 180.800 CPU sec, 65.617 real sec
[M::process] read 792080 sequences (80000080 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 99, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (257, 307, 337)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (97, 497)
[M::mem_pestat] mean and std.dev: (295.81, 70.35)
[M::mem_pestat] low and high boundaries for proper pairs: (14, 577)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 792080 reads in 158.728 CPU sec, 44.732 real sec
[M::process] read 384978 sequences (38882778 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 2, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 792080 reads in 166.936 CPU sec, 50.195 real sec
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 31, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (285, 328, 385)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (85, 585)
[M::mem_pestat] mean and std.dev: (329.97, 87.15)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 685)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 384978 reads in 87.876 CPU sec, 25.227 real sec
[main] Version: 0.7.13-r1126
[main] CMD: /soft/bwa.kit/bwa mem -t8 -M -a /outputs/soft/HLA-PRG-LA/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/soft/HLA-PRG-LA/NA12878_serge/R_1.fastq /outputs/soft/HLA-PRG-LA/NA12878_serge/R_2.fastq
[main] Real time: 213.442 sec; CPU: 596.240 sec
AlexanderDilthey commented 7 years ago

I've got no idea what's going on here - let me try to come up with an improved debugging solution.

AlexanderDilthey commented 7 years ago

Can you try the following:

1) Add the line #include <cstring> to the header section of file mapper/bwa/BWAmapper.cpp?

2) The file mapper/bwa/BWAmapper.cpp should contain the following code, starting at line 151:

int retCode = std::system(bwa_mapping_cmd.c_str());
if(retCode != 0)
{
    throw std::runtime_error("Command " + bwa_mapping_cmd + " returned code "+Utilities::ItoStr(retCode));
}

Can you please replace this code with the following segment?

int retCode = std::system(bwa_mapping_cmd.c_str());
if(retCode != 0)
{
    std::cerr << "Error code: " << retCode << " translation: " << std::strerror(retCode) << '\n' << std::flush;
    throw std::runtime_error("BWA command " + bwa_mapping_cmd + " returned code "+Utilities::ItoStr(retCode));
}

3) re-do make all

4) re-execute the inference command -- the error message should now be more informative.

serge2016 commented 7 years ago

Done. But it didn't help to understand anything...:

root@584e48a69cb8:/outputs# $HLAPRGLA --BAM NA12878.cram --graph PRG_MHC_GRCh38_withIMGT --sampleID NA12878_serge3 --maxThreads 4
inferHLATypes.pl

Identified paths:
        samtools_bin: /soft/samtools/bin/samtools
        bwa_bin: /soft/bwa.kit/bwa
        java_bin: /usr/bin/java
        picard_sam2fastq_bin: /soft/picard-tools-1.123/SamToFastq.jar
        General working directory: /outputs
        Sample-specific working directory: /outputs/NA12878_serge3

Extract reads from 534 regions...
Extract unmapped reads...
Merging...
[bam_translate] RG tag "NA12878_GRCH38_ALT_684" on read "ERR091574.7445971" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
Indexing...
Extract FASTQ...
        /usr/bin/java -Xmx10g -XX:-UseGCOverheadLimit -jar /soft/picard-tools-1.123/SamToFastq.jar VALIDATION_STRINGENCY=LENIENT I=/outputs/NA12878_serge3/extraction.bam F=/outputs/NA12878_serge3/R_1.fastq F2=/outputs/NA12878_serge3/R_2.fastq 2>&1

Now executing:
../bin/HLA-PRG-LA --action HLA --maxThreads 4 --sampleID NA12878_serge3 --outputDirectory /outputs/NA12878_serge3 --PRG_graph_dir /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /outputs/NA12878_serge3/R_1.fastq --FASTQ2 /outputs/NA12878_serge3/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0
Set maxThreads to 4
 [ Fri Jun 23 13:03:46 2017 ] Graph serialization existing and newer than graph file; read from /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/serializedGRAPH
 [ Fri Jun 23 13:10:00 2017 ]   done.
 [ Fri Jun 23 13:10:40 2017 ] processBAM::processBAM(..): Start graph gap analysis.
 [ Fri Jun 23 13:10:49 2017 ] processBAM::processBAM(..) graph gap analysis: have 3494 graph gap stretches; criterion length >= 3
/soft/bwa.kit/bwa mem -t4 -M -a /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/NA12878_serge3/R_1.fastq /outputs/NA12878_serge3/R_2.fastq | /soft/samtools/bin/samtools view -Sb - > /outputs/NA12878_serge3/remapped_with_a.bam.unsorted
Error code: -1 translation: Unknown error -1
terminate called after throwing an instance of 'std::runtime_error'
  what():  BWA command2 /soft/bwa.kit/bwa mem -t4 -M -a /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/NA12878_serge3/R_1.fastq /outputs/NA12878_serge3/R_2.fastq | /soft/samtools/bin/samtools view -Sb - > /outputs/NA12878_serge3/remapped_with_a.bam.unsorted returned code -1
HLA-PRG-LA execution not successful. Command was ../bin/HLA-PRG-LA --action HLA --maxThreads 4 --sampleID NA12878_serge3 --outputDirectory /outputs/NA12878_serge3 --PRG_graph_dir /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /outputs/NA12878_serge3/R_1.fastq --FASTQ2 /outputs/NA12878_serge3/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0
serge2016 commented 7 years ago

Moreover, I added bwa's stderr redirection to a file:

        std::string bwa_mapping_cmd = bwa_bin + " mem -t" + Utilities::ItoStr(threads) + " -M " + with_a + indexedReferenceGenome + " " + FASTQ1 + " " + FASTQ2 + " 2> " + outputBAM + ".log1 | " + samtools_bin + " view -Sb - > " + outputUnsorted;
        std::cerr << "FLAG-FLAG-FLAG! BWA command: " << bwa_mapping_cmd.c_str() << "\n" << std::flush;
        int retCode = std::system(bwa_mapping_cmd.c_str());
        if(retCode != 0)
        {
                std::cerr << "Error code: " << retCode << " translation: " << std::strerror(retCode) << '\n' << std::flush;
                throw std::runtime_error("BWA command2 " + bwa_mapping_cmd + " returned code "+Utilities::ItoStr(retCode));
        }

And I do not see this file! It means that this command even doesn't start! When it fails I see only these files:

root@584e48a69cb8:/outputs# ls -la /outputs/NA12878_serge3/
total 1092320
drwxr-xr-x 2 root root      4096 Jun 23 16:57 .
drwxr-xr-x 5 root root      4096 Jun 23 13:01 ..
-rw-r--r-- 1 root root 315435851 Jun 23 16:50 R_1.fastq
-rw-r--r-- 1 root root 315435851 Jun 23 16:50 R_2.fastq
-rw-r--r-- 1 root root 243737344 Jun 23 16:49 extraction.bam
-rw-r--r-- 1 root root    164216 Jun 23 16:50 extraction.bam.bai
-rw-r--r-- 1 root root 243496642 Jun 23 16:49 extraction_mapped.bam
-rw-r--r-- 1 root root    243742 Jun 23 16:49 extraction_unmapped.bam
AlexanderDilthey commented 7 years ago

Hi Serge,

do you see the "FLAG-FLAG-FLAG" output? Otherwise I might have given you the wrong line (sorry!) -- could you, if you don't see the FLAG-FLAG-... output, do a gdb backtrace with the command:

../bin/HLA-PRG-LA --action HLA --maxThreads 4 --sampleID NA12878_serge3 --outputDirectory /outputs/NA12878_serge3 --PRG_graph_dir /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /outputs/NA12878_serge3/R_1.fastq --FASTQ2 /outputs/NA12878_serge3/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0

?

serge2016 commented 7 years ago

Yes, I do see the flag and the command with my redirection.

AlexanderDilthey commented 7 years ago

Very weird. One explanation might be that you don't have a shell available.

Can you add this prior to the command call?

if (!system( NULL )) { std::cerr << "NO SHELL AVAILABLE!\n" << std::flush; }

serge2016 commented 7 years ago

Yes, you are right! Shell is NOT available:

root@fb598e2f0e52:/outputs# $HLAPRGLA --BAM NA12878.cram --graph PRG_MHC_GRCh38_withIMGT --sampleID NA12878_serge1 --maxThreads 4
inferHLATypes.pl

Identified paths:
        samtools_bin: /soft/samtools/bin/samtools
        bwa_bin: /soft/bwa.kit/bwa
        java_bin: /usr/bin/java
        picard_sam2fastq_bin: /soft/picard-tools-1.123/SamToFastq.jar
        General working directory: /outputs
        Sample-specific working directory: /outputs/NA12878_serge1

Extract reads from 534 regions...
Extract unmapped reads...
Merging...
[bam_translate] RG tag "NA12878_GRCH38_ALT_684" on read "ERR091574.7445971" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
Indexing...
Extract FASTQ...
        /usr/bin/java -Xmx10g -XX:-UseGCOverheadLimit -jar /soft/picard-tools-1.123/SamToFastq.jar VALIDATION_STRINGENCY=LENIENT I=/outputs/NA12878_serge1/extraction.bam F=/outputs/NA12878_serge1/R_1.fastq F2=/outputs/NA12878_serge1/R_2.fastq 2>&1

Now executing:
../bin/HLA-PRG-LA --action HLA --maxThreads 4 --sampleID NA12878_serge1 --outputDirectory /outputs/NA12878_serge1 --PRG_graph_dir /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /outputs/NA12878_serge1/R_1.fastq --FASTQ2 /outputs/NA12878_serge1/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0
Set maxThreads to 4
 [ Wed Jun 28 11:25:36 2017 ] Graph serialization existing and newer than graph file; read from /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/serializedGRAPH
 [ Wed Jun 28 11:30:58 2017 ]   done.
 [ Wed Jun 28 11:31:30 2017 ] processBAM::processBAM(..): Start graph gap analysis.
 [ Wed Jun 28 11:31:38 2017 ] processBAM::processBAM(..) graph gap analysis: have 3494 graph gap stretches; criterion length >= 3
FLAG-FLAG-FLAG! BWA command: /soft/bwa.kit/bwa mem -t4 -M -a /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/NA12878_serge1/R_1.fastq /outputs/NA12878_serge1/R_2.fastq 2> /outputs/NA12878_serge1/remapped_with_a.bam.log1 | /soft/samtools/bin/samtools view -Sb - > /outputs/NA12878_serge1/remapped_with_a.bam.unsorted
NO SHELL AVAILABLE!
Error code: -1 translation: Unknown error -1
terminate called after throwing an instance of 'std::runtime_error'
  what():  BWA command2 /soft/bwa.kit/bwa mem -t4 -M -a /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /outputs/NA12878_serge1/R_1.fastq /outputs/NA12878_serge1/R_2.fastq 2> /outputs/NA12878_serge1/remapped_with_a.bam.log1 | /soft/samtools/bin/samtools view -Sb - > /outputs/NA12878_serge1/remapped_with_a.bam.unsorted returned code -1
HLA-PRG-LA execution not successful. Command was ../bin/HLA-PRG-LA --action HLA --maxThreads 4 --sampleID NA12878_serge1 --outputDirectory /outputs/NA12878_serge1 --PRG_graph_dir /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /outputs/NA12878_serge1/R_1.fastq --FASTQ2 /outputs/NA12878_serge1/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0
serge2016 commented 7 years ago

I am running all this in Docker container. Now I started it outside:

root@hp-sergetes-taaa-bbbb-cccc-111111111115:/universe/run01# $HLAPRGLA --BAM NA12878.cram --graph PRG_MHC_GRCh38_withIMGT --workingDir /universe/run01/ --sampleID NA12878_serge_outside3 --maxThreads 4
inferHLATypes.pl

Identified paths:
        samtools_bin: /soft/samtools/bin/samtools
        bwa_bin: /soft/bwa.kit/bwa
        java_bin: /usr/bin/java
        picard_sam2fastq_bin: /soft/picard-tools-1.123/SamToFastq.jar
        General working directory: /universe/run01
        Sample-specific working directory: /universe/run01/NA12878_serge_outside3

Extract reads from 534 regions...
Extract unmapped reads...
Merging...
[bam_translate] RG tag "NA12878_GRCH38_ALT_684" on read "ERR091574.7445971" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
Indexing...
Extract FASTQ...
        /usr/bin/java -Xmx10g -XX:-UseGCOverheadLimit -jar /soft/picard-tools-1.123/SamToFastq.jar VALIDATION_STRINGENCY=LENIENT I=/universe/run01/NA12878_serge_outside3/extraction.bam F=/universe/run01/NA12878_serge_outside3/R_1.fastq F2=/universe/run01/NA12878_serge_outside3/R_2.fastq 2>&1

Now executing:
../bin/HLA-PRG-LA --action HLA --maxThreads 4 --sampleID NA12878_serge_outside3 --outputDirectory /universe/run01/NA12878_serge_outside3 --PRG_graph_dir /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /universe/run01/NA12878_serge_outside3/R_1.fastq --FASTQ2 /universe/run01/NA12878_serge_outside3/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0
Set maxThreads to 4
 [ Wed Jun 28 15:36:52 2017 ] Graph serialization existing and newer than graph file; read from /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/serializedGRAPH
 [ Wed Jun 28 15:42:09 2017 ]   done.
 [ Wed Jun 28 15:42:47 2017 ] processBAM::processBAM(..): Start graph gap analysis.
 [ Wed Jun 28 15:42:52 2017 ] processBAM::processBAM(..) graph gap analysis: have 3494 graph gap stretches; criterion length >= 3
Error
NO SHELL AVAILABLE!
Test command 'ls -lah' (as string) returned error code: -1 translation: Unknown error -1
Test command 'ls -lah | tee /universe/run01/NA12878_serge_outside3/remapped_with_a.bam.log00002' (as variable) returned error code: -1 translation: Unknown error -1
FLAG-FLAG-FLAG!
BWA command: /soft/bwa.kit/bwa mem -t4 -M -a /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /universe/run01/NA12878_serge_outside3/R_1.fastq /universe/run01/NA12878_serge_outside3/R_2.fastq 2> /universe/run01/NA12878_serge_outside3/remapped_with_a.bam.log1 | /soft/samtools/bin/samtools view -Sb - > /universe/run01/NA12878_serge_outside3/remapped_with_a.bam.unsorted
Error code: -1 translation: Unknown error -1
terminate called after throwing an instance of 'std::runtime_error'
  what():  BWA command2 /soft/bwa.kit/bwa mem -t4 -M -a /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/mapping_PRGonly/referenceGenome.fa /universe/run01/NA12878_serge_outside3/R_1.fastq /universe/run01/NA12878_serge_outside3/R_2.fastq 2> /universe/run01/NA12878_serge_outside3/remapped_with_a.bam.log1 | /soft/samtools/bin/samtools view -Sb - > /universe/run01/NA12878_serge_outside3/remapped_with_a.bam.unsorted returned code -1
HLA-PRG-LA execution not successful. Command was ../bin/HLA-PRG-LA --action HLA --maxThreads 4 --sampleID NA12878_serge_outside3 --outputDirectory /universe/run01/NA12878_serge_outside3 --PRG_graph_dir /soft/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /universe/run01/NA12878_serge_outside3/R_1.fastq --FASTQ2 /universe/run01/NA12878_serge_outside3/R_2.fastq --bwa_bin /soft/bwa.kit/bwa --samtools_bin /soft/samtools/bin/samtools --mapAgainstCompleteGenome 0

And it still fails.

The code is:

    if (system(NULL)) puts ("Ok");
    else puts ("Error");
    system("dir");
    if (!std::system( NULL ))
    {
        std::cerr << "NO SHELL AVAILABLE!\n" << std::flush;
    }
    int retCodeSerge1 = std::system("ls -lah | tee /universe/run01/NA12878_serge_outside3/serge.log00001");
    if(retCodeSerge1 != 0)
    {
        std::cerr << "Test command 'ls -lah' (as string) returned error code: " << retCodeSerge1 << " translation: " << std::strerror(retCodeSerge1) << '\n' << std::flush;
//      throw std::runtime_error("Test command 'ls -lah' (as string) returned code "+Utilities::ItoStr(retCodeSerge1));
    }
    else
    {
        std::cerr << "Test command 'ls -lah' (as string) is done!" << '\n' << std::flush;
    }
    std::string serge_str2 = "ls -lah | tee " + outputBAM + ".log00002";
    int retCodeSerge2 = std::system(serge_str2.c_str());
    if(retCodeSerge2 != 0)
    {
        std::cerr << "Test command '" << serge_str2.c_str() << "' (as variable) returned error code: " << retCodeSerge2 << " translation: " << std::strerror(retCodeSerge2) << '\n' << std::flush;
//      throw std::runtime_error("Test command '" + serge_str2 + "' returned code " + Utilities::ItoStr(retCodeSerge2));
    }
    else
    {
        std::cerr << "Test command 'ls -lah' (as variable) is done!" << '\n' << std::flush;
    }
    std::string bwa_mapping_cmd = bwa_bin + " mem -t" + Utilities::ItoStr(threads) + " -M " + with_a + indexedReferenceGenome + " " + FASTQ1 + " " + FASTQ2 + " 2> " + outputBAM + ".log1 | " + samtools_bin + " view -Sb - > " + outputUnsorted;
    std::cerr << "FLAG-FLAG-FLAG!\nBWA command: " << bwa_mapping_cmd.c_str() << "\n" << std::flush;
    int retCode = std::system(bwa_mapping_cmd.c_str());
    if(retCode != 0)
    {
        std::cerr << "Error code: " << retCode << " translation: " << std::strerror(retCode) << '\n' << std::flush;
        throw std::runtime_error("BWA command2 " + bwa_mapping_cmd + " returned code "+Utilities::ItoStr(retCode));
    }
serge2016 commented 7 years ago

I have also tested the system call outside Docker: main.cpp

#include <stdio.h>      /* printf */
#include <stdlib.h>     /* system, NULL, EXIT_FAILURE */

int main ()
{
  int i;
  printf ("Checking if processor is available...");
  if (system(NULL)) puts ("Ok");
    else exit (EXIT_FAILURE);
  printf ("Executing command DIR...\n");
  i=system ("dir");
  printf ("The value returned was: %d.\n",i);
  return 0;
}

Compilation:

root@hp-sergetes-taaa-bbbb-cccc-111111111115:/universe/run01# g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
...
root@hp-sergetes-taaa-bbbb-cccc-111111111115:/universe/run01# g++ main.cpp

Test run:

root@hp-sergetes-taaa-bbbb-cccc-111111111115:/universe/run01# ./a.out
Checking if processor is available...Ok
Executing command DIR...
a.out     NA12878.cram       NA12878_serge1  NA12878_serge_outside1              pat126-normal-exome-SRR2780399.bam.bai  pat126-tumor-exome-SRR2772336.bam.bai
main.cpp  NA12878.cram.crai  NA12878_serge2  pat126-normal-exome-SRR2780399.bam  pat126-tumor-exome-SRR2772336.bam
The value returned was: 0.

Inside Docker the result is the same.

serge2016 commented 7 years ago

The problem is that in your code the stdout of the command, called inside system() is hidden! I can't find it. And in the sample main.cpp the stdout is transferred to the output (stdout)!

Valeriia-arh commented 7 years ago

Good afternoon, Alexander! I have done the actions as serge2016 and have the same mistakes.I can`t run test I attach my Dockerfile

Dockerfile.txt

May be there is some solution to this problem?

AlexanderDilthey commented 7 years ago

@serge2016 and @Valeriia-arh: I think it is important to solve these Docker problems, but I have no Docker experience and we don't have Docker installed on our infrastructure. I really appreciate your help and feedback, let's try to sort this out!

@serge2016 So normal calls to system() execute within your Docker environment, but when HLA*PRG:LA tries to open the shell, it doesn't work? This is weird! The STDOUT is redirected because this is raw SAM data, and this then goes into samtools for encoding into BAM, and then it is stored in a file. You could try removing the redirection for testing...

andrewrech commented 7 years ago

@serge2016 @Valeriia-arh To echo what @AlexanderDilthey has said, while it would be great to see this set of tools deployable with Docker, it would be easier to diagnose these issues if you could first install according to the documentation without this added, complex abstraction layer.

Santy-8128 commented 5 years ago

Hi, has there been any solution to this. I am having the same problem. I don't think the problem has anything to do with docker because I get the same error outside docker. These errors are happening on AWS EC2 for me. Some samples are giving this error while some aren't. Any solution would be really appreciated.

AlexanderDilthey commented 5 years ago

There are now Docker containers provided by the community: https://hub.docker.com/r/cmopipeline/hlala

serge2016 commented 5 years ago

Dear @AlexanderDilthey, could you provide the Dockerfile, please? It would be great to store it inside the repo!