I recently processed 1857 WGS files obtained via dbGaP, and 1848 of them completed successfully. Four of the nine files that failed returned the error:
"You didn't activate --longReads, but the two files ... (which store paired-end reads) are empty - this is weird, and I will abort. at /opt2/hla-la/1.0.3/HLA-LA/src/HLA-LA.pl line 513."
Rerunning these files several times, including once with a fresh download, did not clear the error.
Can you give any insight into what might be wrong in these sequence files? Below is the readout text from one of the processes that failed:
Identified paths:
samtools_bin: /usr/bin/samtools
bwa_bin: /usr/bin/bwa
java_bin: /usr/bin/java
picard_sam2fastq_bin: /usr/bin/picard-tools
General working directory: /home/sara_contente/HLA-LA/working
Sample-specific working directory: /home/sara_contente/HLA-LA/working/NWD365424
Using /home/sara_contente/HLA-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/knownReferences/1000G_B38.txt as reference file.
Extract reads from 534 regions... Extract unmapped reads...
Merging...
[bam_translate] RG tag "NWD365424_CGCTCATT_HL5GLCCXX_L001" on read "E00170:255:HL5GLCCXX:1:1101:1012:58233" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "NWD365424_CGCTCATT_HL5GLCCXX_L002" on read "E00170:255:HL5GLCCXX:2:1101:1012:48810" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "NWD365424_CGCTCATT_HL5GLCCXX_L003" on read "E00170:255:HL5GLCCXX:3:1101:991:28839" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "NWD365424_CGCTCATT_HL5GLCCXX_L004" on read "E00170:255:HL5GLCCXX:4:1101:1083:18168" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "NWD365424_CGCTCATT_HL5GLCCXX_L005" on read "E00170:255:HL5GLCCXX:5:1101:991:33059" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "NWD365424_CGCTCATT_HL5GLCCXX_L006" on read "E00170:255:HL5GLCCXX:6:1101:991:40829" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "NWD365424_CGCTCATT_HL5GLCCXX_L007" on read "E00170:255:HL5GLCCXX:7:1101:991:28031" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "NWD365424_CGCTCATT_HL5GLCCXX_L008" on read "E00170:255:HL5GLCCXX:8:1101:1002:14793" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
Indexing...
Extract FASTQ...
/usr/bin/picard-tools SamToFastq VALIDATION_STRINGENCY=LENIENT I=/home/sara_contente/HLA-LA/working/NWD365424/extraction.bam F=/home/sara_contente/HLA-LA/working/NWD365424/R_1.fastq F2=/home/sara_contente/HLA-LA/working/NWD365424/R_2.fastq FU=/home/sara_contente/HLA-LA/working/NWD365424/R_U.fastq 2>&1
You didn't activate --longReads, but the two files /home/sara_contente/HLA-LA/working/NWD365424/R_1.fastq and /home/sara_contente/HLA-LA/working/NWD365424/R_2.fastq (which store paired-end reads) are empty - this is weird, and I will abort. at /home/sara_contente/HLA-LA/src/HLA-LA.pl line 513.
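
In case it helps with comparing the failing samples, here is a rough sketch of how the read-group IDs flagged in a readout like the one above could be collected. It only parses the log text; it does not inspect the BAM itself, and I am not assuming the warnings are the cause of the empty FASTQ files. The function name `lost_read_groups` and the `sample_log` variable are hypothetical, made up for this illustration.

```python
import re

# Matches the "[bam_translate]" warnings from the readout and captures
# the read-group ID and the read name they mention.
RG_WARNING = re.compile(r'\[bam_translate\] RG tag "([^"]+)" on read "([^"]+)"')


def lost_read_groups(log_text):
    """Return read-group IDs reported as missing from the header,
    in order of first appearance, without duplicates."""
    seen = []
    for rg_id, _read_name in RG_WARNING.findall(log_text):
        if rg_id not in seen:
            seen.append(rg_id)
    return seen


# Two warnings copied from the readout above, used as sample input.
sample_log = (
    '[bam_translate] RG tag "NWD365424_CGCTCATT_HL5GLCCXX_L001" on read '
    '"E00170:255:HL5GLCCXX:1:1101:1012:58233" encountered with no '
    'corresponding entry in header, tag lost.\n'
    '[bam_translate] RG tag "NWD365424_CGCTCATT_HL5GLCCXX_L002" on read '
    '"E00170:255:HL5GLCCXX:2:1101:1012:48810" encountered with no '
    'corresponding entry in header, tag lost.\n'
)

for rg in lost_read_groups(sample_log):
    print(rg)
```

Running this over the readouts of the failing and the successful samples would show whether the warnings appear only in the failures or in both.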