RabadanLab / arcasHLA

Fast and accurate in silico inference of HLA genotypes from RNA-seq
GNU General Public License v3.0
113 stars 49 forks source link

Running paired data results in empty fastq #118

Closed blain1995 closed 7 months ago

blain1995 commented 7 months ago

Hi,

I have aligned my fastqs using STAR and gencode reference release 45. I am attempting to run arcasHLA using a docker image built from the dockerfile you kindly made available.

My data is paired end data, and I have found that similar to #5 I ended up with errors due to no reads aligned, however, unlike #5 I have the latest version of samtools installed. I followed the advice given by @roseorenbuch on #5 by running the following code:

samtools view -H -@1 test/test.bam -o /tmp/test.hla.sam samtools view -@1 -f 2 test/test.bam 6 >> /tmp/test.hla.sam samtools view -Sb -@1 /tmp/test.hla.sam > /tmp/test.hla.bam samtools sort -n -@1 /tmp/test.hla.bam -o /tmp/test.hla.sorted.bam bedtools bamtofastq -i /tmp/test.hla.sorted.bam -fq test/output/test.extracted.1.fq -fq2 test/output/test.extracted.2.fq pigz -f -p 1 -S .gz test/output/test.extracted.1.fq pigz -f -p 1 -S .gz test/output/test.extracted.2.fq

Firstly, I found that line two errored with "[main_samview] region "6" specifies an unknown reference name. Continue anyway." but ran when I used "chr6" instead. However, even though this line ran the test.hla.sam file was unchanged. After looking through the samtools documentation I could not find a reason why this would be the case. Interestingly, I noticed that @Tinocapo briefly described a similar issue ( #110 ) that appeared to resolve when the --single parameter was applied, even though their data was also paired-end. I tried the arcasHLA extract in single end mode and found that it was writing data to a fastq. As I had ran this verbose I noticed the samtools view argument was different for single end consisting of:

samtools view -@1 -F 4 test/test.bam chr6 >> /tmp/test.hla.sam

My current plan is to run line by line exchanging the second line but ensuring I still output to two fastq files at the end. Could someone please explain this to me? Unsure if I am doing something silly, but after extensive reading I cannot seem to identify the source of the issue.

Thank you in advance for your help.

blain1995 commented 7 months ago

Realised I had made an error in making my original BAM files. Updating to prevent others from having the same issue. The problem lay within my STAR script as I had used the following:

--readFilesIn <(gunzip -c file1.fq.gz file2.fq.gz)

instead of:

--readFilesIn <(gunzip -c file1.fq.gz) <(gunzip -c file2.fq.gz)

meaning STAR was treating my files as one single ended file instead of two paired end files.