Oshlack / JAFFA

JAFFA is a multi-step pipeline that takes either raw RNA-Seq reads, or pre-assembled transcripts, then searches for gene fusions
https://github.com/Oshlack/JAFFA/wiki

Reads with incorrect qualities error #77

Closed evanwangx closed 2 years ago

evanwangx commented 2 years ago

Hello,

I'm trying to run JAFFAL on my institute's HPC using Singularity. I've converted your docker image (https://hub.docker.com/r/beccyl/jaffa) into a Singularity container and run this with both short and long read demo data on the cluster.

The problem is when I try running my own Nanopore data, I get an incorrect qualities error when converting the fastq to fasta. You've provided a few possible solutions to this (https://github.com/Oshlack/JAFFA/issues/51#issuecomment-654649777), of which [ignorebadquality=t] is included in the newest JAFFA version, but I don't have write permission to write this into the JAFFAL.groovy file in the Singularity container.

Would you be able to provide the newest update in a Docker or Singularity container I can try?

As well, is there anything wrong with my Nanopore fastqs that might be causing this issue?

I've included the run errors for my 2 fastqs below.

Thanks!


singularity exec jaffalatest.sif bpipe run -p fastqInputFormat='%*.fastq.gz' -p refBase=./ -p genome=hg38 -p annotation=genCode22 /opt/JAFFA/JAFFAL.groovy ./GSP3_PAH_Q7.fastq.gz

| Starting Pipeline at 2022-04-20 22:44 |

========================================= Stage run_check ==========================================
Running JAFFA version 2.0
Checking for required data files...
.//hg38_genCode22.fa
.//hg38_genCode22.tab
/opt/JAFFA/known_fusions.txt
.//hg38.fa
.//Masked_hg38.1.bt2
.//hg38_genCode22.1.bt2
All looking good

====================================== Stage get_fasta (GSP3) ======================================
java -ea -Xms300m -cp /opt/bbmap/current/ jgi.ReformatReads in=./GSP3_PAH_Q7.fastq.gz out=GSP3/GSP3.fasta threads=16
Executing jgi.ReformatReads [in=./GSP3_PAH_Q7.fastq.gz, out=GSP3/GSP3.fasta, threads=16]

Set threads to 16
Input is being processed as unpaired
Warning! Changed from ASCII-33 to ASCII-64 on input Z: 90 -> 59
Up to 4 prior reads may have been generated with incorrect qualities.
If this is a problem you may wish to re-run with the flag 'qin=33' or 'qin=64'.

The ASCII quality encoding offset (64) is not set correctly, or the reads are corrupt; quality value below -5. Please re-run with the flag 'qin=33', 'ignorebadquality', or '-da'. Problematic read number 4:

@44:126|cfeefaf6-5cb8-4f55-bb23-e5d3708e74df runid=a3f51e7502c488f649695d3ee76ea92918e86778 read=6 ch=556 start_time=2022-02-10T19:28:28.193435-05:00 flow_cell_id=PAH66580 protocol_group_id=Zhenbao sample_id=DAOY_20221002 parent_read_id=cfeefaf6-5cb8-4f55-bb23-e5d3708e74df strand=+ GGAAACGACGTCACGTCCGGCGCGGAGACGGTGGAGTCTCCCGCCGCTGTAACGGCGGGTACGCGTAGTGGAAGCGCACTAG + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

Offset=64
java.lang.Exception: Aborting.
    at shared.KillSwitch.kill(KillSwitch.java:108)
    at stream.FASTQ.quadToRead_slow(FASTQ.java:767)
    at stream.FASTQ.toReadList(FASTQ.java:659)
    at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107)
    at stream.FastqReadInputStream.hasMore(FastqReadInputStream.java:73)
    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:667)
    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:656)
Cleaned up file GSP3/GSP3.fasta to .bpipe/trash/GSP3.fasta
ERROR: stage get_fasta failed: Command in stage get_fasta failed with exit status = 1 :

reformat.sh in=./GSP3_PAH_Q7.fastq.gz out=GSP3/GSP3.fasta threads=16 ;

========================================= Pipeline Failed ==========================================

One or more parallel stages aborted. The following messages were reported:

--------------------------------------- get_fasta ( GSP3 ) ---------------------------------------

Command in stage get_fasta failed with exit status = 1 :

reformat.sh in=./GSP3_PAH_Q7.fastq.gz out=GSP3/GSP3.fasta threads=16 ;


Use 'bpipe errors' to see output from failed commands.


singularity exec jaffalatest.sif bpipe run -p fastqInputFormat='%*.fastq.gz' -p refBase=./ -p genome=hg38 -p annotation=genCode22 /opt/JAFFA/JAFFAL.groovy ./GSP3_Hind3_Q7.fastq.gz

| Starting Pipeline at 2022-04-20 23:01 |

========================================= Stage run_check ==========================================
Running JAFFA version 2.0
Checking for required data files...
.//hg38_genCode22.fa
.//hg38_genCode22.tab
/opt/JAFFA/known_fusions.txt
.//hg38.fa
.//Masked_hg38.1.bt2
.//hg38_genCode22.1.bt2
All looking good

====================================== Stage get_fasta (GSP3) ======================================
java -ea -Xms300m -cp /opt/bbmap/current/ jgi.ReformatReads in=./GSP3_Hind3_Q7.fastq.gz out=GSP3/GSP3.fasta threads=16
Executing jgi.ReformatReads [in=./GSP3_Hind3_Q7.fastq.gz, out=GSP3/GSP3.fasta, threads=16]

Set threads to 16
Input is being processed as unpaired
Warning! Changed from ASCII-33 to ASCII-64 on input Z: 90 -> 59
Up to 3 prior reads may have been generated with incorrect qualities.
If this is a problem you may wish to re-run with the flag 'qin=33' or 'qin=64'.

The ASCII quality encoding offset (64) is not set correctly, or the reads are corrupt; quality value below -5. Please re-run with the flag 'qin=33', 'ignorebadquality', or '-da'. Problematic read number 3:

@40:216|08456989-4575-4d8d-b66d-13c3361fa6c1 runid=8c72fcbb0a8d1362d56109c933d5d06e767c3a46 read=6 ch=1835 start_time=2022-02-10T19:20:06.359369-05:00 flow_cell_id=PAI40056 protocol_group_id=Zhenbao sample_id=HindIII_20220210 parent_read_id=08456989-4575-4d8d-b66d-13c3361fa6c1 strand=+ GGGGGGGCAGCCCGGCCGCGCCCGCCGCCGCCGCCGCCGCCATGGGCTGCCTCGGGAACAGTAAGACCGAGGACCAGCGCAACGAGGAGAAGGCGCAGCGTGAGGTAGCAAAAGATCGAGAAGCAGCTGGGAAGGACAAGCAGGTCTGCAGACCACGCACCGCCTGCTGCTGCTGG + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@AA@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@A@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

Offset=64
java.lang.Exception: Aborting.
    at shared.KillSwitch.kill(KillSwitch.java:108)
    at stream.FASTQ.quadToRead_slow(FASTQ.java:767)
    at stream.FASTQ.toReadList(FASTQ.java:659)
    at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107)
    at stream.FastqReadInputStream.hasMore(FastqReadInputStream.java:73)
    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:667)
    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:656)
Cleaned up file GSP3/GSP3.fasta to .bpipe/trash/GSP3.fasta
ERROR: stage get_fasta failed: Command in stage get_fasta failed with exit status = 1 :

reformat.sh in=./GSP3_Hind3_Q7.fastq.gz out=GSP3/GSP3.fasta threads=16 ;

========================================= Pipeline Failed ==========================================

One or more parallel stages aborted. The following messages were reported:

--------------------------------------- get_fasta ( GSP3 ) ---------------------------------------

Command in stage get_fasta failed with exit status = 1 :

reformat.sh in=./GSP3_Hind3_Q7.fastq.gz out=GSP3/GSP3.fasta threads=16 ;


Use 'bpipe errors' to see output from failed commands.
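On the question of whether the FASTQ files themselves are at fault: the log shows BBMap's auto-detection switching from ASCII-33 to ASCII-64 after it met a `Z` (ASCII 90) in a quality string, and then aborting. Nanopore basecallers normally write Phred+33 qualities, so the files may well be fine and only the guessed offset wrong. One rough way to check is to look at the range of quality characters: any character below `;` (ASCII 59) cannot be valid under a +64 offset. The sketch below assumes plain four-line FASTQ records; `qual_range` is a hypothetical helper, not part of JAFFA or BBMap, and the demo reads are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# qual_range: hypothetical helper (not part of JAFFA or BBMap).
# Reads a 4-line-per-record FASTQ on stdin and prints the lowest and highest
# ASCII codes found on the quality lines.
qual_range() {
  LC_ALL=C awk '
    BEGIN {
      # Lookup table from printable character to ASCII code.
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
      min = 127; max = 0
    }
    NR % 4 == 0 {                      # every 4th line is a quality string
      n = length($0)
      for (j = 1; j <= n; j++) {
        ch = substr($0, j, 1)
        if (!(ch in ord)) continue     # skip anything outside 33..126
        if (ord[ch] < min) min = ord[ch]
        if (ord[ch] > max) max = ord[ch]
      }
    }
    END { printf "min=%d max=%d\n", min, max }
  '
}

# Tiny made-up FASTQ for illustration only (not data from this issue).
printf '@r1\nACGT\n+\n@@@@\n@r2\nACGT\n+\nZ5+@\n' | qual_range
```

On a real file this could be run as `zcat GSP3_PAH_Q7.fastq.gz | qual_range`. A minimum well below 59 would suggest the data are Phred+33 and that passing `qin=33` is the appropriate override.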

nadiadavidson commented 2 years ago

Hi, I am unlikely to be able to update the Docker container any time soon. However, JAFFA is happy to take a .fasta file, as it doesn't really use quality scores. So you could convert your data to .fasta format through other means, e.g. by manually running reformat.sh in=./GSP3_Hind3_Q7.fastq.gz out=GSP3/GSP3.fasta with ignorebadquality.

Cheers, Nadia.
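The manual conversion suggested above could look roughly like the following. This is a sketch, not a tested recipe: the flags `qin=33` and `ignorebadquality=t` are the ones named in the BBMap error message and in this thread, while the output file name is an assumption chosen for illustration. The command is only executed if `reformat.sh` is on the `PATH` and the input file exists; otherwise it is just printed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Input from this issue; the output name is an assumption for illustration.
in_fastq=./GSP3_Hind3_Q7.fastq.gz
out_fasta=GSP3_Hind3_Q7.fasta

# Build the BBMap command as an array so arguments are passed safely.
cmd=(reformat.sh "in=$in_fastq" "out=$out_fasta" qin=33 ignorebadquality=t)

# Always show what would be run.
printf '%s\n' "${cmd[*]}"

# Execute only when BBMap is available and the input is present.
if command -v reformat.sh >/dev/null 2>&1 && [ -f "$in_fastq" ]; then
  "${cmd[@]}"
else
  echo "reformat.sh or input file not found; command not executed" >&2
fi
```

Since the pipeline itself calls `reformat.sh` inside the container, the same command can presumably be run through `singularity exec` with the existing image, which avoids editing JAFFAL.groovy.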

evanwangx commented 2 years ago

Hi Nadia, thanks for your suggestion. I converted the fastq to fasta using bbmap reformat.sh ignorebadquality=T qin=33. I then reran JAFFAL separately on my samples. The pipeline worked for one of my samples, but I'm getting an error for the other one. I've copied the relevant section below. Thanks!


============================ Stage make_fasta_reads_table (GSP3_PAH_Q7) ============================
sort: cannot create temporary file in '/localhd/tmp': No such file or directory
Cleaned up file GSP3_PAH_Q7/GSP3_PAH_Q7.reads to .bpipe/trash/GSP3_PAH_Q7.reads
ERROR: stage make_fasta_reads_table failed: Command in stage make_fasta_reads_table failed with exit status = 2 :

echo -e "transcript break_min break_max fusion_genes spanning_pairs spanning_reads" > GSP3_PAH_Q7/GSP3_PAH_Q7.reads ; awk '{ print $1" "$2" "$3" "$4" "0" "1}' GSP3_PAH_Q7/GSP3_PAH_Q7.txt | sort -u >> GSP3_PAH_Q7/GSP3_PAH_Q7.reads

========================================= Pipeline Failed ==========================================

One or more parallel stages aborted. The following messages were reported:

----------------------------- make_fasta_reads_table ( GSP3_PAH_Q7 ) -----------------------------

Command in stage make_fasta_reads_table failed with exit status = 2 :

echo -e "transcript break_min break_max fusion_genes spanning_pairs spanning_reads" > GSP3_PAH_Q7/GSP3_PAH_Q7.reads ; awk '{ print $1" "$2" "$3" "$4" "0" "1}' GSP3_PAH_Q7/GSP3_PAH_Q7.txt | sort -u >> GSP3_PAH_Q7/GSP3_PAH_Q7.reads


singularity exec jaffa_latest.sif bpipe errors

============================== Found 1 failed commands from run 9298 ===============================

=============================== Command make_fasta_reads_table (19) ================================

Command : echo -e "transcript break_min break_max fusion_genes spanning_pairs spanning_reads" > GSP3_PAH_Q7/GSP3_PAH_Q7.reads ; awk '{ print $1" "$2" "$3" "$4" "0" "1}' GSP3_PAH_Q7/GSP3_PAH_Q7.txt | sort -u >> GSP3_PAH_Q7/GSP3_PAH_Q7.reads
Started : Thu Apr 21 06:17:38 UTC 2022
Stopped : Thu Apr 21 06:17:38 UTC 2022
Exit Code : 2
Config:
Name | Value

      max_per_command_threads | 16                    
      executor                | local                 
      stats_update_interval   | 120000                
      outputScanConcurrency   | 5                     
      maxFileNameLength       | 2048                  
      name                    | make_fasta_reads_table
      procs                   | 1                     

Output :

sort: cannot create temporary file in '/localhd/tmp': No such file or directory
evanwangx commented 2 years ago

Just wanted to update this.

I didn't manage to solve this problem. I wonder if it has something to do with the Singularity build? I ended up ditching Singularity and installing JAFFA the regular way. This specific error didn't show up again.
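The cause was never confirmed in this thread, but one plausible reading of the `sort` message is that the cluster sets `TMPDIR=/localhd/tmp`, Singularity passes that variable into the container, and the directory is not mounted there. GNU `sort` writes its temporary files to `$TMPDIR` (or to the directory given with `-T`). The sketch below is a small check meant to be run in the same environment as the failing `sort`; `usable_tmpdir` is a hypothetical helper, and the missing path in the demo is invented.

```shell
#!/usr/bin/env bash
set -euo pipefail

# usable_tmpdir: hypothetical helper (not part of JAFFA, bpipe or Singularity).
# Prints $TMPDIR if it is an existing, writable directory; otherwise prints
# the fallback given as the first argument (default: /tmp).
usable_tmpdir() {
  local fallback=${1:-/tmp}
  local candidate=${TMPDIR:-$fallback}
  if [ -d "$candidate" ] && [ -w "$candidate" ]; then
    printf '%s\n' "$candidate"
  else
    printf '%s\n' "$fallback"
  fi
}

# Simulate the situation from the error: TMPDIR names a missing directory,
# so the helper falls back to /tmp.
TMPDIR=/localhd/tmp-does-not-exist usable_tmpdir /tmp
```

If the check confirms that `TMPDIR` is unusable inside the container, two things that might help (both untested here) are exporting `SINGULARITYENV_TMPDIR=/tmp` before calling `singularity exec`, or binding the host directory into the container with `--bind /localhd/tmp`.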

nadiadavidson commented 2 years ago

Thanks for the update. I will close the issue now. If anyone else sorts out this issue or gets JAFFA to run with Docker, they are welcome to post and submit code to the GitHub repository.