CRG-CNAG / CalliNGS-NF

GATK RNA-Seq Variant Calling in Nextflow
Mozilla Public License 2.0

Error in SplitNCigarReads step #5

Closed slagtermaarten closed 5 years ago

slagtermaarten commented 5 years ago

Hi

Thanks for developing and (maintaining?) this pipeline! I tried to run it but ran into the error below. Do you have any ideas?

ERROR ~ Error executing process > '3_rnaseq_gatk_splitNcigar (S31)'

Caused by:
  Process `3_rnaseq_gatk_splitNcigar (S31)` terminated with an error exit status (1)

Command executed:

  # SplitNCigarReads and reassign mapping qualities
  java -jar /DATA/resources/gatk/GATK-3.7/GenomeAnalysisTK.jar -T SplitNCigarReads           -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -I Aligned.sortedByCoord.out.bam           -o split.bam           -rf ReassignOneMappingQuality           -RMQF 255 -RMQT 60           -U ALLOW_N_CIGAR_READS           --fix_misencoded_quality_scores

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO  01:01:07,799 HelpFormatter - --------------------------------------------------------------------------------
  INFO  01:01:07,801 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
  INFO  01:01:07,801 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
  INFO  01:01:07,802 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
  INFO  01:01:07,802 HelpFormatter - [Wed Mar 06 01:01:07 CET 2019] Executing on Linux 4.4.0-142-generic amd64
  INFO  01:01:07,802 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12
  INFO  01:01:07,806 HelpFormatter - Program Args: -T SplitNCigarReads -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -I Aligned.sortedByCoord.out.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS --fix_misencoded_quality_scores
  INFO  01:01:07,813 HelpFormatter - Executing as m.slagter@coley on Linux 4.4.0-142-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12.
  INFO  01:01:07,813 HelpFormatter - Date/Time: 2019/03/06 01:01:07
  INFO  01:01:07,814 HelpFormatter - --------------------------------------------------------------------------------
  INFO  01:01:07,814 HelpFormatter - --------------------------------------------------------------------------------
  INFO  01:01:07,889 GenomeAnalysisEngine - Strictness is SILENT
  INFO  01:01:08,231 GenomeAnalysisEngine - Downsampling Settings: No downsampling
  INFO  01:01:08,241 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
  INFO  01:01:08,286 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
  INFO  01:01:08,537 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
  INFO  01:01:08,545 GenomeAnalysisEngine - Done preparing for traversal
  INFO  01:01:08,546 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
  INFO  01:01:08,546 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
  INFO  01:01:08,547 ProgressMeter -        Location |     reads | elapsed |     reads | completed | runtime |   runtime
  INFO  01:01:08,572 ReadShardBalancer$1 - Loading BAM index data
  INFO  01:01:08,574 ReadShardBalancer$1 - Done loading BAM index data
  ##### ERROR ------------------------------------------------------------------------------------------
  ##### ERROR A USER ERROR has occurred (version 3.7-0-gcfedb67):
  ##### ERROR
  ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
  ##### ERROR The error message below tells you what is the problem.
  ##### ERROR
  ##### ERROR If the problem is an invalid argument, please check the online documentation guide
  ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
  ##### ERROR
  ##### ERROR Visit our website and forum for extensive documentation and answers to
  ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
  ##### ERROR
  ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
  ##### ERROR
  ##### ERROR MESSAGE: Bad input: while fixing mis-encoded base qualities we encountered a read that was correctly encoded; we cannot handle such a mixture of reads so unfortunately the BAM must be fixed with some other tool
  ##### ERROR ------------------------------------------------------------------------------------------

slagtermaarten commented 5 years ago

So I found out how to get rid of this message here.

Editing the Nextflow script to remove the --fix_misencoded_quality_scores flag from the SplitNCigarReads command seems to do the trick.

I cannot exclude that this 'bug' was introduced by my use of slightly different versions of the required programs. I tried to run the Docker image but couldn't run nextflow in there. I then opted for a local install (without ensuring I had exactly the same versions of the dependencies as you've used) but eventually ran into the issues detailed here.
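
For anyone hitting the same error, here is a rough sketch of why dropping the flag probably helps. As far as I understand the GATK 3 documentation, --fix_misencoded_quality_scores subtracts 31 from every base quality to convert old Phred+64 data to Phred+33, so data that is already Phred+33 ends up with negative scores and GATK refuses to continue. The function names, the thresholds (59 and 74) and the example strings below are my own assumptions for illustration; none of this is part of the pipeline or of GATK.

```python
# Heuristic sketch only: names and thresholds are assumptions, not GATK code.

GATK_FIX_SHIFT = 31  # shift the flag is documented to subtract (64 - 33)


def guess_phred_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) of FASTQ/SAM quality strings.

    Returns None when there is no data or the range is ambiguous.
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters this low cannot occur in Phred+64 (or Solexa+64) data.
        return 33
    if max(codes) > 74:
        # Characters this high are unusual for Phred+33 data.
        return 64
    return None


def shift_would_fail(phred_scores, shift=GATK_FIX_SHIFT):
    """Return True if subtracting `shift` makes any score negative."""
    return any(score - shift < 0 for score in phred_scores)


if __name__ == "__main__":
    # Made-up quality strings, as they would appear in a FASTQ file.
    print(guess_phred_offset(["IIIIHGF#5"]))  # low characters present
    print(shift_would_fail([40, 38, 2]))      # already Phred+33 scores
```

If the guess for your reads is 33, the flag is not needed and removing it is the right fix rather than a workaround. A result of None just means the sample was too small or too uniform to tell.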

pditommaso commented 5 years ago

This is definitely a GATK-related issue. The pipeline is provided as a template for implementing a variant-calling data analysis, but it's not meant to be production quality.

beginner984 commented 5 years ago

Sorry, this command returns an empty .bam file for me, but why?

java -jar $GATKjar -T SplitNCigarReads -R ./hs37d5.fa -I ./dupmarked.bam -o ./dupmarked_output.bam -U ALLOW_N_CIGAR_READS

Any suggestions, please?

Thanks a lot