XSLiuLab / Seq2Neo

Seq2Neo: a comprehensive pipeline for cancer neoantigen immunogenicity prediction
Academic Free License v3.0

A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found #3

Closed: KunFang93 closed this issue 1 year ago

KunFang93 commented 1 year ago

Hi,

I would like to report an error:

Using GATK jar /home/seq2neo/miniconda3/envs/Seq2Neo/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx40G -jar /home/seq2neo/miniconda3/envs/Seq2Neo/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar BaseRecalibrator --known-sites /home/resource_files/bqsr_resource/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites /home/resource_files/bqsr_resource/1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites /home/resource_files/bqsr_resource/dbsnp_146.hg38.vcf.gz -R /home/resource_files/ref_genome/Homo_sapiens.GRCh38.dna.primary_assembly.fa -I /home/resource_files/results/test/tmp/normal_marked.bam -O /home/resource_files/results/test/tmp/normal_BQSR.bam.recal_data.table
21:51:01.000 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/seq2neo/miniconda3/envs/Seq2Neo/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Nov 15, 2022 9:51:01 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
21:51:01.185 INFO  BaseRecalibrator - ------------------------------------------------------------
21:51:01.185 INFO  BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.2.5.0
21:51:01.185 INFO  BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
21:51:01.186 INFO  BaseRecalibrator - Executing as root@9ac92c70ddc8 on Linux v3.10.0-1062.1.1.el7.x86_64 amd64
21:51:01.186 INFO  BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v11.0.9.1-internal+0-adhoc..src
21:51:01.186 INFO  BaseRecalibrator - Start Date/Time: November 15, 2022 at 9:51:00 PM UTC
21:51:01.186 INFO  BaseRecalibrator - ------------------------------------------------------------
21:51:01.186 INFO  BaseRecalibrator - ------------------------------------------------------------
21:51:01.187 INFO  BaseRecalibrator - HTSJDK Version: 2.24.1
21:51:01.187 INFO  BaseRecalibrator - Picard Version: 2.25.4
21:51:01.187 INFO  BaseRecalibrator - Built for Spark Version: 2.4.5
21:51:01.187 INFO  BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
21:51:01.187 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
21:51:01.187 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
21:51:01.187 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
21:51:01.187 INFO  BaseRecalibrator - Deflater: IntelDeflater
21:51:01.187 INFO  BaseRecalibrator - Inflater: IntelInflater
21:51:01.187 INFO  BaseRecalibrator - GCS max retries/reopens: 20
21:51:01.187 INFO  BaseRecalibrator - Requester pays: disabled
21:51:01.187 INFO  BaseRecalibrator - Initializing engine
21:51:01.441 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/resource_files/bqsr_resource/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
21:51:01.764 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/resource_files/bqsr_resource/1000G_phase1.snps.high_confidence.hg38.vcf.gz
21:51:01.977 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/resource_files/bqsr_resource/dbsnp_146.hg38.vcf.gz
21:51:02.138 WARN  IndexUtils - Feature file "file:///home/resource_files/bqsr_resource/dbsnp_146.hg38.vcf.gz" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
21:51:02.220 WARN  IntelInflater - Zero Bytes Written : 0
21:51:02.229 INFO  BaseRecalibrator - Shutting down engine
[November 15, 2022 at 9:51:02 PM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=3305111552
***********************************************************************

A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
......

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Traceback (most recent call last):
  File "/home/seq2neo/miniconda3/envs/Seq2Neo/bin/seq2neo", line 33, in <module>
    sys.exit(load_entry_point('Seq2Neo==1.1', 'console_scripts', 'seq2neo')())
  File "/home/seq2neo/miniconda3/envs/Seq2Neo/lib/python3.7/site-packages/seq2neo/main.py", line 13, in main
    args[0].func.main(args[1])
  File "/home/seq2neo/miniconda3/envs/Seq2Neo/lib/python3.7/site-packages/seq2neo/model/whole.py", line 23, in main
    toBAM_dna_normal(args, tmpPATH, resultsPATH)
  File "/home/seq2neo/miniconda3/envs/Seq2Neo/lib/python3.7/site-packages/seq2neo/lib/toBAM.py", line 42, in toBAM_dna_normal
    normal_bqsr_cmd.baserecalibrator()
  File "/home/seq2neo/miniconda3/envs/Seq2Neo/lib/python3.7/site-packages/seq2neo/function/GATK_Best_Practice/_bqsr.py", line 28, in baserecalibrator
    sp.check_call(baserecalibrator_cmd, shell=True, stdout=sp.DEVNULL)
  File "/home/seq2neo/miniconda3/envs/Seq2Neo/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'gatk BaseRecalibrator --known-sites /home/resource_files/bqsr_resource/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites /home/resource_files/bqsr_resource/1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites /home/resource_files/bqsr_resource/dbsnp_146.hg38.vcf.gz -R /home/resource_files/ref_genome/Homo_sapiens.GRCh38.dna.primary_assembly.fa -I /home/resource_files/results/test/tmp/normal_marked.bam -O /home/resource_files/results/test/tmp/normal_BQSR.bam.recal_data.table --java-options "-Xmx40G" 1>/dev/null' returned non-zero exit status 2.

Kun

KunFang93 commented 1 year ago

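The error is caused by mismatched chromosome (contig) names between the BQSR resource files (bqsr_resource) and the reference genome (ref_genome). The Ensembl Homo_sapiens.GRCh38.dna.primary_assembly.fa names its contigs 1, 2, X, MT and so on, whereas the hg38 GATK resource-bundle VCFs use chr1, chr2, chrX, chrM, so GATK finds no contig that the two have in common.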
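
One way to confirm this diagnosis before re-running the pipeline is to compare the contig names directly. The sketch below is a hypothetical helper (not part of Seq2Neo or GATK). It reads names from the reference's samtools `.fai` index and from a VCF's `##contig` header lines. When a VCF has no contig header, as the log shows for dbsnp_146.hg38.vcf.gz, it falls back to the CHROM column of the first records.

```python
import gzip


def fasta_contigs(fai_path):
    """Return contig names from a samtools .fai index (first tab-separated column)."""
    with open(fai_path) as fh:
        return {line.split("\t", 1)[0] for line in fh if line.strip()}


def vcf_contigs(vcf_path, max_records=100_000):
    """Return contig names declared in ##contig header lines of a VCF (.vcf or
    bgzipped .vcf.gz). If the header declares none, fall back to the CHROM
    column of at most `max_records` data lines."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    header_ids, record_ids = set(), set()
    seen = 0
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("##contig=<"):
                # e.g. ##contig=<ID=chr1,length=248956422>
                body = line.strip()[len("##contig=<"):-1]
                for field in body.split(","):
                    if field.startswith("ID="):
                        header_ids.add(field[3:])
            elif line.startswith("#"):
                continue  # other meta lines and the #CHROM column header
            elif header_ids:
                break  # contigs already declared; no need to scan records
            else:
                record_ids.add(line.split("\t", 1)[0])
                seen += 1
                if seen >= max_records:
                    break
    return header_ids or record_ids


def report(fai_path, vcf_path):
    """Hypothetical check: print and return the contigs shared by reference and VCF."""
    ref = fasta_contigs(fai_path)
    known = vcf_contigs(vcf_path)
    shared = ref & known
    print(f"{vcf_path}: {len(shared)} of {len(known)} contigs found in reference")
    if not shared:
        print("  reference examples:", sorted(ref)[:3])
        print("  VCF examples:      ", sorted(known)[:3])
    return shared
```

For example, `report("/home/resource_files/ref_genome/Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai", "/home/resource_files/bqsr_resource/dbsnp_146.hg38.vcf.gz")` assumes the `.fai` index sits next to the FASTA. With Ensembl names on one side and `chr`-prefixed names on the other, the shared set should come back empty. Because the fallback only scans the first `max_records` lines of a position-sorted VCF, it samples the naming style rather than listing every contig.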

My solution:

wget https://github.com/broadinstitute/gatk/raw/master/src/test/resources/large/Homo_sapiens_assembly38.fasta.gz
gunzip Homo_sapiens_assembly38.fasta.gz
bwa index -a bwtsw Homo_sapiens_assembly38.fasta
samtools faidx Homo_sapiens_assembly38.fasta
gatk CreateSequenceDictionary -R Homo_sapiens_assembly38.fasta -O Homo_sapiens_assembly38.dict

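
The same preparation can be scripted from Python, which may be convenient when the tools are driven the way Seq2Neo drives them (its traceback shows `subprocess.check_call`). The sketch below is a hypothetical helper, not part of Seq2Neo. It adds an explicit `gunzip` step because the download is gzip-compressed while the later commands expect the plain `.fasta`. It also takes a `dry_run` flag that prints the commands without executing them. It assumes `gunzip`, `bwa`, `samtools` and `gatk` are on `PATH`.

```python
import shlex
import subprocess


def prepare_reference(fasta_gz="Homo_sapiens_assembly38.fasta.gz", dry_run=False):
    """Hypothetical helper: decompress a reference FASTA and build the bwa,
    samtools and GATK indexes from the commands above. Returns the command list."""
    fasta = fasta_gz[:-3] if fasta_gz.endswith(".gz") else fasta_gz
    dict_path = fasta.rsplit(".", 1)[0] + ".dict"
    steps = [
        ["bwa", "index", "-a", "bwtsw", fasta],
        ["samtools", "faidx", fasta],
        ["gatk", "CreateSequenceDictionary", "-R", fasta, "-O", dict_path],
    ]
    if fasta_gz.endswith(".gz"):
        # bwa, samtools and GATK are pointed at the uncompressed .fasta, so unpack it first
        steps.insert(0, ["gunzip", fasta_gz])
    for cmd in steps:
        print("+", " ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            # argument lists (no shell=True) avoid shell quoting issues;
            # check=True raises CalledProcessError on a non-zero exit
            subprocess.run(cmd, check=True)
    return steps
```

After indexing, Seq2Neo's reference genome setting would need to point at Homo_sapiens_assembly38.fasta. Because normal_marked.bam was aligned against the Ensembl reference, the reads would presumably need to be realigned against the new reference as well.
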
diaokx commented 1 year ago

Thanks for the suggestion. The GATK resource (feature) files and the Ensembl reference genome use mismatched contig names. I will mention this in the GitHub README.