bioinform / somaticseq

An ensemble approach to accurately detect somatic mutations using SomaticSeq
http://bioinform.github.io/somaticseq/
BSD 2-Clause "Simplified" License

UnboundLocalError: local variable 'normal_name' referenced before assignment #103

Closed: zhangbiwu closed this issue 2 years ago

zhangbiwu commented 3 years ago

Hi, I got an error:

/somaticseq/vcfModifier/modify_MuTect2.py", line 64, in convert
    normal_index = header.index(normal_name) - 9
UnboundLocalError: local variable 'normal_name' referenced before assignment

Can you help me deal with this problem? Thanks.

litaifang commented 3 years ago

Did you run MuTect2 from GATK3? A few people have raised this issue before. MuTect2 from GATK4 (the supported version) writes header lines like ##normal_sample=NORMAL_NAME and ##tumor_sample=TUMOR_NAME, but GATK3's MuTect2 does not. If the error is due to GATK3, a simple fix is to add those two lines to the header of the GATK3 MuTect2 VCF output.

If you were running GATK3's MuTect2, I would recommend upgrading to GATK4, which is much faster.
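The UnboundLocalError suggests that modify_MuTect2.py only assigns normal_name when it finds a ##normal_sample= line, so a VCF without one fails at header.index(normal_name). Below is a minimal sketch, not SomaticSeq's actual code, of both the sample lookup (raising a clear ValueError instead) and the header patch described above. The names sample_indices and add_sample_headers are hypothetical, and you must supply the normal and tumor SM names yourself, exactly as they appear in the #CHROM line.

```python
import sys
from typing import List, Optional, Tuple


def sample_indices(lines: List[str]) -> Tuple[int, int]:
    """Return 0-based (normal, tumor) sample-column indices from VCF header lines."""
    normal_name: Optional[str] = None
    tumor_name: Optional[str] = None
    for line in lines:
        if line.startswith("##normal_sample="):
            normal_name = line.split("=", 1)[1].strip()
        elif line.startswith("##tumor_sample="):
            tumor_name = line.split("=", 1)[1].strip()
        elif line.startswith("#CHROM"):
            if normal_name is None or tumor_name is None:
                # Fail loudly instead of hitting an UnboundLocalError later.
                raise ValueError("VCF lacks ##normal_sample=/##tumor_sample= header lines")
            header = line.rstrip("\n").split("\t")
            # Sample columns start at index 9, right after FORMAT.
            return header.index(normal_name) - 9, header.index(tumor_name) - 9
    raise ValueError("no #CHROM header line found")


def add_sample_headers(lines: List[str], normal: str, tumor: str) -> List[str]:
    """Insert ##normal_sample/##tumor_sample just before the #CHROM line."""
    out: List[str] = []
    inserted = False
    for line in lines:
        if line.startswith(("##normal_sample=", "##tumor_sample=")):
            continue  # drop old copies so re-running the patch is harmless
        if line.startswith("#CHROM") and not inserted:
            samples = line.rstrip("\n").split("\t")[9:]
            for name in (normal, tumor):
                if name not in samples:
                    raise ValueError(f"{name!r} not among sample columns {samples}")
            out.append(f"##normal_sample={normal}\n")
            out.append(f"##tumor_sample={tumor}\n")
            inserted = True
        out.append(line)
    if not inserted:
        raise ValueError("no #CHROM header line found")
    return out


# Hypothetical CLI: python add_sample_headers.py in.vcf out.vcf NORMAL_SM TUMOR_SM
if __name__ == "__main__" and len(sys.argv) == 5:
    src, dst, normal_sm, tumor_sm = sys.argv[1:5]
    with open(src) as fin:
        patched = add_sample_headers(fin.readlines(), normal_sm, tumor_sm)
    with open(dst, "w") as fout:
        fout.writelines(patched)
```

Writing to a new file rather than in place keeps the original MuTect2 output around in case the sample names were given in the wrong order.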

zhangbiwu commented 3 years ago

I ran the dockerized somatic mutation callers. This is the command I used:

makeSomaticScripts.py \
  paired \
  --output-directory paired_example \
  --tumor-bam tumor/sim7.snv.bam \
  --normal-bam ../normal/sim15.normal.bam \
  --genome-reference ../ref_fasta/hg19.fasta \
  --truth-snv labels/sim7.snv.True_VCF \
  --dbsnp-vcf ../dpsnp/dbsnp_150.hg19.vcf \
  --run-mutect2 --run-vardict --run-strelka2 --run-somaticseq --train-somaticseq -nt 2 --run-workflow

Thanks.

litaifang commented 3 years ago

Check the headers of tumor/sim7.snv.bam and normal/sim15.normal.bam and see whether they have different sample names, e.g., with samtools view -H tumor/sim7.snv.bam. There should be a header line like @RG ID:ReadGroupID SM:SampleName LB:LibraryPrep PL:ILLUMINA. The "SampleName" must be different for the tumor and normal BAM files. That is a requirement for MuTect2.
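A minimal sketch of this check in Python, assuming samtools is on PATH. read_group_samples and bam_samples are hypothetical helpers; they only recognize upper-case, tab-delimited @RG lines, so a lowercase @rg or a space-separated header shows up as "no sample name".

```python
import subprocess
import sys
from typing import Set


def read_group_samples(header_text: str) -> Set[str]:
    """Collect SM: values from tab-delimited @RG lines of a SAM header."""
    samples: Set[str] = set()
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@RG":
            continue  # skips @HD/@PG and malformed (lowercase or space-separated) lines
        for field in fields[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return samples


def bam_samples(bam_path: str) -> Set[str]:
    """Run `samtools view -H` and parse the SM tags from its @RG lines."""
    header = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return read_group_samples(header)


# Hypothetical CLI: python check_sm.py tumor.bam normal.bam
if __name__ == "__main__" and len(sys.argv) == 3:
    tumor, normal = bam_samples(sys.argv[1]), bam_samples(sys.argv[2])
    print("tumor SM:", sorted(tumor), "normal SM:", sorted(normal))
    if not tumor or not normal:
        sys.exit("missing @RG SM: tag (check tabs and upper-case @RG)")
    if tumor & normal:
        sys.exit("tumor and normal share a sample name; MuTect2 needs them distinct")
```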

zhangbiwu commented 3 years ago

Thanks. When I check the header of tumor/sim7.snv.bam, I get:

@RG ID:69a29d45-8b8e-487e-8709-f3b074a7e3a2 CN:BS SM:sim7.snv.sort LB:bamsurgeon PL:COMPLETE
@PG PN:bamsurgeon ID:bamsurgeon

When I check the header of normal/sim15.normal.bam, I get:

@RG ID:7408b2ee-2e07-4094-ae13-efbed0ad1bf2 CN:BS SM:sim15.normal LB:bamsurgeon PL:COMPLETE
@PG PN:bamsurgeon ID:bamsurgeon
@PG ID:samtools PN:samtools PP:bamsurgeon VN:1.10 CL:samtools view -H ../normal/sim15.normal.bam

So the "SampleName" (SM) values are different.

litaifang commented 3 years ago

Can you send me MuTect2's VCF file? I'll take a look at what's wrong.

zhangbiwu commented 3 years ago

Thanks. I checked the MuTect2.vcf file and found that it is empty.

litaifang commented 3 years ago

In your BAM file headers, is it lowercase @rg or actually @RG? And are the fields separated by tabs? Make sure MuTect2 actually runs on the pair of BAM files: try executing one of the mutect2.*.cmd scripts in the logs directories and see what happens.
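A minimal sketch for locating and re-running one of those scripts, assuming the workflow wrote them somewhere under the output directory (searched recursively here) with names matching mutect2.*.cmd, and that bash is available. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def find_mutect2_scripts(output_dir: str) -> List[Path]:
    """Recursively list mutect2.*.cmd scripts under the workflow output directory."""
    return sorted(Path(output_dir).rglob("mutect2.*.cmd"))


def run_script(script: Path) -> int:
    """Run one script with bash, print its combined stdout/stderr, return the exit code."""
    result = subprocess.run(
        ["bash", str(script)],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    print(result.stdout)
    return result.returncode


# Hypothetical CLI: python rerun_mutect2.py paired_example
if __name__ == "__main__" and len(sys.argv) == 2:
    scripts = find_mutect2_scripts(sys.argv[1])
    if not scripts:
        sys.exit("no mutect2.*.cmd scripts found")
    # One script is usually enough to see MuTect2's error message.
    code = run_script(scripts[0])
    print(f"{scripts[0]} exited with code {code}")
```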

zhangbiwu commented 3 years ago

Yes, they are @RG. I executed a mutect2.*.cmd script and got this output:

Start at 2021/12/01 16:39:10
08:39:16.849 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/build/libs/gatk-package-4.0.5.2-local.jar!/com/intel/gkl/native/libgkl_compression.so
08:39:17.031 INFO Mutect2 - ------------------------------------------------------------
08:39:17.031 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.0.5.2
08:39:17.031 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
08:39:17.032 INFO Mutect2 - Executing as ?@4dececb90b2f on Linux v5.8.0-36-generic amd64
08:39:17.032 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11
08:39:17.032 INFO Mutect2 - Start Date/Time: December 1, 2021 8:39:16 AM UTC
08:39:17.032 INFO Mutect2 - ------------------------------------------------------------
08:39:17.032 INFO Mutect2 - ------------------------------------------------------------
08:39:17.033 INFO Mutect2 - HTSJDK Version: 2.16.0
08:39:17.033 INFO Mutect2 - Picard Version: 2.18.7
08:39:17.033 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:39:17.033 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:39:17.033 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:39:17.033 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:39:17.033 INFO Mutect2 - Deflater: IntelDeflater
08:39:17.033 INFO Mutect2 - Inflater: IntelInflater
08:39:17.033 INFO Mutect2 - GCS max retries/reopens: 20
08:39:17.033 INFO Mutect2 - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
08:39:17.033 INFO Mutect2 - Initializing engine
08:39:17.065 INFO Mutect2 - Shutting down engine
[December 1, 2021 8:39:17 AM UTC] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=510132224


A USER ERROR has occurred: Fasta dict file file:///a61c4b2f178b41dc8bce9b7b7eff8f0d/hg19.dict for reference file:///a61c4b2f178b41dc8bce9b7b7eff8f0d/hg19.fasta does not exist. Please see http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for help creating it.


Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

litaifang commented 2 years ago

MuTect2 requires a .dict (sequence dictionary) file for the reference. Make it using Picard: https://gatk.broadinstitute.org/hc/en-us/articles/360036729911-CreateSequenceDictionary-Picard-
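Picard's CreateSequenceDictionary, linked above, is the supported way to build this file. To show what GATK is looking for, here is a minimal Python sketch assuming an uncompressed FASTA. GATK expects the dictionary beside the FASTA with the extension swapped (the error above looks for hg19.dict next to hg19.fasta), and each FASTA record becomes an @SQ line with SN (name), LN (length) and M5 (MD5 of the upper-cased sequence, per the SAM specification). It omits the optional @HD and UR fields that Picard writes; dict_path_for and sequence_dictionary are hypothetical names.

```python
import hashlib
import sys
from pathlib import Path
from typing import List


def dict_path_for(fasta: str) -> Path:
    """Dictionary path GATK expects: same basename, .dict extension (hg19.fasta -> hg19.dict)."""
    return Path(fasta).with_suffix(".dict")


def sequence_dictionary(fasta: str) -> str:
    """Build @SQ lines (SN, LN, M5) for every record in an uncompressed FASTA."""
    out: List[str] = []
    name = None
    length = 0
    md5 = hashlib.md5()

    def flush() -> None:
        # Emit the record that has just been fully read, if any.
        if name is not None:
            out.append(f"@SQ\tSN:{name}\tLN:{length}\tM5:{md5.hexdigest()}\n")

    with open(fasta) as fh:
        for line in fh:
            if line.startswith(">"):
                flush()
                name = line[1:].split()[0]  # record name is the first word after '>'
                length, md5 = 0, hashlib.md5()
            else:
                seq = "".join(line.split()).upper()  # drop whitespace, upper-case for M5
                length += len(seq)
                md5.update(seq.encode("ascii"))
    flush()
    return "".join(out)


# Hypothetical CLI: python make_dict.py hg19.fasta
if __name__ == "__main__" and len(sys.argv) == 2:
    target = dict_path_for(sys.argv[1])
    if target.exists():
        sys.exit(f"{target} already exists")
    target.write_text(sequence_dictionary(sys.argv[1]))
    print(f"wrote {target}")
```

Hashing line by line keeps memory flat even for whole-chromosome records such as chr1 in hg19.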

litaifang commented 2 years ago

A lowercase @rg could also be an issue. If it gives you problems, save the header as a text file using samtools view -H, change @rg to @RG, and then use samtools reheader to apply the corrected header.
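The steps above, sketched in Python under the assumption that samtools is on PATH. fix_read_group_case and reheader are hypothetical helpers; only the literal prefix @rg at the start of a header line is rewritten, and the corrected header is kept next to the output as <output>.header.sam for inspection.

```python
import subprocess
import sys


def fix_read_group_case(header_text: str) -> str:
    """Rewrite header lines starting with lowercase '@rg' to '@RG'; leave others untouched."""
    fixed = []
    for line in header_text.splitlines(keepends=True):
        if line.startswith("@rg"):
            line = "@RG" + line[3:]
        fixed.append(line)
    return "".join(fixed)


def reheader(bam_in: str, bam_out: str) -> None:
    """samtools view -H, fix @rg, then samtools reheader into a new BAM."""
    header = subprocess.run(
        ["samtools", "view", "-H", bam_in],
        check=True, capture_output=True, text=True,
    ).stdout
    header_file = bam_out + ".header.sam"
    with open(header_file, "w") as fh:
        fh.write(fix_read_group_case(header))
    with open(bam_out, "wb") as out:
        # `samtools reheader <header.sam> <in.bam>` writes the new BAM to stdout.
        subprocess.run(["samtools", "reheader", header_file, bam_in], check=True, stdout=out)


# Hypothetical CLI: python fix_rg.py in.bam out.bam
if __name__ == "__main__" and len(sys.argv) == 3:
    reheader(sys.argv[1], sys.argv[2])
```

Remember to index the new BAM (samtools index) before pointing MuTect2 at it.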

zhangbiwu commented 2 years ago

Thanks, I will try it.

zhangbiwu commented 2 years ago

Hi, this problem is still not resolved. Here is the log output:

Start at 2021/12/02 17:57:38
INFO 2021-12-02 09:57:41,189 SomaticSeq SomaticSeq Input Arguments: output_directory=/ee7008ee88274839a7a1fc588c1c7d3e/2/SomaticSeq, genome_reference=/b9c990ead1ec4234ade4263b3a88efcb/hg19.fasta, truth_snv=/b51f64272bed4e1986ec80a404c7f61f/sim7.snv.True_VCF, truth_indel=None, classifier_snv=None, classifier_indel=None, pass_threshold=0.5, lowqual_threshold=0.1, algorithm=xgboost, homozygous_threshold=0.85, heterozygous_threshold=0.01, minimum_mapping_quality=1, minimum_base_quality=5, minimum_num_callers=0.5, dbsnp_vcf=None, cosmic_vcf=None, inclusion_region=/349982c0d3fc45a78ab23eaf6721355f/2.bed, exclusion_region=None, threads=1, somaticseq_train=False, seed=0, tree_depth=12, iterations=None, features_excluded=[], keep_intermediates=False, tumor_bam_file=/262b1fea0f8545d2be2fb06a1e9d1631/sim7.snv.bam, normal_bam_file=/f82f61aca55f46ac8867c7e4514b7362/sim15.normal.bam, tumor_sample=TUMOR, normal_sample=NORMAL, mutect_vcf=None, indelocator_vcf=None, mutect2_vcf=/ee7008ee88274839a7a1fc588c1c7d3e/2/MuTect2.vcf, varscan_snv=None, varscan_indel=None, jsm_vcf=None, somaticsniper_vcf=None, vardict_vcf=/ee7008ee88274839a7a1fc588c1c7d3e/2/VarDict.vcf, muse_vcf=None, lofreq_snv=None, lofreq_indel=None, scalpel_vcf=None, strelka_snv=/ee7008ee88274839a7a1fc588c1c7d3e/2/Strelka/results/variants/somatic.snvs.vcf.gz, strelka_indel=/ee7008ee88274839a7a1fc588c1c7d3e/2/Strelka/results/variants/somatic.indels.vcf.gz, tnscope_vcf=None, platypus_vcf=None, which=paired
INFO 2021-12-02 09:57:41,189 SomaticSeq SomaticSeq Input Arguments: output_directory=/bd7a3b6e30bd4590aae11c6b80ed265e/1/SomaticSeq, genome_reference=/403b623c096145d0b0e494e3ecdc0a54/hg19.fasta, truth_snv=/d0e4ecdf8de14d84bb66be656d74044a/sim7.snv.True_VCF, truth_indel=None, classifier_snv=None, classifier_indel=None, pass_threshold=0.5, lowqual_threshold=0.1, algorithm=xgboost, homozygous_threshold=0.85, heterozygous_threshold=0.01, minimum_mapping_quality=1, minimum_base_quality=5, minimum_num_callers=0.5, dbsnp_vcf=None, cosmic_vcf=None, inclusion_region=/74e62e0f44844fdf97041bde7f29affc/1.bed, exclusion_region=None, threads=1, somaticseq_train=False, seed=0, tree_depth=12, iterations=None, features_excluded=[], keep_intermediates=False, tumor_bam_file=/bbe6b9729a8c4e53b903b5c610702ef8/sim7.snv.bam, normal_bam_file=/c3bd5c15a97e45aab3c4a226cff9b5ad/sim15.normal.bam, tumor_sample=TUMOR, normal_sample=NORMAL, mutect_vcf=None, indelocator_vcf=None, mutect2_vcf=/bd7a3b6e30bd4590aae11c6b80ed265e/1/MuTect2.vcf, varscan_snv=None, varscan_indel=None, jsm_vcf=None, somaticsniper_vcf=None, vardict_vcf=/bd7a3b6e30bd4590aae11c6b80ed265e/1/VarDict.vcf, muse_vcf=None, lofreq_snv=None, lofreq_indel=None, scalpel_vcf=None, strelka_snv=/bd7a3b6e30bd4590aae11c6b80ed265e/1/Strelka/results/variants/somatic.snvs.vcf.gz, strelka_indel=/bd7a3b6e30bd4590aae11c6b80ed265e/1/Strelka/results/variants/somatic.indels.vcf.gz, tnscope_vcf=None, platypus_vcf=None, which=paired
Error: Unable to open file /ee7008ee88274839a7a1fc588c1c7d3e/2/MuTect2.vcf. Exiting.
Error: Unable to open file /bd7a3b6e30bd4590aae11c6b80ed265e/1/MuTect2.vcf. Exiting.
Traceback (most recent call last):
  File "/usr/local/bin/run_somaticseq.py", line 4, in <module>
    __import__('pkg_resources').run_script('SomaticSeq==3.6.3', 'run_somaticseq.py')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1463, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/EGG-INFO/scripts/run_somaticseq.py", line 409, in <module>
    runPaired( outdir = args.output_directory, \
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/EGG-INFO/scripts/run_somaticseq.py", line 117, in runPaired
    outSnv, outIndel, intermediateVcfs, tempFiles = combineCallers.combinePaired(outdir=outdir, ref=ref, tbam=tbam, nbam=nbam, inclusion=inclusion, exclusion=exclusion, mutect=mutect, indelocator=indelocator, mutect2=mutect2, varscan_snv=varscan_snv, varscan_indel=varscan_indel, jsm=jsm, sniper=sniper, vardict=vardict, muse=muse, lofreq_snv=lofreq_snv, lofreq_indel=lofreq_indel, scalpel=scalpel, strelka_snv=strelka_snv, strelka_indel=strelka_indel, tnscope=tnscope, platypus=platypus, keep_intermediates=True)
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/somaticseq/combine_callers.py", line 238, in combinePaired
    mod_mutect2.convert(mutect2_in, snv_mutect_out, indel_mutect_out, False)
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/somaticseq/vcfModifier/modify_MuTect2.py", line 64, in convert
    normal_index = header.index(normal_name) - 9
UnboundLocalError: local variable 'normal_name' referenced before assignment
Traceback (most recent call last):
  File "/usr/local/bin/run_somaticseq.py", line 4, in <module>
    __import__('pkg_resources').run_script('SomaticSeq==3.6.3', 'run_somaticseq.py')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1463, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/EGG-INFO/scripts/run_somaticseq.py", line 409, in <module>
    runPaired( outdir = args.output_directory, \
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/EGG-INFO/scripts/run_somaticseq.py", line 117, in runPaired
    outSnv, outIndel, intermediateVcfs, tempFiles = combineCallers.combinePaired(outdir=outdir, ref=ref, tbam=tbam, nbam=nbam, inclusion=inclusion, exclusion=exclusion, mutect=mutect, indelocator=indelocator, mutect2=mutect2, varscan_snv=varscan_snv, varscan_indel=varscan_indel, jsm=jsm, sniper=sniper, vardict=vardict, muse=muse, lofreq_snv=lofreq_snv, lofreq_indel=lofreq_indel, scalpel=scalpel, strelka_snv=strelka_snv, strelka_indel=strelka_indel, tnscope=tnscope, platypus=platypus, keep_intermediates=True)
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/somaticseq/combine_callers.py", line 238, in combinePaired
    mod_mutect2.convert(mutect2_in, snv_mutect_out, indel_mutect_out, False)
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/somaticseq/vcfModifier/modify_MuTect2.py", line 64, in convert
    normal_index = header.index(normal_name) - 9
UnboundLocalError: local variable 'normal_name' referenced before assignment
INFO 2021-12-02 17:57:43,927 run_script FINISHED RUNNING paired_example/1/SomaticSeq/logs/somaticSeq.2021.12.02.09.33.06.404.cmd in 5.88 seconds with an exit code of 1.
INFO 2021-12-02 17:57:44,013 run_script FINISHED RUNNING paired_example/2/SomaticSeq/logs/somaticSeq.2021.12.02.09.33.06.404.cmd in 5.966 seconds with an exit code of 1.
INFO 2021-12-02 17:57:44,015 run_script bash paired_example/logs/mergeResults.2021.12.02.09.33.06.404.cmd
Start at 2021/12/02 17:57:44
Traceback (most recent call last):
  File "/usr/local/bin/concat.py", line 4, in <module>
    __import__('pkg_resources').run_script('SomaticSeq==3.6.3', 'concat.py')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1463, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/EGG-INFO/scripts/concat.py", line 201, in <module>
    vcf(args.input_files, args.output_file, args.bgzip_output)
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/EGG-INFO/scripts/concat.py", line 28, in vcf
    with genome.open_textfile(file_i) as vcfin:
  File "/usr/local/lib/python3.8/dist-packages/SomaticSeq-3.6.3-py3.8.egg/somaticseq/genomicFileHandler/genomic_file_handlers.py", line 171, in open_textfile
    return open(file_name)
FileNotFoundError: [Errno 2] No such file or directory: '/5878b08251464a729f3bbf6e9088071e/paired_example/1/MuTect2.vcf'
INFO 2021-12-02 17:57:46,061 run_script FINISHED RUNNING paired_example/logs/mergeResults.2021.12.02.09.33.06.404.cmd in 2.046 seconds with an exit code of 1.
INFO 2021-12-02 17:57:46,067 Somatic_Mutation_Workflow SomaticSeq Workflow Done. Check your results. You may remove the 2 sub_directories.

Please help me analyze this problem again. Thank you very much.

litaifang commented 2 years ago

It seems MuTect2 still didn't run successfully. Try executing the mutect2.*.cmd script again and let me know what error or log output MuTect2 produces.
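Both SomaticSeq failures in the log above (the "Unable to open file" error and the UnboundLocalError) point at the per-region MuTect2.vcf files. A minimal pre-flight check, assuming you pass those paths (e.g. paired_example/1/MuTect2.vcf) on the command line; check_mutect2_vcf is a hypothetical helper that only inspects header lines, not variant records.

```python
import sys
from pathlib import Path
from typing import List


def check_mutect2_vcf(path: str) -> List[str]:
    """Return problems that would make SomaticSeq's MuTect2 step fail (empty list if none)."""
    vcf = Path(path)
    if not vcf.is_file():
        return [f"{vcf} does not exist (MuTect2 probably failed)"]
    seen = {"##normal_sample=": False, "##tumor_sample=": False, "#CHROM": False}
    with vcf.open() as fh:
        for line in fh:
            if not line.startswith("#"):
                break  # first record reached; the header is over
            for prefix in seen:
                if line.startswith(prefix):
                    seen[prefix] = True
    return [f"{vcf}: missing {prefix} header line" for prefix, found in seen.items() if not found]


# Hypothetical CLI: python check_vcfs.py paired_example/1/MuTect2.vcf paired_example/2/MuTect2.vcf
if __name__ == "__main__" and len(sys.argv) > 1:
    issues = [problem for arg in sys.argv[1:] for problem in check_mutect2_vcf(arg)]
    print("\n".join(issues) or "all MuTect2 VCFs look usable")
    sys.exit(1 if issues else 0)
```

An empty MuTect2.vcf, like the one described earlier in this thread, is reported as missing all three header lines.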

litaifang commented 2 years ago

Did MuTect2 run? Also, make sure bedtools is on your PATH. Closing this issue for now; feel free to re-open it if you're still having issues.
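A minimal sketch for checking the PATH with Python's shutil.which. bedtools comes from the comment above; samtools is included only because the header commands in this thread use it, so adjust the hypothetical REQUIRED_TOOLS tuple to your own setup.

```python
import shutil
import sys
from typing import Iterable, List

# Tools mentioned in this thread (assumption: extend for your own workflow).
REQUIRED_TOOLS = ("bedtools", "samtools")


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Check tools named on the command line, or the defaults if none are given.
    missing = missing_tools(sys.argv[1:] or REQUIRED_TOOLS)
    print("missing: " + ", ".join(missing) if missing else "all tools found on PATH")
```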