bioinform / somaticseq

An ensemble approach to accurately detect somatic mutations using SomaticSeq
http://bioinform.github.io/somaticseq/
BSD 2-Clause "Simplified" License
192 stars 53 forks source link

UnboundLocalError: cannot access local variable 'normal_name' where it is not associated with a value #127

Closed SAADAT-Abu closed 8 months ago

SAADAT-Abu commented 1 year ago

Can someone please help me with this error:

Traceback (most recent call last):
  File "/usr/lib64/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
           ^^^^^^^^^^^^^^^^
  File "/home/saadat/.local/bin/somaticseq_parallel.py", line 44, in runPaired_by_region
    run_somaticseq.runPaired(outdir_i, ref, tbam, nbam, tumor_name, normal_name,
  File "/home/saadat/.local/lib/python3.11/site-packages/somaticseq/run_somaticseq.py", line 139, in runPaired
    outSnv, outIndel, intermediateVcfs, tempFiles = combineCallers.combinePaired(outdir=outdir, ref=ref, tbam=tbam, nbam=nbam, inclusion=inclusion, exclusion=exclusion,
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saadat/.local/lib/python3.11/site-packages/somaticseq/combine_callers.py", line 276, in combinePaired
    mod_mutect2.convert(mutect2_in, snv_mutect_out, indel_mutect_out, False)
  File "/home/saadat/.local/lib/python3.11/site-packages/somaticseq/vcfModifier/modify_MuTect2.py", line 64, in convert
    normal_index = header.index(normal_name) - 9
                                ^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'normal_name' where it is not associated with a value
litaifang commented 1 year ago

This looks like the MuTect2's VCF isn't as expected. Commonly due to either of the 2 reasons: 1) Did you use the Mutect2 in GATK4, or 2) Did the tumor and normal bam files have the same SM (i.e., same sample name)? MuTect2 requires that the tumor and normal have different SM in @RG to distinguish tumor and normal reads.

SAADAT-Abu commented 1 year ago

Yes. I ran Mutect2 in GATK. Mutect2 Version="4.2.3.0". Format of my VCF is different from the one in the provided example.

My VCF:

##fileformat=VCFv4.2
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=FAD,Number=R,Type=Integer,Description="Count of fragments supporting each allele.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2 --panel-of-normals /mnt/beegfs/asaadat/Sarek/iGenome/1000g_pon.hg38.vcf.gz --germline-resource /mnt/beegfs/asaadat/Sarek/iGenome/af-only-gnomad.hg38.vcf.gz --output /mnt/beegfs/asaadat/Mutect_calls/output/Job6//WT18.vcf --input:tumor /mnt/beegfs/asaadat/recalibrated/WT18/WT18.recal.cram:tumor --input:normal /mnt/beegfs/as

It does not have --tumor-name and --normal-name. Is that the problem?

litaifang commented 1 year ago

Did you have paired normal bam file, or just a single tumor bam file? If you have paired normal bam file, what are the tumor and normal headers that start with @RG?

SAADAT-Abu commented 1 year ago

I am using paired normal/ tumor bam. Normal header: @RG ID:HGHLKDMXY.HK10.L001 PU:L001 SM:P10_HK10 LB:HK10 DS:/mnt/beegfs/asaadat/Sarek/iGenome/Homo_sapiens_assembly38.fasta PL:ILLUMINA

Tumor header: @RG ID:HGHLKDMXY.WT10.L001 PU:L001 SM:P10_WT10 LB:WT10 DS:/mnt/beegfs/asaadat/Sarek/iGenome/Homo_sapiens_assembly38.fasta PL:ILLUMINA

Thanks a lot.

litaifang commented 1 year ago

I'll need to check if GATK 4.2.x.x has changed output VCF header......

SAADAT-Abu commented 1 year ago

Ok. Here is a VCF from my analysis. https://drive.google.com/drive/folders/1zztdjGAFW9y_CWLzTgxK82wUNjYq3tar?usp=sharing

Thanks. A.Saadat

litaifang commented 1 year ago

You have two "tumor_sample" in your VCF header:

##tumor_sample=P6_HK6
##tumor_sample=P6_WT6

What was your Mutect2 command exactly?

SAADAT-Abu commented 1 year ago

No. HK6 is the healthy, WT6 is the tumor.

The code is here:

# Run MuTect2 using GATK
    gatk Mutect2 \
        -R "$reference_genome" \
        -I:tumor "$tumor" \
        -I:normal "$normal" \
        -O "$output_vcf" \
        --germline-resource "$germline_resource" \
        --panel-of-normals "$panel_of_normals"

It was called with crams Here is the CSV from which samples were read. WT6 /mnt/beegfs/asaadat/recalibrated/WT6/WT6.recal.cram /mnt/beegfs/asaadat/recalibrated/HK6/HK6.recal.cram

I tried to run SomaticSeq with CRAMs too. but got the same error

litaifang commented 1 year ago

Mutect2 is regarding both samples as tumor samples. The VCF header has both of the samples as ##tumor_sample.

Try instead (not sure if --panel-of-normals is compatible with the following though).

    gatk Mutect2 \
        -R "$reference_genome" \
        -I HK6.recal.cram \
        -I WT6.recal.cram \
        -tumor P6_WT6 \ # Whatever after SM: in tumor bam
        -normal P6_HK6 \ # Whatever after SM: in normal bam
        -O "$output_vcf" \
        --germline-resource "$germline_resource" \
        --panel-of-normals "$panel_of_normals"
SAADAT-Abu commented 1 year ago

Followed your suggestion. Now my VCFs are looking ok. Thanks a lot.