getzlab / MutSig2CV

MutSig2CV from Lawrence et al. 2014

newbase column detection #6

Closed dodoflyy closed 2 years ago

dodoflyy commented 2 years ago

Hello, my input MAF file contains no "newbase" column, but MutSig2CV reports that one exists. Is this a bug?
Mutsig2cv log:

Loading mutations...

Mutation file contains multiple columns for newbase info:
Tumor_Seq_Allele2
newbase          
Will use newbase
Scanning for duplicate patients...

There is no "newbase" column in my MAF file:

$ grep "newbase" MutSig2CV.maf

The "Tumor_Seq_Allele2" column is in my MAF file:

$ grep "Tumor_Seq_Allele2" MutSig2CV.maf
Chromosome Source_MAF Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer Tumor_Sample_UUID Matched_Norm_Sample_UUID HGVSc HGVSp HGVSp_Short Transcript_ID Exon_Number t_depth t_ref_count t_alt_count n_depth n_ref_count n_alt_count all_effects Allele Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation ALLELE_NUM DISTANCE STRAND_VEP SYMBOL SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL CCDS ENSP SWISSPROT TREMBL UNIPARC RefSeq SIFT PolyPhen EXON INTRON DOMAINS AF AFR_AF AMR_AF ASN_AF EAS_AF EUR_AF SAS_AF AA_AF EA_AF CLIN_SIG SOMATIC PUBMED MOTIF_NAME MOTIF_POS HIGH_INF_POS MOTIF_SCORE_CHANGE IMPACT PICK VARIANT_CLASS TSL HGVS_OFFSET PHENO MINIMISED GENE_PHENO FILTER flanking_bps vcf_id vcf_qual gnomAD_AF gnomAD_AFR_AF gnomAD_AMR_AF gnomAD_ASJ_AF gnomAD_EAS_AF gnomAD_FIN_AF gnomAD_NFE_AF gnomAD_OTH_AF gnomAD_SAS_AF vcf_pos OG_Hugo_Symbol

julianhess commented 2 years ago

This is a minor bug. MutSig implicitly converts Tumor_Seq_Allele{1,2} into newbase before the multiple-column check is performed:

https://github.com/getzlab/MutSig2CV/blob/0109e27e70478181695f31ca8dd281bb44f0b3af/src/MutSig_2CV_v3_11_core.m#L110-L121

You don't have to worry about this; MutSig has already inferred the correct column to use for the variant allele.
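
For anyone curious why the message appears, here is a rough Python sketch of the order of operations described above. It is not MutSig2CV's actual code (which is MATLAB, at the link above); the function name `candidate_newbase_columns` and the alias list are made up for illustration. It only shows how creating `newbase` from an alias before checking for duplicates makes the check see two columns.

```python
# Illustrative only: a hypothetical alias table, not MutSig2CV's actual one.
NEWBASE_ALIASES = ["Tumor_Seq_Allele1", "Tumor_Seq_Allele2"]


def candidate_newbase_columns(header):
    """Return the columns a loader would report as carrying newbase info.

    Mimics the ordering described in this thread: an alias is first turned
    into a synthetic "newbase" column, and only then is the header scanned
    for multiple candidate columns.
    """
    columns = list(header)
    # Step 1: the implicit conversion adds "newbase" if an alias is present.
    if "newbase" not in columns and any(a in columns for a in NEWBASE_ALIASES):
        columns.append("newbase")
    # Step 2: the multiple-column check runs on the augmented column list.
    return [c for c in columns if c in ("Tumor_Seq_Allele2", "newbase")]


header = ["Hugo_Symbol", "Reference_Allele", "Tumor_Seq_Allele1", "Tumor_Seq_Allele2"]
print(candidate_newbase_columns(header))  # ['Tumor_Seq_Allele2', 'newbase']
```

In this sketch the input header never contains `newbase`, yet the check still lists it next to `Tumor_Seq_Allele2`, which matches the log in the original report.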