CCICB / vcf2mafR

Convert VCFs and VCF-related formats to MAF

Add extra columns #3

Open selkamand opened 1 year ago

selkamand commented 1 year ago

From: https://github.com/mskcc/vcf2maf/blob/main/vcf2maf.pl

The original Perl implementation of vcf2maf appears to add a number of non-standard but very useful columns when converting VCFs to MAF files:

```perl
# Add extra annotation columns to the MAF in a consistent order
my @ann_cols = qw( Allele Gene Feature Feature_type Consequence cDNA_position CDS_position
    Protein_position Amino_acids Codons Existing_variation ALLELE_NUM DISTANCE STRAND_VEP SYMBOL
    SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL CCDS ENSP SWISSPROT TREMBL UNIPARC RefSeq SIFT PolyPhen
    EXON INTRON DOMAINS AF AFR_AF AMR_AF ASN_AF EAS_AF EUR_AF SAS_AF AA_AF EA_AF CLIN_SIG SOMATIC
    PUBMED MOTIF_NAME MOTIF_POS HIGH_INF_POS MOTIF_SCORE_CHANGE IMPACT PICK VARIANT_CLASS TSL
    HGVS_OFFSET PHENO MINIMISED GENE_PHENO FILTER flanking_bps vcf_id vcf_qual gnomAD_AF gnomAD_AFR_AF
    gnomAD_AMR_AF gnomAD_ASJ_AF gnomAD_EAS_AF gnomAD_FIN_AF gnomAD_NFE_AF gnomAD_OTH_AF gnomAD_SAS_AF );
```

We should ensure we grab as much useful information from VEP annotations as possible.
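Here is a rough illustration of pulling fields out of VEP annotations. VEP writes its per-transcript annotations into a pipe-delimited INFO field (by default named `CSQ`), and the order of the sub-fields is declared in the VCF header after `Format:`. The Python sketch below assumes that default layout. `csq_field_names` and `parse_csq` are hypothetical helpers, not part of vcf2mafR or vcf2maf.pl, and the sketch ignores edge cases such as escaped characters in INFO values.

```python
import re


def csq_field_names(header_lines, key="CSQ"):
    """Return the ordered sub-field names declared for the VEP INFO key."""
    prefix = f"##INFO=<ID={key},"
    for line in header_lines:
        if line.startswith(prefix):
            # The field list follows "Format: " inside the Description string
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    raise ValueError(f"No ##INFO header with a Format: description for {key}")


def parse_csq(info, field_names, key="CSQ"):
    """Split one record's VEP annotation into a list of dicts, one per comma-separated block."""
    for entry in info.split(";"):
        if entry.startswith(key + "="):
            blocks = entry[len(key) + 1:].split(",")
            return [dict(zip(field_names, block.split("|"))) for block in blocks]
    # Record has no VEP annotation
    return []
```

Each returned dict maps a VEP field name (e.g. `SYMBOL`, `IMPACT`) to its value for one transcript, so every field VEP emits is available as a candidate MAF column. Choosing which transcript's annotation populates the MAF row is a separate decision.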

We may also need to sort the output columns so they appear in a consistent order. We will also have to decide whether to include every column in all cases, even when empty, to preserve the output file structure (right now I'm against this).
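One way to get a consistent column order without always emitting empty columns works in three steps. First, rank known columns by a fixed list: standard MAF columns first, then the extra annotation columns in the order vcf2maf.pl uses. Second, append any unrecognised columns alphabetically. Third, optionally drop non-standard columns that are empty in every row. This is a Python sketch under those assumptions. The column lists are hypothetical and truncated, and `ordered_columns` is an illustrative name.

```python
# Hypothetical, truncated column lists; a real version would use the full MAF
# specification followed by every name in vcf2maf.pl's @ann_cols.
STANDARD_COLS = ["Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
                 "Reference_Allele", "Tumor_Seq_Allele2", "Tumor_Sample_Barcode"]
EXTRA_COLS = ["Allele", "Gene", "Feature", "Consequence", "IMPACT", "SYMBOL"]


def ordered_columns(rows, drop_empty_extras=True):
    """Return the columns present in `rows` in a deterministic order.

    Standard MAF columns come first, then known extra columns, then any
    unrecognised columns sorted alphabetically. Non-standard columns that are
    empty (or missing) in every row are dropped when `drop_empty_extras` is True.
    """
    present = {col for row in rows for col in row}

    def keep(col):
        if not drop_empty_extras:
            return True
        return any(row.get(col) not in ("", None) for row in rows)

    ordered = [c for c in STANDARD_COLS if c in present]
    ordered += [c for c in EXTRA_COLS if c in present and keep(c)]
    unknown = sorted(present - set(STANDARD_COLS) - set(EXTRA_COLS))
    ordered += [c for c in unknown if keep(c)]
    return ordered
```

For example, a single row with keys `SYMBOL`, `Chromosome`, `Hugo_Symbol`, an empty `IMPACT` and a custom `my_custom` column yields `Hugo_Symbol`, `Chromosome`, `SYMBOL`, `my_custom`. The order is stable, but the set of columns still varies from file to file.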

selkamand commented 1 year ago

Actually, we absolutely need a strict MAF format (the same columns, in the same order, every time) so that end users can concatenate the resulting MAFs.
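A strict format lets outputs be concatenated cleanly. If every output is projected onto one fixed schema, with missing columns filled with empty strings and unexpected columns rejected, files can be stacked row-wise under a single header. This is a minimal Python sketch. `MAF_SCHEMA` is a hypothetical, truncated stand-in for the full column list, and `conform` and `concat_mafs` are illustrative names.

```python
import csv
import io

# Hypothetical fixed schema; the real one would be the complete, ordered column list.
MAF_SCHEMA = ["Hugo_Symbol", "Chromosome", "Start_Position", "Tumor_Sample_Barcode", "IMPACT"]


def conform(rows, schema=MAF_SCHEMA):
    """Project rows onto the schema: missing columns become '', unknown ones raise."""
    out = []
    for row in rows:
        unknown = set(row) - set(schema)
        if unknown:
            raise ValueError(f"Columns not in schema: {sorted(unknown)}")
        out.append({col: row.get(col, "") for col in schema})
    return out


def concat_mafs(*mafs, schema=MAF_SCHEMA):
    """Concatenate several MAFs (lists of row dicts) into one tab-separated string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=schema, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for maf in mafs:
        writer.writerows(conform(maf, schema))
    return buf.getvalue()
```

Raising on unknown columns, rather than silently dropping them, is a design choice. It surfaces annotations the schema does not yet cover instead of losing them.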