brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

Custom postannotation description #63

Open liqg opened 7 years ago

liqg commented 7 years ago

Hi @brentp, could you add a feature that lets me use my own Description field in [[postannotation]]? This is the config.toml:

[[annotation]]
file="hg19_dbscsnv11.vcf.gz"
fields=["dbscSNV_ADA_SCORE","dbscSNV_RF_SCORE"]
ops=["self","self"]

[[postannotation]]
name="paste2"
fields=["dbscSNV_ADA_SCORE","dbscSNV_RF_SCORE"]
op="lua:join('|', dbscSNV_ADA_SCORE, dbscSNV_RF_SCORE)"
type="String"
Description="Format: dbscSNV_ADA_SCORE|dbscSNV_RF_SCORE"

[[postannotation]]
fields=["dbscSNV_ADA_SCORE"]
op="delete"

I got this output:

##fileformat=VCFv4.2
##INFO=<ID=dbscSNV_ADA_SCORE,Number=1,Type=String,Description="calculated by self of overlapping values in field dbscSNV_ADA_SCORE from hg19_d
##INFO=<ID=dbscSNV_RF_SCORE,Number=1,Type=String,Description="calculated by self of overlapping values in field dbscSNV_RF_SCORE from hg19_dbs
##INFO=<ID=paste2,Number=.,Type=String,Description="calculated field: paste2">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
1       860326  .       A       C       .       .       dbscSNV_RF_SCORE=0.03;paste2=0.0076|0.03
1       860326  .       A       T       .       .       dbscSNV_RF_SCORE=0.03;paste2=0.0069|0.03

vcfanno version 0.2.4

Could you replace the default "calculated field: paste2" with the Description defined in the config file? It is more meaningful for the new annotation.

For the same reason, it would be good to be able to customize the description in [[annotation]] as well.

Another point: since I chose to delete the dbscSNV_ADA_SCORE INFO field, it would be appropriate to remove it from the meta (header) lines of the output VCF too, not just from the body. Here I only used dbscSNV_ADA_SCORE to generate a new annotation; in a case like this, an option such as 'echo=false' or 'echos=[false]' would be more convenient.
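In the meantime, one workaround is to drop the stale ##INFO line downstream. Below is a minimal sketch, assuming the annotated VCF arrives as uncompressed text on stdin; `drop_info_headers` and the script name `drop_info.py` are hypothetical and not part of vcfanno.

```python
import re
import sys
from collections.abc import Iterable, Iterator

# Captures the ID of a header line such as ##INFO=<ID=foo,Number=1,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def drop_info_headers(lines: Iterable[str], ids: set[str]) -> Iterator[str]:
    """Yield VCF lines, skipping ##INFO header lines whose ID is in `ids`.

    Body lines pass through unchanged; vcfanno has already removed the
    deleted fields from the INFO column itself.
    """
    for line in lines:
        m = INFO_ID.match(line)
        if m and m.group(1) in ids:
            continue  # stale header entry for a deleted field
        yield line


if __name__ == "__main__":
    # Hypothetical usage, one INFO ID per argument:
    #   vcfanno -lua my.lua config.toml in.vcf \
    #     | python drop_info.py dbscSNV_ADA_SCORE > out.vcf
    if len(sys.argv) > 1:
        sys.stdout.writelines(drop_info_headers(sys.stdin, set(sys.argv[1:])))
```

Matching on the exact ID (rather than a substring) keeps, for example, a hypothetical dbscSNV_ADA_SCORE_raw header intact.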

hg19_dbscsnv11.vcf.gz looks like this:

##fileformat=VCFv4.2
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       860326  .       A       C       .       .       dbscSNV_ADA_SCORE=0.0076;dbscSNV_RF_SCORE=0.03
1       860326  .       A       G       .       .       dbscSNV_ADA_SCORE=0.0076;dbscSNV_RF_SCORE=0.032
1       860326  .       A       T       .       .       dbscSNV_ADA_SCORE=0.0069;dbscSNV_RF_SCORE=0.03
1       860327  .       A       C       .       .       dbscSNV_ADA_SCORE=0.0043;dbscSNV_RF_SCORE=0.04
1       860327  .       A       G       .       .       dbscSNV_ADA_SCORE=0.0043;dbscSNV_RF_SCORE=0.04

The command is:

zcat hg19_dbscsnv11.vcf.gz | awk 'BEGIN{OFS="\t"} !/^#/ {$8="."} {print}' | vcfanno -lua my.lua config.toml /dev/stdin | less -S

and my.lua is:

-- join(sep, ...): concatenate all remaining arguments, separated by sep
function join(sep, ...)
    return table.concat({...}, tostring(sep))
end
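For readers less familiar with Lua, the per-record effect of the lua:join postannotation can be mimicked in Python. This only reproduces what the output above shows (paste2=0.0076|0.03 for the A>C record); it is not vcfanno's implementation. `parse_info` and `paste` are hypothetical names, and returning None when a field is missing is an assumption of this sketch, not documented vcfanno behaviour.

```python
from typing import Optional


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column such as 'A=1;B=2;FLAG' into a dict (flags map to '')."""
    if info == ".":
        return {}
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def paste(info: str, keys: list[str], sep: str = "|") -> Optional[str]:
    """Join the values of `keys` with `sep`, like join("|", ...) in my.lua.

    Returns None when any key is missing (an assumption of this sketch).
    """
    fields = parse_info(info)
    if any(k not in fields for k in keys):
        return None
    return sep.join(fields[k] for k in keys)


if __name__ == "__main__":
    # INFO of the 1:860326 A>C record in hg19_dbscsnv11.vcf.gz
    info = "dbscSNV_ADA_SCORE=0.0076;dbscSNV_RF_SCORE=0.03"
    print(paste(info, ["dbscSNV_ADA_SCORE", "dbscSNV_RF_SCORE"]))  # 0.0076|0.03
```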

Thanks

brentp commented 7 years ago

Hi, I am hesitant to add this. It seems like a small thing, but it adds to the burden of understanding for the user, even if optional, and vcfanno is already complex enough. It's simple to change the description downstream. You're right that removing the header entry for deleted items would be good. I'll add that to my to-do list.
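As noted, the description can be changed downstream. A minimal sketch of one way to do that, assuming uncompressed VCF text on stdin; `set_info_descriptions` and the script name `set_desc.py` are hypothetical, not part of vcfanno.

```python
import re
import sys
from collections.abc import Iterable, Iterator

# Captures the ID of a header line such as ##INFO=<ID=foo,Number=1,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")
# A quoted Description value, allowing backslash-escaped characters inside.
DESCRIPTION = re.compile(r'Description="(?:[^"\\]|\\.)*"')


def set_info_descriptions(lines: Iterable[str], new: dict[str, str]) -> Iterator[str]:
    """Yield VCF lines, replacing the Description of ##INFO IDs listed in `new`."""
    for line in lines:
        m = INFO_ID.match(line)
        if m and m.group(1) in new:
            # Escape backslashes and quotes so the header stays well-formed.
            text = new[m.group(1)].replace("\\", "\\\\").replace('"', '\\"')
            # A function replacement stops re.sub from interpreting backslashes in `text`.
            line = DESCRIPTION.sub(lambda _m: f'Description="{text}"', line, count=1)
        yield line


if __name__ == "__main__":
    # Hypothetical usage, one ID=Description pair per argument:
    #   vcfanno ... | python set_desc.py 'paste2=Format: dbscSNV_ADA_SCORE|dbscSNV_RF_SCORE'
    if len(sys.argv) > 1:
        pairs = dict(arg.split("=", 1) for arg in sys.argv[1:])
        sys.stdout.writelines(set_info_descriptions(sys.stdin, pairs))
```

Only header lines are rewritten, so the body streams through untouched and the filter can sit anywhere in the pipe after vcfanno.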