edg1983 / GREEN-VARAN

Annotate non-coding regulatory vars using our GREEN-DB, prediction scores, conservation and pop AF
MIT License

VEP annotations #17

Closed Madelinehazel closed 1 month ago

Madelinehazel commented 3 months ago

Hello,

Is GREEN-VARAN compatible with VCFs annotated by VEP?

Thanks, Madeline

edg1983 commented 3 months ago

Hi,

At the moment, automatic updates of gene consequences are available only for snpEff (ANN field) or bcftools csq (BCSQ field).

If neither ANN nor BCSQ is found, GREEN-VARAN will create a new ANN field. However, you can avoid this by setting the --noupdate option. You will still have the GREEN-DB region IDs, region types and controlled genes annotated as specific INFO fields (greendb_id, greendb_stdtype, greendb_genes). I understand this is not perfect, since the exact region_id-region_type-gene link is not preserved, but I hope it can still be helpful for you.
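For readers who want to check their own files, here is a minimal Python sketch of the two ideas above: looking for an ANN or BCSQ declaration in the VCF header, and reading the three GREEN-DB INFO fields from a data line. It is not GREEN-VARAN code. The function names, the example header and record, and the assumption that multiple values are comma-separated are all hypothetical.

```python
import re

# Gene-consequence INFO fields named in this thread.
CONSEQUENCE_FIELDS = ("ANN", "BCSQ")
# GREEN-DB INFO fields named in this thread.
GREENDB_FIELDS = ("greendb_id", "greendb_stdtype", "greendb_genes")


def find_consequence_field(header_lines):
    """Return the first of ANN/BCSQ declared in the VCF header, or None."""
    declared = set()
    for line in header_lines:
        match = re.match(r"##INFO=<ID=([^,>]+)", line)
        if match:
            declared.add(match.group(1))
    for name in CONSEQUENCE_FIELDS:
        if name in declared:
            return name
    return None


def greendb_info(record_line):
    """Extract the GREEN-DB INFO fields from one tab-separated VCF data line."""
    info_column = record_line.rstrip("\n").split("\t")[7]
    values = {}
    for item in info_column.split(";"):
        key, sep, value = item.partition("=")
        if sep and key in GREENDB_FIELDS:
            # Assumption: multiple values are comma-separated.
            values[key] = value.split(",")
    return values


# Hypothetical header and record, for illustration only.
header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=BCSQ,Number=.,Type=String,Description="...">',
]
record = (
    "chr1\t1000\t.\tA\tG\t50\tPASS\t"
    "AC=1;greendb_id=R1,R2;greendb_stdtype=enhancer,promoter;"
    "greendb_genes=GENE1,GENE2"
)

print(find_consequence_field(header))
print(greendb_info(record))
```

Note that the three lists are read independently, so which region ID goes with which type and gene cannot be recovered from them, which is the limitation described above.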

I'll work to extend the annotation capability for the VEP field.

edg1983 commented 1 month ago

I'm working on this now.

Do you think a minimal representation including only Allele|Consequence|IMPACT|SYMBOL|Gene definition can work?

Unfortunately, the VEP annotation field is less structured than the one generated by snpEff and bcftools, and many custom fields may be added. Thus, making the consequence added by GREEN-VARAN resemble the full CSQ schema of a given VCF requires more work.

Anyway, the first 5 fields should always be there in this order, so I think the proposed solution can be enough here. What do you think?
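To make the proposal concrete, here is a rough Python sketch of what a minimal CSQ entry could look like. It is an illustration only, not the code in GREEN-VARAN. The function names, the consequence term, and the gene values are hypothetical. Padding the entry with empty sub-fields to match the header's declared format is an assumption of this sketch, not something decided in this thread.

```python
def csq_format(header_lines):
    """Return the CSQ sub-field names declared in the VCF header, or None."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,") and "Format: " in line:
            spec = line.split("Format: ", 1)[1]
            # Drop the closing quote and angle bracket of the header line.
            return spec.strip().rstrip('">').split("|")
    return None


def minimal_csq_entry(allele, consequence, impact, symbol, gene, n_fields=5):
    """Build Allele|Consequence|IMPACT|SYMBOL|Gene, padded with empty fields."""
    first_five = [allele, consequence, impact, symbol, gene]
    if n_fields < len(first_five):
        raise ValueError("CSQ format declares fewer than five fields")
    # Assumption: remaining sub-fields are left empty to keep the field count.
    return "|".join(first_five + [""] * (n_fields - len(first_five)))


def append_csq(info_column, entry):
    """Append an entry to an existing CSQ value, or add a new CSQ key."""
    items = info_column.split(";")
    for i, item in enumerate(items):
        if item.startswith("CSQ="):
            items[i] = item + "," + entry
            return ";".join(items)
    if info_column in ("", "."):
        return "CSQ=" + entry
    return info_column + ";CSQ=" + entry


# Hypothetical header and values, for illustration only.
header = [
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
    "annotations from Ensembl VEP. Format: "
    'Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature">'
]
fields = csq_format(header)
entry = minimal_csq_entry(
    "G", "enhancer_variant", "MODIFIER", "GENE1", "GENE1_ID",
    n_fields=len(fields),
)
print(entry)
print(append_csq("AC=1;CSQ=G|intron_variant|MODIFIER|GENE2|GENE2_ID||", entry))
```

With `n_fields=5` the function returns only the five leading sub-fields, which corresponds to the minimal representation proposed here.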

edg1983 commented 1 month ago

I've made a new release (v1.3.3) that is more flexible in managing the annotations and can now update the VEP CSQ field.

Have a look: https://github.com/edg1983/GREEN-VARAN/releases/tag/v1.3.3

yougulianren commented 2 weeks ago

Dear edg1983,

As the README mentions, three sets of information should be present in my input VCF before using GREEN-VARAN. So far, I have only been able to add the gnomAD information to my input VCF with annotation software. Which tools can I use to add the remaining two sets of information?

In the DB directory of GREEN-VARAN, I found the following files: GRCh38_FATHMM-MKL_NC.tsv.gz, GRCh38_ncER_perc.bed.gz, GRCh38_ReMM.tsv.gz, GRCh38_TFBS.merged.bed.gz, GRCh38_UCNE.bed.gz and GRCh38_DNase.merged.bed.gz. Which tools can I use to annotate my VCF files based on these files? Or how can I use these files to add the remaining two sets of information?