Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Clinvar SV recommendation #1595

Closed: prasundutta87 closed this issue 9 months ago

prasundutta87 commented 9 months ago

Full VEP command line

 vep -i multisample_cleaned.vcf.gz --assembly GRCh38 --format vcf --offline --cache --dir_cache /gpfs3/well/gel/pdutta/vep_cache --cache_version 110 --vcf --force_overwrite --max_sv_size 500000000 --symbol --regulatory --canonical --variant_class --biotype --nearest symbol --fork 16 --hgvs --mane --fasta /gpfs3/well/gel/pdutta/hg38_reference/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna --overlaps --compress_output gzip --gene_phenotype --pubmed -o clinvar_vep.vcf.gz --custom file=Clinvar_SVs.vcf.gz,short_name=ClinVar,format=vcf,fields=CLNSIG%CLNREVSTAT%CLNDN,type=within,reciprocal=1,overlap_cutoff=80

Full error message

I don't get any error; I just want to know whether this is the correct way to find overlaps between my detected SVs and ClinVar SVs. Is there a recommended approach? I was following the custom annotation documentation: https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html. The ClinVar SVs were downloaded from https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_study/vcf/nstd102.GRCh38.variant_call.vcf.gz

nuno-agostinho commented 9 months ago

Hey @prasundutta87,

Thanks for your query. Your command returns ClinVar structural variants (SVs) located within the regions spanned by your input variants, for cases where 80% (overlap_cutoff=80) of the input variant is covered by the ClinVar structural variant (reciprocal=1[^1]).

By using type=within, you should only get ClinVar SVs that are completely within the input variant (if this is not what you want, maybe use type=overlap instead).

[^1]: Using reciprocal=0 would return ClinVar SVs only if 80% of the ClinVar structural variant is covered by the input variant.
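To make the two reciprocal modes concrete, here is a minimal Python sketch of the overlap percentage as it is described in this thread. It is not taken from the VEP source code; the function names and the inclusive, 1-based coordinates are assumptions made for illustration only.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of bases shared by two inclusive, 1-based intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def percent_covered(target, other):
    """Percentage of `target` (start, end) that is covered by `other` (start, end)."""
    target_len = target[1] - target[0] + 1
    return 100.0 * overlap_length(*target, *other) / target_len


def passes_cutoff(input_sv, clinvar_sv, cutoff=80, reciprocal=1):
    # Semantics follow the explanation in this thread (an assumption, not VEP code):
    #   reciprocal=1 -> percentage of the input SV covered by the ClinVar SV
    #   reciprocal=0 -> percentage of the ClinVar SV covered by the input SV
    if reciprocal:
        pct = percent_covered(input_sv, clinvar_sv)
    else:
        pct = percent_covered(clinvar_sv, input_sv)
    return pct >= cutoff


# Hypothetical coordinates: a 300 bp ClinVar SV nested in a 1000 bp input SV.
input_sv = (1000, 1999)
clinvar_sv = (1200, 1499)
print(percent_covered(input_sv, clinvar_sv))   # share of the input SV that is covered
print(percent_covered(clinvar_sv, input_sv))   # share of the ClinVar SV that is covered
```

Under these assumptions, the nested ClinVar SV covers only 30% of the input SV but is itself 100% covered, so which percentage is compared against overlap_cutoff decides whether it is reported.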

Another way to find reported SVs in our database that overlap your input variants is the --check_svs option. Note, however, that it is only available with --database.

Please tell me if you have any further questions.

Best regards, Nuno

prasundutta87 commented 9 months ago

Thanks for your reply, @nuno-agostinho. I have an SV VCF (generated from long reads) and I am trying to find out whether any of the SVs are pathogenic in ClinVar. SV lengths generally differ between call sets (depending on the algorithm, the technology, etc.), so overlaps cannot be exact. I am a little worried that a pathogenic ClinVar SV that is smaller than my input SV might not be annotated.

nuno-agostinho commented 9 months ago

Hey @prasundutta87,

As long as the SVs overlap the input variants by 80%, your smaller ClinVar SV should still be returned.

Anyway, if you want to get all possible overlaps, you can always use: overlap_cutoff=0 and type=overlap. Of course, these may return too many values and make VEP run slower. You can then try to increase the overlap_cutoff to higher values and see how the results satisfy your use case.
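One way to try several cutoffs systematically is to generate the --custom value for each setting and run VEP once per value. The helper below is a hypothetical convenience function, not part of VEP; it only assembles the key=value pairs already used in this thread, and the file name and field list are placeholders.

```python
def build_custom_option(path, short_name, fields, match_type="overlap",
                        overlap_cutoff=0, reciprocal=1):
    """Assemble the comma-separated value passed to VEP's --custom flag."""
    parts = [
        f"file={path}",
        f"short_name={short_name}",
        "format=vcf",
        "fields=" + "%".join(fields),  # VEP separates field names with '%'
        f"type={match_type}",
        f"reciprocal={reciprocal}",
        f"overlap_cutoff={overlap_cutoff}",
    ]
    return ",".join(parts)


# Start permissive (cutoff 0), then tighten and compare the annotated outputs.
for cutoff in (0, 50, 80):
    option = build_custom_option(
        "clinvar_SVs.vcf.gz", "ClinVar", ["CLNSIG", "CLNACC"],
        overlap_cutoff=cutoff,
    )
    print("--custom", option)
```

Each printed line can be appended to the rest of the VEP command; comparing how many ClinVar records are reported at each cutoff shows where the results become too noisy or too sparse for a given call set.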

Hope this information helps a bit.

Best, Nuno

prasundutta87 commented 9 months ago

Thanks a lot for this @nuno-agostinho !

prasundutta87 commented 9 months ago

Hi @nuno-agostinho, just wanted to update that the annotation worked. This is the --custom option I finally used:

--custom file="$VEP_data_resources"/clinvar_SV/clinvar_SVs.vcf.gz,short_name=ClinVar,format=vcf,fields=CLNSIG%CLNACC%ORIGIN%PHENO%VALIDATED,type=overlap,reciprocal=1,overlap_cutoff=80,same_type=1

Regards, Prasun

nuno-agostinho commented 9 months ago

Hey @prasundutta87, I'm glad to know that it worked!

I am going to close this ticket now, but feel free to open a new one if you have any more issues or feedback.

Have a great day!

Cheers, Nuno