KarchinLab / open-cravat

A modular annotation tool for genomic variants
MIT License

vcf report export is short of records compared to tsv and excel #296

Open clinicalngs opened 2 months ago

clinicalngs commented 2 months ago

I am trying to export a VCF from an annotated single-sample result. The VCF contains fewer entries than the GUI shows, while the Excel and TSV exports are correct. This happens whether the reports are generated from the GUI or from the command line. No filtering was applied at this point.

The objective is to extract certain variants in priority genes and pass the VCF to a different annotator that imposes data size limits. The procedure was:

  1. Annotate a WGS or WES sample with many annotators, including clinvar and gnomad3, on Windows (oc v. 2.8).
  2. Shrink the SQLite file with a filter (ClinVar: no benign; gnomAD < 0.01; set of priority genes) on the command line using oc util.
  3. Reimport the smaller SQLite file into the oc GUI and verify the number of variants. Also open the smaller SQLite file in DB Browser (a viewer for SQLite) and verify the selected variants. All OK so far.
  4. Export VCF, TSV, and Excel from the GUI and from the command line. Both procedures give the same result: the VCF is missing about 20% of the variants present in the smaller SQLite file and in the TSV and Excel exports. No pattern of exclusion is apparent: low and high quality, SNPs and indels, all chromosomes, same genes. Variants appear to be included or excluded at random.
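
To put numbers on the gap described in step 4, a minimal Python sketch along these lines could compare the row count in the filtered SQLite file with the record count in the exported VCF. It uses only the standard library. The function names `count_sqlite_variants` and `count_vcf_records` are hypothetical and not part of OpenCRAVAT, and the table name `variant` is an assumption that should be checked in DB Browser first.

```python
import gzip
import sqlite3


def count_sqlite_variants(db_path, table="variant"):
    """Return the number of rows in the variant table of a result database.

    The default table name is an assumption; pass the real one if it differs.
    """
    conn = sqlite3.connect(db_path)
    try:
        (n_rows,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        return n_rows
    finally:
        conn.close()


def count_vcf_records(vcf_path):
    """Return (data_lines, alt_alleles) for a plain or gzipped VCF.

    A single VCF data line can list several comma-separated ALT alleles,
    so the two numbers are reported separately.
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    data_lines = 0
    alt_alleles = 0
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # Skip lines too short to carry an ALT column.
            if len(fields) < 5:
                continue
            data_lines += 1
            # Count each listed ALT allele, ignoring the "." placeholder.
            alt_alleles += sum(1 for alt in fields[4].split(",") if alt != ".")
    return data_lines, alt_alleles
```

Calling both functions on the smaller SQLite file and on the exported VCF gives three numbers to compare. If the data-line count is lower than the SQLite row count while the ALT-allele count matches it, several alleles may be sharing one VCF line. That is only one possibility to rule out, not a confirmed cause.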

Is there any reason that exporting in VCF format would filter or miss certain variants? Perhaps I did something wrong. I would like the VCF to contain the same number of variants as the smaller filtered SQLite file. Ideally, I would also like to strip the output of all previous annotators to shrink the data file further. Thanks.

jasminebro commented 1 month ago

Hi @clinicalngs, our IT team could not reproduce your issue. Could you share your VCF file with us by email (support@opencravat.org) so we can troubleshoot further?

Thank you for using OpenCRAVAT!