broadinstitute / gnomad-browser

Explore gnomAD datasets on the web
https://gnomad.broadinstitute.org
MIT License

Review differences between v2 SVs and new v3 SVs #1077

Open phildarnowsky-broad opened 1 year ago

phildarnowsky-broad commented 1 year ago

I have modified the import script so that it can mostly import the v3 example. I had some questions for Ryan, pasted below:

  1. v2 SVs had an INFO field called PAR, which appears to only come into play when calculating hemizygous frequencies. This field isn't present in the v3 example, at least under that name.
  2. v2 SVs allowed a variant to have an optional second locus, via the INFO fields CHR2, POS2, and END2. This was used to support interchromosomal variants. The example VCF is missing the POS2 field.
  3. The v2 data includes histograms of age distribution and genotype quality per variant. That data comes from an entirely different file, and I'd assume that the v2 histogram data is not valid to use with the v3 VCFs.
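
For points 1 and 2, a quick way to see which of the v2 INFO fields a new VCF still declares is to scan its header. This is only a minimal sketch and not the actual import script: the function names and the sample header below are made up for illustration, and it assumes the fields would be declared in standard `##INFO=<ID=...>` header lines.

```python
import re

# INFO fields the v2 SV import relied on, per the questions above.
V2_SV_INFO_FIELDS = ("PAR", "CHR2", "POS2", "END2")

# Matches a VCF meta-information line such as "##INFO=<ID=CHR2,Number=1,...>".
_INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def declared_info_fields(header_lines):
    """Return the set of INFO IDs declared in the VCF header lines."""
    fields = set()
    for line in header_lines:
        if not line.startswith("##"):
            break  # reached the "#CHROM" column line or the records
        match = _INFO_ID.match(line)
        if match:
            fields.add(match.group(1))
    return fields


def missing_v2_fields(header_lines, expected=V2_SV_INFO_FIELDS):
    """List the expected v2 INFO fields that the header does not declare."""
    declared = declared_info_fields(header_lines)
    return [name for name in expected if name not in declared]


# Hypothetical header, shaped like the situation described above:
# CHR2 and END2 are declared, PAR and POS2 are not.
example_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CHR2,Number=1,Type=String,Description="Second chromosome">',
    '##INFO=<ID=END2,Number=1,Type=Integer,Description="Second end position">',
    '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="SV type">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]

print(missing_v2_fields(example_header))
```

To run it against a real file, the same functions can be given an open text file handle, since they only read lines until the header ends.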

phildarnowsky-broad commented 1 year ago

As for user-facing code, it looks like we can probably re-use the v2 code as-is, or at least close to it.