broadinstitute / gnomad-browser

Explore gnomAD datasets on the web
https://gnomad.broadinstitute.org
MIT License

Document schema/fields in browser release and gene model HTs #1369

Closed ch-kr closed 1 month ago

ch-kr commented 9 months ago

Hi all, just following up on today's browser meeting: would it be possible for someone on the team to document the fields present in the browser release HT (not sure what you call this on your team, but the combined version of the exomes and genomes sites HTs our team sends for release) and the gene model HT? Here is an example of how we documented the schema for the v4 exomes release and the v4 HT Help page.

rileyhgrant commented 8 months ago

Here's a link to a Google Doc containing the marked-up schema, which documents the shape of the Hail Tables and the meaning of each field.
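For anyone tracking this later, here is a rough sketch of how a nested schema could be flattened into one documented line per field. It is not the code used to produce the Google Doc; the schema dict, the field names, and the `flatten_schema` / `document_schema` helpers are all hypothetical and only illustrate the idea.

```python
from typing import Any, Dict, Iterator, List, Tuple


def flatten_schema(schema: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_path, type_name) for every leaf field of a nested schema.

    Nested dicts stand in for structs; string values stand in for type names.
    """
    for name, value in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Recurse into struct-like fields, extending the dotted path.
            yield from flatten_schema(value, path)
        else:
            yield path, value


def document_schema(schema: Dict[str, Any], descriptions: Dict[str, str]) -> List[str]:
    """Return one 'path (type): description' line per leaf field."""
    lines = []
    for path, type_name in flatten_schema(schema):
        # Fields without a description are flagged rather than silently skipped.
        description = descriptions.get(path, "TODO: undocumented")
        lines.append(f"{path} ({type_name}): {description}")
    return lines


# Hypothetical schema and descriptions, for illustration only.
example_schema = {
    "locus": "locus<GRCh38>",
    "alleles": "array<str>",
    "exome": {"ac": "int32", "an": "int32"},
}
example_descriptions = {
    "locus": "Variant position.",
    "exome.ac": "Alternate allele count in exomes.",
}

for line in document_schema(example_schema, example_descriptions):
    print(line)
```

Flagging missing descriptions with a TODO makes it easy to spot fields that still need documentation before release.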

https://docs.google.com/document/d/1zP5yErlmoNHOL3HhdUVuBbNCZskjaCysZFrZE7uAqbs/edit?usp=sharing

ch-kr commented 7 months ago

Thank you for creating this document! I've added comments and suggestions.

One higher-level question for this schema: the site quality metrics histograms displayed on the variant pages display adj metrics, right? (That is, metrics calculated using only high-quality genotypes; the frequencies we display on the browser are all adj-filtered.) If yes, then you shouldn't need the raw qual hists in the browser table, since they don't get loaded.

ch-kr commented 7 months ago

thanks to Riley for sharing the code used to create these tables (also sharing here to track for future reference)

I have a couple questions about these two tables:

I also have one comment about the gene model table (cc @mattsolo1): it seems like the GRCh37 version of this table should be stable (we shouldn't be updating it), so releasing that one sounds good to me. Given that GRCh38 constraint is experimental, however, I vote we remove all constraint annotations from this table prior to public release. We can add them to the table in the future, after we've made more updates, and simply overwrite the existing resource. What do you think?
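To make the proposal concrete, here is a minimal sketch of stripping constraint annotations from gene records before release. Plain dicts stand in for table rows, and the field names (`example_constraint`, `gene_id`, `symbol`) and the `strip_fields` helper are hypothetical, not the actual gene model schema. On a real Hail Table the analogous step would presumably be `Table.drop`.

```python
from typing import Any, Dict, Iterable, List


def strip_fields(record: Dict[str, Any], fields_to_drop: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `record` without the named top-level fields."""
    drop = set(fields_to_drop)
    return {key: value for key, value in record.items() if key not in drop}


# Hypothetical gene records; real rows would carry many more fields.
genes: List[Dict[str, Any]] = [
    {"gene_id": "GENE1", "symbol": "ABC1", "example_constraint": {"score": 0.1}},
    {"gene_id": "GENE2", "symbol": "XYZ2", "example_constraint": None},
]

# Assumed list of experimental annotations to remove before public release.
CONSTRAINT_FIELDS = ["example_constraint"]

release_genes = [strip_fields(gene, CONSTRAINT_FIELDS) for gene in genes]
print(release_genes)
```

Because the originals are copied rather than mutated, the annotated records stay available for overwriting the released resource later.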