Add PrimAD reference data

broadinstitute / seqr-loading-pipelines

hail-based pipelines for annotating variant callsets and exporting them to elasticsearch

MIT License


Add PrimAD reference data #320

Open hanars opened 2 years ago

hanars commented 2 years ago

From the original ticket in the seqr repo https://github.com/broadinstitute/seqr/issues/2547 :

primAD has sequence data from ~1000 primates from many different species, and it is helpful to look at variation in this database, similar to looking at gnomAD. We would want to know the primate AF for a variant, as this is some evidence that the variant is tolerated. In these 1000 primates, as much variation is seen as in the 60,000 people in ExAC.

Add AC/AFs from primAD as a reference population dataset, with link to the primAD variant page - BUT only for the primate data. primAD also has a copy of the gnomAD data which we already have in seqr, so we would want the data just for primates.

mike-w-wilson commented 2 years ago

Mike to check in with Matt. If there isn't a download, meet with Kyle next month to touch base about the dataset.

mattsolo1 commented 2 years ago

I'm not aware of a bulk data download, but you can fetch variant data using the API, e.g.

https://primad.basespace.illumina.com/primateApi/variant?bvid=1-55046543-C-T
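A minimal sketch of how that lookup could be scripted. Only the endpoint URL and the example `bvid` come from this comment; the `chrom-pos-ref-alt` layout of `bvid` is inferred from that single example, and the JSON response body and the 404-for-missing-variant behaviour are unverified assumptions about the API. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Endpoint taken from the example URL in this thread.
PRIMAD_VARIANT_URL = "https://primad.basespace.illumina.com/primateApi/variant"


def primad_variant_url(chrom, pos, ref, alt):
    """Build the variant lookup URL.

    Assumes bvid is "chrom-pos-ref-alt", inferred from the one example above.
    """
    bvid = f"{chrom}-{pos}-{ref}-{alt}"
    return f"{PRIMAD_VARIANT_URL}?{urllib.parse.urlencode({'bvid': bvid})}"


def fetch_primad_variant(chrom, pos, ref, alt, timeout=10):
    """Return the parsed response for a variant, or None if it is not found.

    ASSUMPTIONS (not confirmed in this thread): the API returns JSON and
    answers 404 for a variant it does not have.
    """
    url = primad_variant_url(chrom, pos, ref, alt)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

If the API signals a missing variant differently (for example an empty body with a 200 status), the `None` check in `fetch_primad_variant` would need to change accordingly.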

mike-w-wilson commented 2 years ago

@hanars, is this something that can happen in seqr? Or will we still need a bulk data download to get the desired annotations?

hanars commented 2 years ago

So the issue is that we only want to show a link if the variant actually exists in primAD. If we just added that style of link to every variant in seqr, many of them would go nowhere. So we do need a bulk download in order to know whether the variant is in primAD (AF/AC would then be nice to have, but not essential).

mattsolo1 commented 2 years ago

For gnomAD I was planning to query the primAD API on the fly to see if the variant exists in primAD. If it does, display the link, and if not, tell the user it doesn't exist. But yeah, having the bulk download would be nice to avoid adding all that logic.

hanars commented 2 years ago

Yeah, I would rather not do that in seqr. I'm worried that on a page with 100 variants there's going to be a lag, and we'll end up with links popping up on a variant after an analyst has already scrolled past it. I think we should hold off until we can get some sort of download, or maybe do those requests as part of the loading?
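As a rough illustration of the "requests as part of the loading" option: the presence check could run once per variant at load time and store only the hits, so the UI never has to wait on primAD. This is a sketch under assumptions, not the pipeline's actual code. `primad_hits` and the `lookup` callable are hypothetical names; `lookup` stands in for whatever wrapper ends up calling the primAD variant endpoint and is assumed to return `None` for a variant primAD does not have.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# (chrom, pos, ref, alt)
Variant = Tuple[str, int, str, str]


def primad_hits(
    variants: Iterable[Variant],
    lookup: Callable[[str, int, str, str], Optional[dict]],
) -> Dict[str, dict]:
    """Return {variant_id: record} for the variants that `lookup` found.

    `lookup` is a hypothetical callable that returns the primAD record for a
    variant, or None when primAD has no entry for it.
    """
    hits = {}
    for chrom, pos, ref, alt in variants:
        record = lookup(chrom, pos, ref, alt)
        if record is not None:
            # Keyed as chrom-pos-ref-alt to match the link format used above.
            hits[f"{chrom}-{pos}-{ref}-{alt}"] = record
    return hits
```

The resulting mapping could be written alongside the other annotations, so a link is rendered only for variants present in it. One request per variant may still be slow for a large callset, which is part of why a bulk download would be preferable.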

mike-w-wilson commented 2 years ago

Need to contact primAD to see if they have a download available

lynnpais commented 5 months ago

Talked to Anne and decided to hold off on this, as it's not being used much.