SACGF / variantgrid

VariantGrid public repo

Gene based analysis #406

Open davmlaw opened 3 years ago

davmlaw commented 3 years ago

People are often interested in gene X. Do we have any patients with interesting variants?

At the moment we can do it using the all variants node, but then you end up having to go through the ~200 variants by hand - most of the time they're in control samples or samples with totally unrelated phenotypes etc.

So basically, when he says that, he means: can you look at the affecteds in certain cohorts, not parents or marry-ins?
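A minimal sketch of that restriction, in plain Python rather than against the real VariantGrid models. The `CohortSample` record, its `affected` and `relation` fields, and the sample data are all hypothetical, made up only to show the shape of the filter:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortSample:
    """Hypothetical record; not the VariantGrid data model."""
    name: str
    cohort: str
    affected: bool
    relation: str  # e.g. "proband", "sibling", "parent", "marry-in"


# Made-up example data.
SAMPLES = [
    CohortSample("S1", "cohort_a", True, "proband"),
    CohortSample("S2", "cohort_a", False, "parent"),
    CohortSample("S3", "cohort_a", False, "marry-in"),
    CohortSample("S4", "cohort_b", True, "sibling"),
    CohortSample("S5", "controls", False, "proband"),
]

EXCLUDED_RELATIONS = {"parent", "marry-in"}


def affected_in_cohorts(samples, cohorts):
    """Names of affected samples in the given cohorts, skipping parents and marry-ins."""
    return [
        s.name
        for s in samples
        if s.cohort in cohorts
        and s.affected
        and s.relation not in EXCLUDED_RELATIONS
    ]


print(affected_in_cohorts(SAMPLES, {"cohort_a", "cohort_b"}))
```

Only variants seen in the samples this returns would then need a manual look.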

sksmi commented 3 years ago

Cohorts are a really good way of managing this; it's just that the db is getting big and the old data are messy and disorganised.

davmlaw commented 3 years ago

Sometimes if you re-arrange a query it's way faster.

E.g. with cohorts you have to process a dozen massive VCFs, start with 4M variants, then restrict to a gene.

It's faster to START with 200 variants in a gene, THEN check lots of VCFs / patients etc. for certain phenotypes.
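To illustrate the re-ordering, here is a toy sketch in plain Python (not the VariantGrid query code). The `CALLS` data, the `COHORT` set and both function names are hypothetical. Each function also returns how many rows it had to look at, which is the point of the comparison:

```python
from collections import defaultdict

# Made-up genotype calls: (sample, gene, variant).
CALLS = [
    ("S1", "GATA2", "v1"),
    ("S1", "BRCA1", "v2"),
    ("S2", "TP53", "v3"),
    ("S3", "GATA2", "v4"),
    ("S3", "TP53", "v3"),
]
COHORT = {"S1", "S2"}  # samples we care about


def cohort_first(calls, cohort, gene):
    """Collect every call in the cohort, then restrict to the gene."""
    in_cohort = [c for c in calls if c[0] in cohort]
    hits = {(sample, variant) for sample, g, variant in in_cohort if g == gene}
    return hits, len(in_cohort)


def build_gene_index(calls):
    """Index calls by gene, standing in for a database index."""
    index = defaultdict(list)
    for sample, gene, variant in calls:
        index[gene].append((sample, variant))
    return index


def gene_first(index, cohort, gene):
    """Start from the gene's calls, then restrict to the cohort."""
    candidates = index.get(gene, [])
    hits = {(sample, variant) for sample, variant in candidates if sample in cohort}
    return hits, len(candidates)


index = build_gene_index(CALLS)
print(cohort_first(CALLS, COHORT, "GATA2"))
print(gene_first(index, COHORT, "GATA2"))
```

Both orders give the same hits. The difference is the size of the intermediate set, which in the real case would be roughly 4M variants versus 200.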

davmlaw commented 3 years ago

Somewhat similar to #32 (Samples that have a variant in the gene).

davmlaw commented 3 years ago

As a quick test, I took an all variants node restricted to GATA2, then filtered to ClinVar LP/P, which left 59 variants.

Then the 60 VCFs from GA (Genomic Autopsy):

vcfs = VCF.objects.filter(project__name='Genomic Autopsy (Australian Perinatal Death Study)')
In [15]: vcfs.count()
Out[15]: 60

Then filter to variants in those VCFs (regardless of zygosity):

qs2 = qs.filter(cohortgenotype__collection__cohort__vcf__in=vcfs)
In [19]: %time qs2.count()
CPU times: user 5.43 ms, sys: 2.93 ms, total: 8.37 ms
Wall time: 2 s
Out[19]: 1
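A note on reading that timing: `%time` reports CPU time used by the Python process separately from elapsed wall time, so a few milliseconds of CPU against 2 s of wall time suggests nearly all of it was spent waiting, presumably on the database. The sketch below reproduces that pattern with the standard library. `fake_db_count` is a hypothetical stand-in that just sleeps; it is not a real query:

```python
import time


def timed(fn):
    """Run fn and return (result, cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, cpu, wall


def fake_db_count():
    """Stand-in for a query: the process idles while 'the database' works."""
    time.sleep(0.05)
    return 1


result, cpu, wall = timed(fake_db_count)
print(f"result={result} cpu={cpu:.4f}s wall={wall:.4f}s")
```

Sleeping costs almost no CPU time, so the CPU figure stays near zero while the wall time covers the whole wait.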