a1618617 closed 3 years ago
Not currently. I have been thinking about whether this is a required feature; currently my recommendation is to use an analysis bca..
https://variantgrid.com/analysis/2248/
@davmlaw thoughts?
How well will VG cope with having >100 exomes @ 100X coverage in a cohort analysis? It seems memory- and storage-intensive for querying just one gene under, say, a rare variant frequency.
Not sure you'd want to do this via a cohort analysis for 100 exomes. I'm not really in a position to comment, but I'm guessing Dave stores variants for the all variants node (e.g. as used in the analysis linked above) differently, because it's very fast.
@davmlaw Also wondering whether, instead of having a list of variants on the gene page, it would be better to have a link to create a gene-specific analysis? E.g. https://variantgrid.com/analysis/2249/ (not this exact analysis, just an example). It could also make the gene page quicker to load.
@sksmi
I need a baby between your analysis above and a cohort of unresolved/partially resolved cases (~100). What is the best way to achieve this?
Is there a way to search for variants in a gene across all samples in a project, e.g. Genomic Autopsy?
Um.. not sure I can help with the baby bit, but can prob solve the cohort q. ;)
Assuming you mean that you'd like to analyse this gene in a cohort? Need a bit more info before I can help...
Ideally, what I would want to do is take all samples from VCFs assigned to "Genomic Autopsy" and create a cohort from them. Then take the cross from your analysis above (Variants in Database) and use the cohort node to look for variants in my cohort of interest.
For this gene given there's only about 5 variants of interest, I'd stick with the all variants node approach as you'll see pretty quickly what's GA. For a more general solution, you guys probably want to make a GA cohort in VG (I'd actually make 3 - mother, father & affecteds, so you can combine/subtract as needed) to use for these sorts of exploratory analyses..
It'll take a bit to make & run if you create it now given the # of samples, but it's possible to generate it and use it in the future - pretty quick once the cohort has been created. You might also want to update the GA data management SOP so that every time a new vcf is uploaded it's also added to the existing cohort as you'll all want to share the same cohort(s) rather than make your own.
Cohort approach is of course only useful if you want to make statements about GA specifically, otherwise if it's just a variant screen I'd still stick with the all variants nodes as who knows what might come up. Caveat is always that the samples haven't been joint called..
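The combine/subtract idea for the mother, father and affecteds cohorts can be sketched with plain Python sets. This is only an illustration, not how VG implements cohort nodes; the variant tuples and the helper names `only_in_affecteds` and `shared_by_all` are made up for the example.

```python
# Hypothetical data: each cohort is the set of variants seen in any of its
# samples, with variants written as (chrom, pos, ref, alt) tuples.
# None of this is VariantGrid code; it only illustrates combine/subtract.
mother = {("1", 1000, "A", "G"), ("2", 2000, "C", "T")}
father = {("2", 2000, "C", "T"), ("3", 3000, "G", "A")}
affecteds = {("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("4", 4000, "T", "C")}


def only_in_affecteds(affecteds, *parent_cohorts):
    """Subtract: variants seen in affecteds but in none of the parent cohorts."""
    parents = set().union(*parent_cohorts)
    return affecteds - parents


def shared_by_all(*cohorts):
    """Combine: variants present in every cohort given."""
    return set.intersection(*cohorts)


candidates = only_in_affecteds(affecteds, mother, father)
common = shared_by_all(mother, father, affecteds)
```

Keep the joint-calling caveat in mind with the subtract step: a variant missing from a parent set only means it wasn't called there, not that the parent is confirmed reference at that site.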
Sarah's right about the best way to do it being the all variants node + gene symbol.
But as it was easy to add download_grid_json_as_csv=True to that grid, you can now download it (well, from the next upgrade).
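For anyone curious what turning grid JSON into a CSV download roughly involves, here's a minimal standard-library sketch. It assumes the grid JSON holds a list of row dicts under a "rows" key, which is a guess at the shape and not VG's actual format, and `grid_json_to_csv` is a made-up helper rather than anything in the codebase.

```python
import csv
import io
import json


def grid_json_to_csv(grid_json: str) -> str:
    """Convert grid JSON (assumed shape: {"rows": [{col: value, ...}, ...]}) to CSV text."""
    rows = json.loads(grid_json).get("rows", [])
    if not rows:
        return ""
    # Use column names in first-seen order so rows with extra keys still fit.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example rows, not real data.
example = json.dumps({"rows": [
    {"variant": "1:1000 A>G", "gene_symbol": "GENE_X", "zygosity": "HET"},
    {"variant": "2:2000 C>T", "gene_symbol": "GENE_X", "zygosity": "HOM_ALT"},
]})
csv_text = grid_json_to_csv(example)
```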
@davmlaw can you confirm which variants are/aren't shown on the gene page? My understanding is that the table is filtered by zygosity calls, e.g. hom_ref somatics won't be visible? Will add details to docs.
You can see if we have any of:
Download csv works.
Added text & new page to docs: https://github.com/SACGF/variantgrid_docs/blob/master/genes/gene_page.md https://github.com/SACGF/variantgrid_docs/blob/master/genes/gene_symbol.md
@davmlaw can you do a quick review of the pages above and check that all is OK? Also, I couldn't format the page for some reason.
Gene symbol was lacking a ".md" extension; I fixed that and added it to the index so it shows.
I filled out the genes page with more information about how gene annotations work.
Hi,
Can we please reopen this for discussion, especially filtering based on cohort.
Hamish has been asking us to look for specific genes in just the GA cohort. Is there a better approach than just using the variant database > filtering by impact/population > manually clicking each variant page to see who the variant belongs to?
Thanks Thuong and @PeerArts
Create an analysis, create a cohort node for each VCF that contains GA samples, then put a gene filter beneath them?
LOL @davmlaw are you kidding me? There are way too many different .vcfs to do that.
I think the 'all variants' node would work ok-ish if the genomics collaboration data weren't in there. Most of the variants we see are in that cohort, possibly because of the freebayes caller.
@PeerArts haha, we can't organise your data for you - that's your job. There's already a GA cohort - didn't take that long and you can add as you go along.
Organising data is not my job, but it would be nice to have an easier way to create cohorts from different .vcfs in VG. @davmlaw, are you happy for us to create a massive GA cohort with >160 trios/quads for this? Maybe VG has improved, but earlier I wasn't able to keep adding samples from our >20 different .vcfs to the same cohort (currently increasing number of .vcfs every other week), because it always broke any analysis I tried to do. I just thought it would make life so much easier if we could 'just' upload .vcfs as GA-project .vcfs and only do a project-based cohort analysis.
The biggest issue with cohorts at the moment is the mega VCF that has hundreds of samples and 20M variants - doing anything with that (including creating a new cohort from it) breaks things as I don't have enough free space on the virtual machine to make temporary queries.
Can we move discussion to #322 Multi VCF Analysis? I think I can make a source node that does this for you, maybe using "project" to select VCFs assigned to GA.
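A rough sketch of what "select VCFs by project, then filter on a gene" might look like, in plain Python. Everything here (the `Vcf` record, its `project` field, `variants_in_gene_for_project`) is hypothetical and only illustrates the idea; it is not the node design or any existing VG code.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vcf:
    # Hypothetical stand-in for an uploaded VCF tagged with a project.
    name: str
    project: str
    # sample name -> list of (gene_symbol, variant) calls
    calls: dict = field(default_factory=dict)


def variants_in_gene_for_project(vcfs, project, gene_symbol):
    """Map each variant in gene_symbol to the samples carrying it,
    looking only at VCFs assigned to the given project."""
    carriers = defaultdict(set)
    for vcf in vcfs:
        if vcf.project != project:
            continue  # skip VCFs from other projects, e.g. a large collaboration VCF
        for sample, sample_calls in vcf.calls.items():
            for gene, variant in sample_calls:
                if gene == gene_symbol:
                    carriers[variant].add(sample)
    return {variant: sorted(samples) for variant, samples in carriers.items()}


# Made-up example data.
vcfs = [
    Vcf("ga_batch1.vcf", "GA", {
        "proband_1": [("GENE_X", "1:1000 A>G")],
        "mother_1": [("GENE_Y", "5:500 T>C")],
    }),
    Vcf("ga_batch2.vcf", "GA", {
        "proband_2": [("GENE_X", "1:1000 A>G"), ("GENE_X", "1:1200 G>T")],
    }),
    Vcf("collab_mega.vcf", "collaboration", {"s1": [("GENE_X", "1:1300 C>A")]}),
]
result = variants_in_gene_for_project(vcfs, "GA", "GENE_X")
```

The point of returning samples per variant is to avoid clicking through each variant page to see who it belongs to.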
Hi guys,
Is there a way to export variants in the gene symbol page? Example attached. We often get asked "have you seen this Gene X in your cohort?"
Thanks