anvilproject / anvil-portal

The NHGRI Analysis Visualization and Informatics Lab-space (AnVIL) website
https://anvilproject.org/
MIT License

Create 1000 genomes copy with instructions on how to run a GWAS analysis #373

Open NoopDog opened 4 years ago

NoopDog commented 4 years ago

From Beth:

I have been thinking about how to tailor the current GWAS tutorial to AnVIL. Here are my thoughts:

Currently, the biggest difference between the GWAS tutorials we are creating is how to navigate the data model, not the tools or workflows. I have already listed the GWAS workflows in the AnVIL Dockstore org: https://dockstore.org/organizations/anvil

Alisa Manning’s lab has published a GWAS tutorial (the one I used to make the BDCat version) that uses something more like a Terra data model. You can try it with your free credits: https://anvil.terra.bio/#workspaces/amp-t2d-op/2019_ASHG_Reproducible_GWAS-V2

To showcase the AnVIL system (and not just Terra), I have been thinking we should ingest the training dataset Alisa’s lab made (we already ingested it into the Gen3 BDC data model). This data is based on 1000 Genomes: 1) downsampled VCF data for chromosomes 10 and 11, and 2) synthetic phenotypic data modeled after real traits (BMI, LDL, HDL, etc.).

Do you work in the data model working group? How should we set up this data model to be close to what studies are like in the AnVIL right now?

Once the data is in the AnVIL, I can easily help write instructions on how to adjust this data model for use in the current GWAS tutorial.

Here is a workspace I made that uses the training data in BDCat. It includes a lot of instructions and notebooks about manipulating the data model: https://terra.biodatacatalyst.nhlbi.nih.gov/#workspaces/biodata-catalyst/BioData%20Catalyst%20GWAS%201000%20Genomes%20Tutorial

NoopDog commented 4 years ago

Let's see if Beth can resource this one.

Let's start with a link to the current GWAS tutorial in BioData Catalyst.

kozbo commented 3 years ago

This is blocked by some confusing data structures in the data tables.