databio / GenomicDistributions

Calculate and plot distributions of genomic ranges
http://code.databio.org/GenomicDistributions

Plot for ethnic/geographic genetic variation of genomic region #13

Open j-lawson opened 4 years ago

j-lawson commented 4 years ago

Because SNPs that are close to each other tend to be inherited together, genetic variation can, to some extent, distinguish populations from different countries and geographic locations (see links). It would be cool to know whether the genetic variability in a given genomic region differs between ethnicities or geographic locations. This could help suggest whether that region might have different health effects in different populations. It could be done in a supervised or an unsupervised way.

Supervised: Get a large amount of genetic data with accompanying personal data. For each ethnicity, correlate genetic variability with membership in that ethnicity. Use those correlation values to score each region by how much it varies between that ethnicity and the others (shown as a bar plot or heatmap).
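A minimal sketch of the supervised idea, assuming genotypes are coded as alternate-allele dosages (0/1/2) and each individual has a single population label. The function names, the toy data, and the choice of Pearson correlation against a membership indicator are illustrative assumptions, not an existing GenomicDistributions API.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def population_scores(genotypes, labels):
    """Correlate each SNP's dosages with a 0/1 membership indicator per population.

    genotypes: dict of SNP id -> list of dosages, one per individual
    labels:    list of population labels, in the same individual order
    Returns dict of population -> dict of SNP id -> correlation.
    """
    scores = {}
    for pop in sorted(set(labels)):
        indicator = [1 if lab == pop else 0 for lab in labels]
        scores[pop] = {snp: pearson(dos, indicator)
                       for snp, dos in genotypes.items()}
    return scores

# Toy data (made up for illustration only).
labels = ["A", "A", "A", "B", "B", "B"]
genotypes = {
    "snp1": [2, 2, 2, 0, 0, 0],  # separates population A from B
    "snp2": [1, 0, 1, 1, 0, 1],  # same dosage pattern in both populations
}
scores = population_scores(genotypes, labels)
```

Per-region scores could then be obtained by summarizing (for example, averaging the absolute values of) the per-SNP correlations for the SNPs that fall in each region; that table is what a bar plot or heatmap would show.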

Unsupervised: Run PCA on a large amount of genetic data, as has been done before (see links). The PC loadings would give each region a score representing its inter-ethnic/geographic variability along the largest axes of genetic variation.
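A rough sketch of the unsupervised idea, assuming an individuals-by-SNPs dosage matrix and a known SNP-to-region assignment. It estimates only the first principal component, using power iteration so the example needs no external libraries; a real analysis would use a proper PCA implementation and more than one component. All names and data here are hypothetical.

```python
from math import sqrt

def first_pc_loadings(matrix, n_iter=200):
    """Unit-length loadings of the first PC of a rows=individuals, cols=SNPs matrix."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in matrix]  # center columns
    v = [1.0 / sqrt(p)] * p  # starting vector; assumed not orthogonal to PC1
    for _ in range(n_iter):
        # Power iteration on X^T X: t = X v, then w = X^T t.
        t = [sum(x * vj for x, vj in zip(row, v)) for row in X]
        w = [sum(X[i][j] * t[i] for i in range(n)) for j in range(p)]
        norm = sqrt(sum(wj * wj for wj in w))
        if norm == 0:
            break
        v = [wj / norm for wj in w]
    return v

# Toy data (made up): 6 individuals x 3 SNPs, dosages 0/1/2.
snp_ids = ["snp1", "snp2", "snp3"]
dosages = [
    [2, 2, 1],
    [2, 2, 0],
    [2, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [0, 1, 1],
]
loadings = first_pc_loadings(dosages)

# Score each region by the mean absolute loading of its SNPs (one possible summary).
snp_to_region = {"snp1": "regionA", "snp2": "regionA", "snp3": "regionB"}
by_region = {}
for snp, loading in zip(snp_ids, loadings):
    by_region.setdefault(snp_to_region[snp], []).append(abs(loading))
region_scores = {region: sum(vals) / len(vals) for region, vals in by_region.items()}
```

In this toy example the first two SNPs track the split between the first three and last three individuals, so the region containing them receives the larger score.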

Note: people in our department might have easy access to/familiarity with this type of genetic data.

Links: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735096/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5644186/

nsheff commented 4 years ago

Ok, so the idea here is that, given a BED file, I would give you a plot showing the degree of genetic variation for each of a series of ethnicities, aggregated across the regions that you've provided -- right?

j-lawson commented 4 years ago

I was referring more to quantifying variation between different ethnicities. That way you could infer whether different ethnicities might regulate that region set differently.

nsheff commented 4 years ago

Score each SNP for specificity to each ethnicity.

This is analogous to the open chromatin tissue specificity plot, so it could use the same machinery.
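One way this scoring might look, as a sketch only: compute the alternate-allele frequency of each SNP in each population, define specificity as that frequency minus the mean frequency in the other populations, and average the scores over SNPs that overlap the query regions. The specificity definition, the function names, and the data are assumptions made for illustration; this is not how the existing open chromatin plot is computed.

```python
def allele_frequencies(genotypes, labels):
    """SNP id -> {population: alt-allele frequency}, from diploid dosages 0/1/2."""
    freqs = {}
    for snp, dosages in genotypes.items():
        totals, alleles = {}, {}
        for d, lab in zip(dosages, labels):
            totals[lab] = totals.get(lab, 0) + d
            alleles[lab] = alleles.get(lab, 0) + 2  # two alleles per individual
        freqs[snp] = {pop: totals[pop] / alleles[pop] for pop in totals}
    return freqs

def specificity(freqs):
    """Frequency in a population minus the mean frequency in all other populations.

    Requires at least two populations.
    """
    out = {}
    for snp, by_pop in freqs.items():
        out[snp] = {}
        for pop, f in by_pop.items():
            others = [g for q, g in by_pop.items() if q != pop]
            out[snp][pop] = f - sum(others) / len(others)
    return out

def summarize_regions(spec, positions, regions):
    """Mean specificity per population over SNPs inside any region.

    positions: SNP id -> (chrom, 0-based position)
    regions:   list of (chrom, start, end), BED-style half-open intervals
    """
    hits = [snp for snp, (chrom, pos) in positions.items()
            if any(c == chrom and s <= pos < e for c, s, e in regions)]
    if not hits:
        return {}
    pops = spec[hits[0]].keys()
    return {pop: sum(spec[snp][pop] for snp in hits) / len(hits) for pop in pops}

# Toy data (made up for illustration only).
labels = ["A", "A", "B", "B", "C", "C"]
genotypes = {
    "snpX": [2, 2, 0, 0, 0, 0],  # alt allele seen only in population A
    "snpY": [1, 1, 1, 1, 1, 1],  # same frequency everywhere
    "snpZ": [0, 0, 2, 2, 0, 0],  # outside the query regions below
}
positions = {"snpX": ("chr1", 150), "snpY": ("chr1", 180), "snpZ": ("chr2", 50)}
regions = [("chr1", 100, 200)]

spec = specificity(allele_frequencies(genotypes, labels))
summary = summarize_regions(spec, positions, regions)
```

The resulting per-population summary for a region set has the same shape as a per-cell-type summary, which is why a bar plot per population seems like a natural fit here.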

nsheff commented 2 years ago

@jstubbs01 are you still interested in working on this?

jstubbs01 commented 2 years ago

Yes, do you have any other suggestions for how I can start?