Open j-lawson opened 4 years ago
Ok, so the idea here is that, given a BED file, I would give you a plot showing the degree of genetic variation for each of a series of ethnicities, aggregated across the regions that you've provided -- right?
I was more referring to quantifying variation between different ethnicities. That way you might be able to infer whether various ethnicities might have different regulation of that region set.
Score each SNP for its specificity to each ethnicity.
This is analogous to the open chromatin tissue specificity plot, so it could use the same machinery.
@jstubbs01 are you still interested in working on this?
Yes, do you have any other suggestions for how I can start?
Since SNPs that are close to each other are passed down together in populations, genetic variability can separate different countries and geographical locations to a certain extent (see links). It would be cool to know whether a given genomic region has genetic variability that differs between ethnicities or geographical locations. This might help suggest whether that region could have different health effects in different populations. This could be done in a supervised or unsupervised way.
Supervised: Get a large amount of genetic data and personal data. For each ethnicity, correlate genetic variability with membership in that ethnicity. Use those correlation values to give each region a score for how much it differs between that ethnicity and the others (bar plot or heatmap).
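To make the supervised idea concrete, here is a minimal sketch using only the Python standard library. It assumes genotypes are coded as alternate-allele counts (0/1/2) per sample and that each SNP has already been assigned to a region; the names `pearson`, `region_scores`, `genotypes`, `labels`, and `snp_region` are hypothetical, and using the mean absolute Pearson correlation as the region score is just one possible choice, not something settled in this thread.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def region_scores(genotypes, labels, snp_region):
    """Score each region for each group (all inputs are hypothetical formats).

    genotypes:  dict snp_id -> list of alt-allele counts (0/1/2), one per sample
    labels:     list of group labels, one per sample
    snp_region: dict snp_id -> region id
    Returns {region: {group: mean |r| over the SNPs in that region}}.
    """
    groups = sorted(set(labels))
    collected = {}
    for snp, dosages in genotypes.items():
        region = snp_region[snp]
        for group in groups:
            # 1 if the sample belongs to this group, 0 otherwise
            indicator = [1 if lab == group else 0 for lab in labels]
            r = abs(pearson(dosages, indicator))
            collected.setdefault(region, {}).setdefault(group, []).append(r)
    return {
        region: {group: mean(values) for group, values in by_group.items()}
        for region, by_group in collected.items()
    }


# Toy data (made up): snp1 separates the groups perfectly, snp2 not at all.
labels = ["A", "A", "A", "B", "B", "B"]
genotypes = {
    "snp1": [2, 2, 2, 0, 0, 0],
    "snp2": [0, 1, 2, 0, 1, 2],
}
snp_region = {"snp1": "region1", "snp2": "region2"}
scores = region_scores(genotypes, labels, snp_region)
```

The resulting nested dict maps directly onto a region-by-ethnicity heatmap. With real data, population structure and linkage between nearby SNPs would probably need more careful handling than a plain correlation.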
Unsupervised: Run PCA on a large amount of genetic data, as has been done before (refs). The PC loadings would give each region a score that represents its inter-ethnic/geographic variability along the largest axes of genetic variation.
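The unsupervised idea could be sketched roughly as follows, again with only the standard library. This assumes a samples-by-SNPs matrix of alternate-allele counts and approximates the first principal component by power iteration on the centered data; `first_pc_loadings`, `region_loading_scores`, and the toy inputs are hypothetical names, and averaging absolute loadings per region is an assumed aggregation, not an established method from the linked papers.

```python
import math
import random


def first_pc_loadings(samples, n_iter=200, seed=0):
    """Approximate PC1 loadings by power iteration on X^T X (X = centered data).

    samples: list of rows, each a list of alt-allele counts, one per SNP.
    Returns a unit-length list of loadings (all zeros if there is no variance).
    """
    n, p = len(samples), len(samples[0])
    col_means = [sum(row[j] for row in samples) / n for j in range(p)]
    X = [[row[j] - col_means[j] for j in range(p)] for row in samples]

    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]  # arbitrary starting direction
    for _ in range(n_iter):
        proj = [sum(x * vj for x, vj in zip(row, v)) for row in X]  # X v
        w = [sum(X[i][j] * proj[i] for i in range(n)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            return [0.0] * p
        v = [c / norm for c in w]
    return v


def region_loading_scores(loadings, snp_region):
    """Average absolute loading per region; snp_region[j] is the region of SNP j."""
    grouped = {}
    for loading, region in zip(loadings, snp_region):
        grouped.setdefault(region, []).append(abs(loading))
    return {region: sum(vals) / len(vals) for region, vals in grouped.items()}


# Toy data (made up): SNPs 0-1 differ between two groups of samples,
# SNPs 2-3 are constant and so carry no variation.
samples = [
    [2, 2, 1, 1], [2, 2, 1, 1], [2, 1, 1, 1],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1],
]
snp_region = ["r1", "r1", "r2", "r2"]
loadings = first_pc_loadings(samples)
pc_scores = region_loading_scores(loadings, snp_region)
```

In this toy case the region containing the group-separating SNPs gets a high score and the invariant region scores zero. For real genotype data, a dedicated PCA implementation and more than one component would likely be more appropriate.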
Note: people in our department might have easy access to/familiarity with this type of genetic data.
Links: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735096/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5644186/