UBC-DSCI / introduction-to-datascience

Open Source Textbook for DSCI100: Introduction to Data Science in R
https://datasciencebook.ca/

Clustering chapter case #77

ttimbers closed 3 years ago

ttimbers commented 3 years ago

We think it would eventually be good to change the case for the clustering chapter to subspecies identification. Here's a short draft of the intro to the case:

_Here we present an illustrative example using a simulated data set. Suppose we have collected data on a group of animals that are currently labeled as a single species, but anecdotal evidence suggests that there might be one or more distinct subspecies within this group. Clustering could help provide evidence for or against this. It could also suggest which types of individuals to sample in a follow-up study that collects biological samples for DNA testing, to confirm whether the observed subgroups are in fact distinct subspecies.

Below we display simulated data for a group of toothed whales about which we want to ask such a question._

The toothed whale idea comes from this news article: https://www.cbc.ca/news/technology/new-whale-species-1.5835576 and there are some articles that examine tooth morphology and find that it differs between species: https://anatomypubs.onlinelibrary.wiley.com/doi/full/10.1002/ar.23082

Alternatively, there is a real data set available from this paper on copepods, in which the authors investigate whether subspecies are present: https://link.springer.com/article/10.1007/s10750-010-0351-3 This looks promising, but ideally we would need to find a species that shows several potential subspecies (> 2) that can be visibly separated using two variables.
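For the simulated option, here is a rough sketch of what "subgroups separable with two variables" could look like. Everything in it is an assumption made for illustration: the two measurements (tooth length and body length), the group means and spreads, and the function names are all invented, and it is written in plain Python only to stay dependency-free; it is not the chapter's code or data. It simulates three made-up subgroups, standardizes both variables, and runs a basic K-means (Lloyd's algorithm with a deterministic farthest-point start).

```python
import random
import statistics


def simulate_measurements(n_per_group=50, seed=2021):
    """Simulate two measurements for three made-up subgroups (values invented)."""
    rng = random.Random(seed)
    # Hypothetical group means: (tooth length in mm, body length in m).
    group_means = [(40.0, 4.0), (60.0, 6.0), (80.0, 4.0)]
    points = []
    for tooth_mean, body_mean in group_means:
        for _ in range(n_per_group):
            points.append((rng.gauss(tooth_mean, 2.0), rng.gauss(body_mean, 0.2)))
    return points


def standardize(points):
    """Center and scale each variable so neither dominates the distances."""
    columns = list(zip(*points))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(p, means, sds)) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centers = [points[0]]
    while len(centers) < k:
        # Next starting center: the point farthest from all chosen centers.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(statistics.mean(col) for col in zip(*members))
    return labels, centers


points = simulate_measurements()
labels, centers = kmeans(standardize(points), k=3)
# Cluster sizes found by K-means.
print({lab: labels.count(lab) for lab in sorted(set(labels))})
```

The invented groups are deliberately well separated, so K-means should recover them cleanly; a real data set (or a more realistic simulation) would likely overlap more, and choosing the number of clusters would need its own discussion.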

ttimbers commented 3 years ago

Another possibility is the EOAS earthquake data set that was pitched to us. I will dig it up from my email and post it here.

ttimbers commented 3 years ago

closing - we went with penguins