EBISPOT / hancestro

https://ebispot.github.io/hancestro/
Creative Commons Attribution 4.0 International

NTRs for Esan, Luhya, Maasai, Mende ethnic groups #14

Closed: bonitalam closed this issue 3 years ago

bonitalam commented 3 years ago

Hello, we would like to request new terms for 4 of the ethnic groups currently not in HANCESTRO. These are ethnic groups that were sampled in the 1000 Genomes Project.

They would all be a subclass of Sub-Saharan African (http://purl.obolibrary.org/obo/HANCESTRO_0011).

daniwelter commented 3 years ago

@bonitalam We'd be happy to add these ancestral groups. Could you please provide definitions?

bonitalam commented 3 years ago

Great, thank you @daniwelter! Here are some definitions:

Esan: The Esan people are one of the major ethnic groups in Edo State, Nigeria who speak the Esan language. (PMID:10146569)

Luhya: The Luhya are the second-largest ethnic group in Kenya and comprise subgroups that speak a common Bantu language. (PMID:27813082)

Maasai: The Maasai are an indigenous African ethnic group of semi-nomadic people located in northern Tanzania and Kenya. (PMID:29868928)

Mende: The Mende people are one of the largest ethnic groups in Sierra Leone who speak a language of the Mande branch of the Niger-Congo family. (PMID:15761855)

daniwelter commented 3 years ago

@bonitalam Based on the definitions you provide, these sound like cultural ethnic groups rather than genetic populations. Would you be able to include a reference to the genetic population groups as well, please?

bonitalam commented 3 years ago

Sure, let me know if these would suffice:

daniwelter commented 3 years ago

@bonitalam The terms you requested have been created and are included in today's release of HANCESTRO (v2.4). They should propagate to all the usual channels in the coming days. Apologies for the delay in getting this completed.

Could you please let me know which project you intend to use these terms for? I'd like to update the documentation with usage examples.

bonitalam commented 3 years ago

No problem, thank you for creating these terms @daniwelter!

We are planning on switching to HANCESTRO to annotate the donors we have for the ENCODE project.

I also have another question, and I wasn't sure it was worth opening another issue for, so I'll just ask it here. We were wondering how HANCESTRO would advise on annotating multi-ethnic donors. Currently, we annotate using a single term/identifier, and I think it makes sense to change to an array where multiple terms could be used. I assume it would be difficult to manage creating new terms for every potential ethnic combination as needed.

daniwelter commented 3 years ago

@bonitalam That's a really good question. Multi-faceted annotations are always tricky. If your primary goal is to capture the reported information as closely as possible, I would go down the array/multi-term annotation route. It is, however, worth remembering that individuals from a complex multi-ethnic background technically no longer fit the definition of each of their individual ancestral groups but rather represent a novel admixture of the component groups. Some admixed ancestral groups, such as Hispanic-Latin American, are well described, but I don't think it would be possible or practical to explicitly describe every possible combination of admixture. As an alternative solution, we created the term admixed ancestry (HANCESTRO:0306) for exactly this purpose. You could include this in lists of terms for multi-ethnic individuals to highlight the reason behind the multi-term annotation. Would this suit your use case?

bonitalam commented 3 years ago

Thank you for the input @daniwelter and for pointing to that "admixed ancestry" term! I definitely think your suggested approach of using multi-term annotation and designating "admixed ancestry" would work for our use case and makes the most sense. I greatly appreciate all your help.
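
For readers who want to try the multi-term approach discussed above, here is a minimal Python sketch. It is not taken from HANCESTRO or ENCODE: the `Donor` class, the `annotate_ancestry` and `curie_to_purl` helpers, and the `EXAMPLE:0002` identifier are all hypothetical and used only for illustration. The only identifiers taken from the thread are HANCESTRO:0011 (Sub-Saharan African) and HANCESTRO:0306 (admixed ancestry). The sketch assumes ancestry is stored as a list of CURIEs and that the admixed ancestry term is appended whenever more than one distinct term is reported.

```python
from dataclasses import dataclass, field
from typing import List

# Base of OBO-style PURLs, as seen in the thread
# (http://purl.obolibrary.org/obo/HANCESTRO_0011).
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

# "admixed ancestry", the term suggested in the thread for multi-ethnic donors.
ADMIXED_ANCESTRY = "HANCESTRO:0306"


def curie_to_purl(curie: str) -> str:
    """Turn a CURIE such as 'HANCESTRO:0011' into its OBO-style PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


@dataclass
class Donor:
    """Hypothetical donor record; field names are assumptions for this sketch."""
    donor_id: str
    ancestry_terms: List[str] = field(default_factory=list)


def annotate_ancestry(donor: Donor, reported_terms: List[str]) -> Donor:
    """Store reported terms as a list, flagging admixture when several are given."""
    # Remove duplicates while keeping the reported order.
    terms = list(dict.fromkeys(reported_terms))
    component_terms = [t for t in terms if t != ADMIXED_ANCESTRY]
    # More than one distinct component term: add the admixed ancestry marker.
    if len(component_terms) > 1 and ADMIXED_ANCESTRY not in terms:
        terms.append(ADMIXED_ANCESTRY)
    donor.ancestry_terms = terms
    return donor


if __name__ == "__main__":
    # EXAMPLE:0002 is a placeholder, not a real ontology term.
    donor = annotate_ancestry(Donor("donor-1"), ["HANCESTRO:0011", "EXAMPLE:0002"])
    print(donor.ancestry_terms)
    print(curie_to_purl(ADMIXED_ANCESTRY))
```

A donor with a single reported term keeps that one term unchanged, so existing single-term annotations would carry over to the list form without modification.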