Open arschat opened 6 months ago
We received the following email, which we should review and provide feedback on.
As you may know, the HCA Equity Working Group has a task force called the HCA Diversity Task Force, which has been tasked to develop the following recommendations:
- appropriate metrics and resolutions for tracking genetic and geographical diversity in HCA;
- HCA goals for genetic and geographical diversity, as well as a timeline for achieving the goals;
- ethically- and scientifically-appropriate processes for engaging underrepresented populations in sampling and monitoring efforts; and
- the use of diversity-related metadata across all HCA data platforms, including appropriate metadata fields, definitions, and/or ontologies.
We want to share the metadata recommendations from the task force with you and welcome any feedback you may have. It would be helpful to receive input by Friday, March 22.
converted to excel
- sex_genetic & ancestry_genetic: require processing of fastq files. Are there any legal issues for us in doing that? We may run it automatically in ingest, or manually.
- age: do we allow decimals?
- ethnicity_* and geography_*_country_state fields: can they be ontologised?
- geography_*_duration: specify years in the guidance.
- smoking_tobacco_cigarette: HLCA v2 also proposed smoking_pack_years. Would that be of interest here?
- alcohol_consumption: do we want more options than yes/no?
- ethnicity_parents_selfreported_freetext & ethnicity_grandparents_selfreported_freetext: how do we fill these if parents and grandparents have different ethnicities? Should we allow multiple answers separated by ';' or '|'?
- medical_history_*: would we like to record the specific diagnosis, i.e. as a MONDO disease ontology term?

Hi, a while back I did some analysis for a DCP roadmap discussion. The slides are here: https://docs.google.com/presentation/d/1_qrZ1Rnax5FgymtOu1wL9frdW40GMG7WLTNUNgC0bbs/edit#slide=id.g21f702e4abe_21_0
I can also reference a paper I was co-author on: https://doi.org/10.1186/s13059-018-1396-2
Happy to discuss this further if it's useful.
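On the earlier question about allowing multiple answers separated by ';' or '|' in the parents/grandparents free-text fields, here is a minimal sketch of how such a cell could be split during wrangling. The function name `split_multi_value` and the sample values are made up for illustration; nothing here has been agreed with the task force.

```python
import re


def split_multi_value(raw: str) -> list[str]:
    """Split a free-text cell on ';' or '|' and drop empty pieces.

    Hypothetical helper: accepting both separators is an assumption,
    not an agreed convention.
    """
    parts = re.split(r"[;|]", raw)
    return [part.strip() for part in parts if part.strip()]


# Illustrative values only
parents = split_multi_value("Greek; Italian")
grandparents = split_multi_value("Greek|Italian|Irish")
```

Accepting both separators on input would let us settle on one of them in the guidance later without rejecting sheets that used the other.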
There is an ontology for countries in NCIT:C25464.
Action points for modelling & feedback:
comments sent
We received an updated version of their original recommendations, which includes some of our suggestions: document, flattened spreadsheet
After a meeting between me, Ida & Gabby, we agreed on the following feedback:
We also decided that we are not going to change the requirement in HCA schema for those fields (like age), but just add the option(s) for unknown/not available/not collected. We expect to wrangle projects that do not follow the Tier 1/ Tier 2/ Genetic Diversity schema.
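As a rough sketch of what "keep the field required but accept unknown/not available/not collected" could look like for a field such as age: the function name `normalise_age` is hypothetical, and accepting decimals is an assumption, since that question was raised above but the answer is not recorded here.

```python
import math

# The three options mentioned above; exact spelling and casing are assumptions
UNKNOWN_VALUES = {"unknown", "not available", "not collected"}


def normalise_age(raw):
    """Return a float age, or one of the 'unknown' options in lower case.

    Raises ValueError for anything else, so the field still has to be filled.
    """
    text = str(raw).strip()
    if text.lower() in UNKNOWN_VALUES:
        return text.lower()
    try:
        age = float(text)
    except ValueError:
        raise ValueError(f"age must be a number or one of {sorted(UNKNOWN_VALUES)}: {raw!r}")
    # float() also parses 'nan' and 'inf', which are not meaningful ages
    if not math.isfinite(age) or age < 0:
        raise ValueError(f"age must be a finite, non-negative number: {raw!r}")
    return age
```

This keeps the field mandatory: an empty cell still fails, but a submitter who genuinely lacks the value has an explicit way to say so.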
We replied with the following email along with the HLCA template. Lucia replied that she is in favour of the two-tab template and initially suggested specifying recommended fields, but in another meeting we agreed, at least for lung, not to mandate any fields and to leave everything optional.
Waiting for their reply.
For now we remove pipeline fields (sex_genetic, ancestry_genetic, ancestry_genetic_pipeline) from our templates.
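A minimal sketch of dropping the pipeline fields from a template's column list. The three field names come from this thread; the function name `drop_pipeline_fields` and the example columns are hypothetical, and how our templates are actually generated is not covered here.

```python
# Fields that would need a genetics pipeline to fill in (named in this thread)
PIPELINE_FIELDS = {"sex_genetic", "ancestry_genetic", "ancestry_genetic_pipeline"}


def drop_pipeline_fields(columns):
    """Return the template columns without the pipeline fields, order preserved."""
    return [column for column in columns if column not in PIPELINE_FIELDS]


# Illustrative column list, not an actual template
template_columns = ["age", "sex_genetic", "ancestry_genetic", "alcohol_consumption"]
kept_columns = drop_pipeline_fields(template_columns)
```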
There are some actions and points that need to be addressed:
- ethnicity_1, ethnicity_2 and ethnicity_free_text originally. They would like to have an enum of specific options to allow calculation of predicted values for pulmonary function parameters
- ethnicity instead of ethnicity_1 and ethnicity_2, Malte agreed
- ethnicity_selfreported_freetext and ethnicity_question_text
- ethnicity_free_text with GDT ethnicity_selfreported_freetext and agreed
- ethnicity mention Ancestry (i.e. African Ancestry) although ethnicity is not ancestry.
original metadata document: document
metadata spreadsheet: spreadsheet
correspondence document: document