ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

Gut - Tier 2 #1287

Open arschat opened 3 months ago

arschat commented 3 months ago

We have received the Gut Tier 2 recommendations from the bionetwork, and we are providing feedback on them.

arschat commented 3 months ago

We've replied to Kyle with some feedback, and Azucena Salas provided some extra comments and suggestions about the T1 & T2 structure. We will wait for Kyle's answer before we reply to Azucena.

arschat commented 3 months ago

Here is our recommendations template

arschat commented 3 months ago

We had a meeting on Friday with Kyle and Chris, and we discussed some of the points mentioned in the original email. Here are the notes with the questions from the original email, and some other non-Tier 2 questions addressed as well.

notes from meeting

> Here are our notes from our meeting on Friday.
>
> > Previous_surgery - does it need to be distinct from ‘treatment’? We could specify the examples as “gut_resection; stoma; gastric_bypass; colostomy” to guide contributors
>
> Recording gut surgery is very important because it can alter the physiology of the whole gut. This will have a dedicated field at the donor level.
>
> > Indication_for_procedure - is this information referring to the procedure that resulted in tissue collection or all procedures reported?
>
> Gut biopsy is invasive so it’s very rarely performed without medical concerns. It’s important to know what prompted the biopsy in the first place to evaluate how well the sample can recapitulate normal tissue. This will be recorded at the sample level to allow distinct explanations for longitudinal samples. The possible values will include [diagnostic, during treatment, after treatment] (let us know if there are more for now) plus other values to be confirmed later on.
>
> known_gut_related_disease; disease_free_text
>
> Since all the diseases are ontologised it’s possible to separate gut specific diseases from other diseases. We can map diseases from the free text field to an ontologised disease (even a more generalized one) and merge them with the ontologised field. The original free text values will still be recorded in our .text field.
>
> > Radial hierarchy - I’m wondering if this information could be collected with organ_part since the ontology is quite granular. For example there are separate terms for muscularis mucosae of stomach, or submucous nerve plexus of colon, or muscularis mucosa of colon or intestinal villus of ileum
>
> Using the ontology terms doesn’t account for samples that contain more than one layer, so using a separate field would be best.
>
> > Disease_location - is the purpose here to ask if the collected sample is affected by a specific disease? Or is it important to know the affected location even if no samples were collected from that location?
>
> Many diseases are not localised to a specific organ part so this information needs to be recorded explicitly at the donor level.
>
> Not discussed in the meeting:
>
> > Gut_specific_medication - we have a generic field to report medications used, do we need a gut specific one? Again we can just amend the template with a suitable description
>
> Would you like this field ontologised, a specific pre-defined list of medications or free text?
>
> > alcohol_history
>
> Another bionetwork has suggested the use of 'alcohol_units', 'alcohol_usage_duration' and 'alcohol_type'. Would you be interested in using one of their fields? We would suggest 'alcohol_units' per week, which is derived from "strength (ABV) x volume (ml) ÷ 1,000". Is that feasible to fill in, and does it include enough information for you?
>
> Anatomical location - We have two levels of detail to describe the sampled tissue, organ and organ_part, both of them ontologised. For example we would record small intestine as organ and ileocolic junction as organ_part. Would this be descriptive enough for you? In this case we will use our existing fields.
>
> In Tier 1 there is the "tissue_ontology_term_id" that is described as "The detailed anatomical location of the sample, please provide a specific UBERON term". In our schema we have the Organ that is mapped to the Tier 1 "tissue" field, which could be Level 1 of the anatomical_location, but we can also record the ontologised Organ_part in another, more specific anatomical location field that allows the Level 2-3-4 options.
>
> > Dietary_state - can you please provide a definition for this field? We’re not sure of what it's meant to capture
>
> Could you provide a definition for this? If this field describes the diet of the donor during the collection, should we also provide "omnivore" or "vegetarian" options? Another bionetwork is interested in recording the diet of the donor but with simpler options, so "omnivore" and "vegetarian" options would allow both bionetworks to use the same field.
>
> About the number of projects/source datasets for which we already have fastq files: Out of the 34 source datasets, there are 16 publications or pre-prints. In our portal we have wrangled 7 of them (link, 3 of them had available raw files) and we are currently working on adding 6 more (3 of them have available raw files). All other projects might have deposited their data in a managed access repository (like EGA or dbGaP) or they might not have made their raw data available.
>
> Finally, about the plug-in that David Osumi is working on, we are looking into whether we can ingest spreadsheets directly from Google Sheets, and we will let you know.
>
> Best regards,
> Ida, Arsenios and Gabby

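
For reference, the 'alcohol_units' derivation quoted in the notes above ("strength (ABV) x volume (ml) ÷ 1,000") can be sketched in a few lines. This is only an illustration of the arithmetic: the function names and the `(abv_percent, volume_ml)` input layout are assumptions made here, not part of the HCA schema or of either bionetwork's proposal.

```python
from typing import Iterable, Tuple


def alcohol_units(abv_percent: float, volume_ml: float) -> float:
    """Units for one drink: strength (ABV, in %) x volume (ml) / 1,000."""
    return abv_percent * volume_ml / 1000


def weekly_alcohol_units(drinks: Iterable[Tuple[float, float]]) -> float:
    """Sum the units over all drinks reported for one week.

    `drinks` is a hypothetical list of (abv_percent, volume_ml) pairs.
    """
    return sum(alcohol_units(abv, volume) for abv, volume in drinks)


if __name__ == "__main__":
    # One 500 ml beer at 5% ABV and one 250 ml glass of wine at 12% ABV.
    week = [(5.0, 500.0), (12.0, 250.0)]
    print(weekly_alcohol_units(week))
```

A contributor would only need the strength and volume of each drink to fill in a weekly value this way, which is the feasibility question raised in the email.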
arschat commented 3 days ago

Gut has variable donor metadata (age, weight, BMI) at the sample level. We cannot move donor metadata to the specimen/sample level, because that would also require changes in downstream components.

We discussed that we can use the following steps: