HumanCellAtlas / metadata-schema

This repo is for the metadata schemas associated with the HCA
Apache License 2.0

Assemble list of allowed Organ ontologies #1163

Open mshadbolt opened 4 years ago

mshadbolt commented 4 years ago

Description

As a metadata team member, I want to define a list of organs that are used in `specimen_from_organism.organ.ontology|ontology_label` to ensure consistency in how the terms are applied across projects in the DCP. This ticket stemmed from #1133.

**Acceptance Criteria**

lauraclarke commented 4 years ago

@mshadbolt For organ parts, what is your goal?

ESapenaVentura commented 4 years ago

Also, just a clarification: Ontologies that we already have, or that we could potentially have? (e.g. there is no dataset with tissue from the organ part inner lobe of left ear but we could potentially have it in the future because that value is allowed on the ontology we use)

lauraclarke commented 4 years ago

I think having a limited set of organs is a great idea, as is ensuring we have good examples for organ part. But organ part will always be much more detailed, so we won't be able to be exhaustive as new bits get defined and different groups discuss what the best terms are.

mshadbolt commented 4 years ago

As specified in the ticket I linked to (#1133), we decided that we need a tighter set of allowable organs to ensure consistency. These can be specified within our ontology, similar to a slim. This ticket is to define that list and ensure it is likely to cover all data that is currently submitted and will be submitted in the future.

@lauraclarke Some data that is submitted doesn't come from what one would classically think of as an organ, e.g. blood and bone marrow. In these cases I want to be clear about which organ should be used, as there are currently multiple ways these are submitted, e.g. hematopoietic system and immune system.

lauraclarke commented 4 years ago

This feels like a great list to get feedback on from the metadata-community email list and slack channel. Is this something which might be possible in the new year?

mshadbolt commented 4 years ago

See below for a proposed list of organs.

Open questions/comments/concerns:

Most organs are straightforward, but some tissue types can be modelled from many different angles.

Sometimes I struggled to reconcile the semantics of 'organ' with the type of tissue a sample is taken from, e.g. blood, bone marrow. I pondered whether we should change the name of this field to something like 'source tissue'?

Bone marrow is one example where we could put it in the 'bone' category, but I think researchers don't generally group marrow with bone since it performs such a different function. Also, would it be worth identifying the specific bone the bone marrow was taken from, or is that not something the community would want to know?

Skin is another tricky one in that it is such a large organ that data consumers may want to know both the specific layer of skin a sample was taken from and also the location on the body it was from. One way around this could be to use chained ontology terms but we would need rules in place for how these are used to ensure they are easily interpretable by both humans and computers.

Embryonic/Developmental samples are another tissue type that I wasn't totally sure about but propose having the categories 'embryonic tissue' and 'extraembryonic tissue' to cover all the developing organs as well as things like yolk sac and non-embryonic endoderm.

If we want to strictly enforce these organs, how would we enforce validation when a particular organ_part could be a valid child term of multiple organs in the list?

Are @HumanCellAtlas/wranglers aware of any other organs/organ parts that wouldn't fit within these categories?

Does anyone have any other opinions on how we should be grouping organ/organ parts?
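As a rough illustration of the chained-term idea raised for skin, here is a minimal sketch of one possible rule set. It assumes the `||` separator that appears in the skin example in the proposed list; the function name and the "no empty segment" rule are hypothetical, not an agreed convention.

```python
from __future__ import annotations

# Separator as it appears in the skin example; treating it as the
# chaining convention is an assumption, not an agreed rule.
CHAIN_SEPARATOR = "||"


def split_chained_term(value: str) -> list[str]:
    """Split a chained organ_part value into its individual labels."""
    parts = [part.strip() for part in value.split(CHAIN_SEPARATOR)]
    # Reject values such as "skin||" that leave an empty segment behind.
    if any(not part for part in parts):
        raise ValueError(f"empty term in chained value: {value!r}")
    return parts


labels = split_chained_term("lower back skin||reticular layer of dermis")
print(labels)
```

Keeping the rule this strict would make chained values easy for a script to split, while a human can still read them left to right.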

| ontology_link | organ | organ_part_example |
| --- | --- | --- |
| UBERON:0001013 | adipose tissue | pericardial fat |
| UBERON:0000178 | blood | umbilical cord blood; venous blood |
| UBERON:0002371 | bone marrow | red bone marrow |
| UBERON:0002481 | bone tissue | proximal epiphysis of tibia |
| UBERON:0000955 | brain | cerebellum; prefrontal cortex |
| UBERON:0002450 | decidua | decidua parietalis |
| UBERON:0001103 | diaphragm | skeletal muscle tissue of diaphragm |
| UBERON:0005291 | embryonic tissue | presumptive gut; blastocyst; endoderm of foregut-midgut junction |
| UBERON:0001043 | esophagus | abdominal part of esophagus |
| UBERON:0005292 | extraembryonic tissue | yolk sac; visceral endoderm |
| UBERON:0000970 | eye | retinal neural layer; corneal epithelium |
| UBERON:0003889 | fallopian tube | ampulla of uterine tube |
| UBERON:0000948 | heart | apex of heart |
| UBERON:0002113 | kidney | cortex of kidney; renal medulla |
| UBERON:0000059 | large intestine | colon; rectum smooth muscle tissue |
| UBERON:0002107 | liver | caudate lobe of liver |
| UBERON:0002048 | lung | lower lobe of left lung |
| UBERON:0000029 | lymph node | mediastinal lymph node |
| UBERON:0001911 | mammary gland | lobule of mammary gland |
| UBERON:0000165 | mouth | sublingual gland |
| UBERON:0000004 | nose | midnasal cavity |
| UBERON:0000992 | ovary | corpus luteum |
| UBERON:0001264 | pancreas | islet of Langerhans; pancreas head parenchyma |
| UBERON:0001987 | placenta | decidua basalis (?) |
| UBERON:0002367 | prostate gland | posterior lobe of prostate |
| UBERON:0014892 | skeletal muscle organ | biceps brachii |
| UBERON:0002097 | skin of body | lower back skin\|\|reticular layer of dermis |
| UBERON:0002108 | small intestine | ileum; jejunum |
| UBERON:0002240 | spinal cord | lumbar spinal cord white matter |
| UBERON:0002106 | spleen | hilum of spleen |
| UBERON:0000945 | stomach | serosa of fundus of stomach |
| UBERON:0000473 | testis | Leydig cell region of testis |
| UBERON:0002370 | thymus | cortex of thymus |
| UBERON:0001723 | tongue | horny papilla of tongue |
| UBERON:0003126 | trachea | trachealis |
| UBERON:0000056 | ureter | ureter smooth muscle |
| UBERON:0000057 | urethra | urethra urothelium |
| UBERON:0001255 | urinary bladder | urinary bladder detrusor smooth muscle |
| UBERON:0000995 | uterus | |
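For anyone wondering what enforcing a list like this could look like in practice, here is a minimal sketch. It copies a handful of rows from the proposed list into a dict; the function name `check_organ`, the error wording, and the idea of keeping the list in code at all are assumptions for illustration only, and the ID `UBERON:9999999` is a made-up placeholder.

```python
from __future__ import annotations

# A few rows copied from the proposed list. Where the full list would live
# (schema enum, ontology slim, ...) is not decided in this thread.
ALLOWED_ORGANS = {
    "UBERON:0000178": "blood",
    "UBERON:0002371": "bone marrow",
    "UBERON:0000955": "brain",
    "UBERON:0000059": "large intestine",
    "UBERON:0002097": "skin of body",
}


def check_organ(ontology: str, ontology_label: str) -> list[str]:
    """Return a list of problems; an empty list means the organ is acceptable."""
    problems = []
    expected = ALLOWED_ORGANS.get(ontology)
    if expected is None:
        problems.append(f"{ontology} is not in the allowed organ list")
    elif expected != ontology_label:
        problems.append(
            f"label {ontology_label!r} does not match {expected!r} for {ontology}"
        )
    return problems


print(check_organ("UBERON:0000178", "blood"))
# UBERON:9999999 is a made-up placeholder ID, used only to show a rejection.
print(check_organ("UBERON:9999999", "hematopoietic system"))
```

Checking the ID and the label together would catch both out-of-list organs and mismatched `ontology`/`ontology_label` pairs.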

also pinging @zoependlington so she is aware of the conversation

tburdett commented 4 years ago

This looks great @mshadbolt, nice work. On the question of validation - there's a really good discussion to have here about how much of this should be structured into the ontology (UBERON or our own application ontology) and whether there's axiomatisation that can help us, and that we can use in validation. Pinging @simonjupp as well as Zoe for thoughts on that
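To make the validation question concrete, here is a minimal sketch of one lenient rule: accept an organ/organ_part pair if the declared organ is any one of the organ_part's valid parents. The parent links below are hand-written and hypothetical (the decidua basalis entry mirrors the "(?)" in the proposed list); in a real setup they would be derived from UBERON or an application ontology rather than hard-coded.

```python
# Hypothetical "part of" links, for illustration only. Real links would come
# from the ontology, not from a hand-written dict.
ORGANS_FOR_PART = {
    "decidua basalis": {"decidua", "placenta"},
    "red bone marrow": {"bone marrow"},
}


def organ_matches_part(organ: str, organ_part: str) -> bool:
    """True if the declared organ is one of the known parents of organ_part."""
    return organ in ORGANS_FOR_PART.get(organ_part, set())


# Both parents are accepted for a part that sits under two listed organs.
print(organ_matches_part("placenta", "decidua basalis"))
print(organ_matches_part("decidua", "decidua basalis"))
print(organ_matches_part("bone tissue", "red bone marrow"))
```

A stricter alternative would be to pick one preferred parent per organ_part, at the cost of maintaining that choice somewhere.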

zperova commented 4 years ago

Thank you @mshadbolt. As we discussed, I think anything related to development will be a special case, as some "organs" do not exist and some exist only at a certain age and become something else afterwards. We can also consider the use of "anatomical system".

lauraclarke commented 4 years ago

This feels like a great discussion topic for the metadata community. I think we need to decide what our desired outcome is before we pose the question, though; otherwise we might just end up with a lot of discussion without any concrete outcome.

I see the consistency goal, and I appreciate that consistency in the organ selection in particular looks better, as it gives much better facets from a search perspective. But is there value from any angle other than discovery? What about the organ part? Are we striving for consistency, or do we want to ensure that the anatomy definitions are good enough and of the right granularity that people aren't forced into sub-optimal annotations because there isn't anything better?

lauraclarke commented 4 years ago

[Image: Screenshot 2020-01-09 at 15 13 19]

This seems like a good list to compare against for our allowable organ terms, rather than organ part terms. Is there a good middle ground between this and each discrete organ for that top-level list?

mshadbolt commented 4 years ago

I am not quite sure I understand. Are you suggesting we use systems rather than organs? How would we reconcile cases where a single organ or tissue type is part of multiple systems? Or are you suggesting we also have a system-level metadata field?

lauraclarke commented 4 years ago

I thought one goal here was to introduce consistency by limiting the number of terms that are acceptable in the organ field, so we don't get, say, both colon and large intestine.

I figured this set of 14 systems was a good starting point for establishing the list, but I recognise that some of them are so high level that it might be better to allow the discrete organs where they exist. So rather than only allowing endocrine system to be specified, we would have pancreas, thymus, adrenal glands and whatever other endocrine organs exist.

It would be good to see if these system names could be derived from the ontology though as that would link up nicely, so all organs could be linked to one or more of these systems

mshadbolt commented 4 years ago

I feel quite strongly that if this field is labelled as 'organ', it shouldn't contain the name of a system. But it would be good to get feedback from others and the community about what they would expect from this field.

Yes there is a way of getting an associated system from an organ term in the ontology but it is possible for one term to be a part of multiple systems, e.g. blood is a part of immune system and haematopoietic system
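A minimal sketch of what the one-to-many lookup could look like, assuming a hand-written mapping in place of a real ontology query. The entries only reflect examples mentioned in this thread (blood; pancreas and thymus as endocrine organs); the names `SYSTEMS_FOR_ORGAN` and `systems_for_organ` are hypothetical.

```python
from __future__ import annotations

# Hand-written stand-in for an ontology lookup; entries reflect examples
# from this thread only and are not authoritative.
SYSTEMS_FOR_ORGAN = {
    "blood": {"immune system", "hematopoietic system"},
    "pancreas": {"endocrine system"},
    "thymus": {"endocrine system"},
}


def systems_for_organ(organ: str) -> list[str]:
    """Return every system the organ belongs to, sorted for stable output."""
    return sorted(SYSTEMS_FOR_ORGAN.get(organ, set()))


print(systems_for_organ("blood"))
print(systems_for_organ("uterus"))  # not in the stand-in mapping
```

Returning a list rather than a single value keeps the multi-system case explicit instead of forcing an arbitrary choice.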

pnejad commented 4 years ago

I agree @mshadbolt. I believe there was a discussion with the UX team in this repo or the wrangling repo (It could have also been at one of our F2F meetings) on why we should not use system. I'll look around in my notes.

pnejad commented 4 years ago

I like @zperova's idea of using "anatomical system" for special cases where 'organ' does not exist.

jahilton commented 4 years ago

Would it make sense to split into separate 'organs' & 'systems' fields?

lauraclarke commented 4 years ago

I would suggest if we have a system field that we make sure it can be automatically filled out rather than expecting submitters to provide an additional piece of information.

The list I posted above was to provide a helpful starting point to consider what the reduced list of organs should be rather than necessarily to be the terms which went in the organ field.

Endeavouring to be consistent with the organ system categorisation that the executive uses seems like a good idea

zperova commented 4 years ago

I think a systems field will be very helpful for displaying the data in the Browser, but it should be automatically assigned based on the organ (so the organs will be grouped by the Browser based on some algorithm in the absence of ontology expansion). We could add a systems field in the metadata and fill in all the relevant systems; however, that is not a sustainable solution.
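As a rough sketch of the "grouped by some algorithm" idea, the snippet below buckets submitted organs under each system they map to, so submitters never fill in a system themselves. The mapping is again a hand-written, hypothetical stand-in for whatever the Browser or ontology would actually provide, and the "unassigned" bucket is an assumption.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical stand-in for a system lookup derived from the ontology.
SYSTEMS_FOR_ORGAN = {
    "blood": ["hematopoietic system", "immune system"],
    "pancreas": ["endocrine system"],
    "thymus": ["endocrine system"],
}


def group_by_system(organs: list[str]) -> dict[str, list[str]]:
    """Group organs under every system they belong to, for display facets."""
    groups = defaultdict(list)
    for organ in organs:
        # Organs without a known system land in an explicit fallback bucket.
        for system in SYSTEMS_FOR_ORGAN.get(organ, ["unassigned"]):
            groups[system].append(organ)
    return dict(groups)


print(group_by_system(["blood", "thymus", "pancreas", "uterus"]))
```

An organ that belongs to several systems simply appears under each of them, which sidesteps having to pick one.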