biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

Annotate semantic types with Wikidata QIDs #8

Closed cmungall closed 3 years ago

cmungall commented 6 years ago

cc @stuppie

stuppie commented 6 years ago

I'll make a pull request. Docs with notes here: https://docs.google.com/spreadsheets/d/1syH5W4o9uWDApb5LgT5conTAFCiN62yPgzu9FzwbuH0/edit?usp=sharing @cmungall Can you comment on some of the issues in the notes?

cmungall commented 6 years ago

cell https://www.wikidata.org/wiki/Q7868 vs cell type https://www.wikidata.org/wiki/Q189118 in wd

It looks like neuron is an instance of cell-type and a subclass of cell. This makes sense. I don't really see the use case for WD to have both, it's unlikely you would have actual cell instances in WD (maybe an instance for the ur-cell, or the ancestral cell of all euks and archaea?).

I found one case of a dual instance/subclass. I made a note here but I don't really know if I should be notifying a bot: https://www.wikidata.org/wiki/Talk:Q2619679

The class/metaclass distinction is quite useful for organism vs taxon though

cmungall commented 6 years ago

spoke too soon, here are some instances of cell (should be subclasses):

$ pq-wd "Cell=wd:'Q7868',instance_of(X,Cell),enlabel(X,XN)"
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q5010870,CFU-E
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q3493700,Splenocyte
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q5712474,Hemocyte
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q28000183,Medlar bodies
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q101026,platelet
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q574674,Anti-HBs
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q963397,synoviocyte
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q2619679,Akinete
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q2382063,promyelocyte
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q3108891,Glioblast
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q632518,T helper cell
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q1543282,Granulosa cell
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q2594789,Band cell
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q3270846,myelocyte

And subclasses of cell type:

$ pq-wd "CellType=wd:'Q189118',subclass_of(X,CellType),enlabel(X,XN)"
http://www.wikidata.org/entity/Q189118,http://www.wikidata.org/entity/Q47088881,nucleated cell
http://www.wikidata.org/entity/Q189118,http://www.wikidata.org/entity/Q6810199,Meiocyte
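
For readers without pq-wd, the two lookups above can be approximated in plain SPARQL. The following is a minimal sketch, not the query pq-wd actually generates: it assumes that instance_of maps to the Wikidata property P31 and subclass_of to P279, and that the `wd:`, `wdt:` and `rdfs:` prefixes are predefined, as they are on the Wikidata Query Service. The function name `direct_query` is hypothetical.

```python
# Sketch only: build SPARQL text similar in spirit to the pq-wd calls above.
# P31 ("instance of") and P279 ("subclass of") are Wikidata property IDs;
# the function name and query shape are illustrative assumptions.

PROPERTIES = {"instance_of": "P31", "subclass_of": "P279"}


def direct_query(relation: str, target_qid: str) -> str:
    """Return a SPARQL query for items directly related to target_qid."""
    prop = PROPERTIES[relation]  # raises KeyError for an unknown relation
    return (
        "SELECT ?x ?xLabel WHERE {\n"
        f"  ?x wdt:{prop} wd:{target_qid} .\n"
        "  ?x rdfs:label ?xLabel .\n"
        '  FILTER(LANG(?xLabel) = "en")\n'
        "}"
    )


# Items asserted as instances of cell (Q7868)
instances_of_cell = direct_query("instance_of", "Q7868")
# Items asserted as subclasses of cell type (Q189118)
subclasses_of_cell_type = direct_query("subclass_of", "Q189118")
print(instances_of_cell)
```

The strings can be pasted into a SPARQL endpoint as-is; the sketch deliberately performs no network access.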

cmungall commented 6 years ago

Added some notes here https://www.wikidata.org/wiki/Talk:Q7868

cmungall commented 6 years ago

Symptom in Wikidata: first, there seems to be some confusion about classes vs instances.

# transitive subclass of symptom
$ pq-wd  "isa_symptom(S),enlabel(S,SN)" S-SN  | wc
     786    2351   53252

# inferred instance of symptom
$ pq-wd  "symptom_inf(S),enlabel(S,SN)" S-SN  | wc
     290     678   17496

all dual instance/subclass

$ pq-wd  "symptom_inf(S),isa_symptom(S),enlabel(S,SN)" S-SN  
http://www.wikidata.org/entity/Q86,headache
http://www.wikidata.org/entity/Q183425,abdominal pain
http://www.wikidata.org/entity/Q35830,sneeze
http://www.wikidata.org/entity/Q3002092,abdominal cramps
http://www.wikidata.org/entity/Q2673323,malaise
http://www.wikidata.org/entity/Q3589142,epigastric pain
http://www.wikidata.org/entity/Q245455,chromosome 5q deletion syndrome
http://www.wikidata.org/entity/Q186889,nausea
http://www.wikidata.org/entity/Q270421,muscle weakness
http://www.wikidata.org/entity/Q537297,heartburn
http://www.wikidata.org/entity/Q1338684,tension headache
http://www.wikidata.org/entity/Q693058,chest pain
http://www.wikidata.org/entity/Q21109236,urinary burning
http://www.wikidata.org/entity/Q8038367,wrist pain
http://www.wikidata.org/entity/Q21077144,abnormal hematologic indices
http://www.wikidata.org/entity/Q21109840,chest tightness
http://www.wikidata.org/entity/Q21120091,discomfort
http://www.wikidata.org/entity/Q21117104,weak feet
http://www.wikidata.org/entity/Q21117872,deep pain
http://www.wikidata.org/entity/Q21402621,greatly increased nasal secretions and oral secretions
http://www.wikidata.org/entity/Q21120264,precordial pain
http://www.wikidata.org/entity/Q21077147,eye ache
http://www.wikidata.org/entity/Q21110281,dry burning throat
http://www.wikidata.org/entity/Q21119839,eye pain
http://www.wikidata.org/entity/Q21120154,pain on inspiration
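
The check above can be illustrated on toy data. This is a sketch under assumptions: the meanings of isa_symptom (transitive subclass) and symptom_inf (instance of the class or of any transitive subclass) are taken from the comments in the listing, not from the pq-wd source, and the edges below are invented for illustration rather than copied from Wikidata. Unlike the pq-wd output, the sketch does not count the root class as its own subclass.

```python
# Toy data only: these edges are made up and are NOT taken from Wikidata.
SUBCLASS_OF = {
    "pain": {"symptom"},
    "headache": {"pain"},
    "tension headache": {"headache"},
}
INSTANCE_OF = {
    "headache": {"symptom"},  # also a subclass (via pain): a dual item
    "nausea": {"symptom"},
}


def transitive_subclasses(root, subclass_of):
    """All classes that reach root by following subclass edges upward."""
    found = set()
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for child, parents in subclass_of.items():
            if parent in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def inferred_instances(root, subclass_of, instance_of):
    """Items that are an instance of root or of any transitive subclass."""
    classes = transitive_subclasses(root, subclass_of) | {root}
    return {item for item, types in instance_of.items() if types & classes}


subs = transitive_subclasses("symptom", SUBCLASS_OF)
insts = inferred_instances("symptom", SUBCLASS_OF, INSTANCE_OF)
dual = sorted(subs & insts)  # items modelled both as class and as instance
print(dual)
```

Items in `dual` are the candidates for cleanup: each one needs a decision on whether it should be a class or an instance.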

symptom in WD appears to be general constitutive symptoms?

Then we have terms like https://www.wikidata.org/wiki/Q1100988 micrognathism which is neither isa nor part of

it's under: https://www.wikidata.org/wiki/Q6869195

which is under 'clinical sign' https://www.wikidata.org/wiki/Q1441305

oh no, please not the sign vs symptom distinction...

going up we have https://www.wikidata.org/wiki/Q28807560 clinical finding

which is a https://www.wikidata.org/wiki/Q639907 medical finding

I don't know the distinction between medical and clinical finding, but I think medical-finding is closer to phenotypic feature. And symptom is already a subtype of clinical finding.

So Q639907 may be the best, but this is highly inappropriate for model organism phenotypic features...
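
The chain traced in this comment can be written down as a small lookup, which makes it easy to test whether a term ends up under a candidate mapping target. This is a sketch: the links are the ones described above, the comment does not say whether each link is "subclass of" or "instance of", so they are treated as generic "is under" edges, and the single-parent map is a simplification (Wikidata items can have several parents). The helper names are hypothetical.

```python
# Parent links as described in the comment above. The relation type of each
# link is not stated there, so this is a generic "is under" map.
PARENT = {
    "Q1100988": "Q6869195",   # micrognathism -> (label not given above)
    "Q6869195": "Q1441305",   # -> clinical sign
    "Q1441305": "Q28807560",  # -> clinical finding
    "Q28807560": "Q639907",   # -> medical finding
}


def path_to_root(qid, parent):
    """Follow single-parent links upward, stopping at a root or on a cycle."""
    path = [qid]
    while path[-1] in parent and parent[path[-1]] not in path:
        path.append(parent[path[-1]])
    return path


def is_under(qid, ancestor, parent):
    """True if ancestor appears strictly above qid on its upward path."""
    return ancestor in path_to_root(qid, parent)[1:]


print(path_to_root("Q1100988", PARENT))
print(is_under("Q1100988", "Q639907", PARENT))
```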

cmungall commented 6 years ago

We also have https://www.wikidata.org/wiki/Q1921834 feature

another general question is what to do with traits in biolink-model. Do we treat as phenotypes? Or have a separate class?

wd trait subclasses

$ pq-wd "isa_trait(T),enlabel(T,TN)"
http://www.wikidata.org/entity/Q1211967,trait
http://www.wikidata.org/entity/Q7243545,primitive
http://www.wikidata.org/entity/Q23786,eye color

and instances

$ pq-wd "trait_inf(T),enlabel(T,TN)"
http://www.wikidata.org/entity/Q80157,temperament
http://www.wikidata.org/entity/Q17122705,brown
http://www.wikidata.org/entity/Q16939403,blue-green
http://www.wikidata.org/entity/Q27839441,purple
http://www.wikidata.org/entity/Q27777837,yellow
http://www.wikidata.org/entity/Q30069237,dark eyes
http://www.wikidata.org/entity/Q30069240,light eyes
http://www.wikidata.org/entity/Q42845936,blue-gray
http://www.wikidata.org/entity/Q17126729,red
http://www.wikidata.org/entity/Q17122854,green
http://www.wikidata.org/entity/Q17122740,hazel
http://www.wikidata.org/entity/Q17122834,blue
http://www.wikidata.org/entity/Q17244465,black
http://www.wikidata.org/entity/Q17291407,amber
http://www.wikidata.org/entity/Q19359739,Midphalangeal hair
http://www.wikidata.org/entity/Q17245659,grey
http://www.wikidata.org/entity/Q17244894,dark brown

Midphalangeal hair is a phenotypic feature (or trait value). The others are values.

Anyway this may be sorted out with a general alignment of WD to phenotype ontologies, trait ontologies (e.g. OBA) and PATO (primitive attributes and values)

stuppie commented 6 years ago

As you may have noticed, these types of entities don't have a widespread and systematic structure... (We (genewiki team) haven't done much specifically with symptoms, findings, phenotypes, cell types, etc.) Being able to make well-structured queries on them will have to wait until they are better organized (i.e. are aligned with some ontology). In the meantime, we can construct custom queries to get some of the more useful parts out...

fractaler commented 6 years ago

Here is just one of Wikidata's statements (at this moment) that speaks to the quality of its model of the world: the set cell is a subset of the set cellular component (part of a cell), i.e. "cell is part of a cell", so a set is a subset of its own subset (Russell's paradox?)

cmungall commented 6 years ago

Thanks @stuppie - hope to have some time to help with cells and findings in WD. The mappings thus far in your spreadsheet seem good, will work on more later

@fractaler - the statement makes sense to me. cell subClassOf cellular component (which is consistent with GO). CC is a confusing term though, it doesn't mean cell part, it is a designation for things at a certain level of granularity

stuppie commented 6 years ago

Right, @fractaler, this aligns with GO. The cellular component is not "a component of a cell", it's the root node for the CC ontology. Cell is a type of cellular component.

fractaler commented 6 years ago

I am just taking what Wikidata says. And Wikidata says: cell: the basic structural and functional unit of all organisms. OK, let's take, for example, a one-celled organism: unicellular organism (organism that consists of only one cell). Substitute that value for the variable "organism" and you get: "cell is the basic structural and functional unit of an organism that consists of only one cell". I would not recommend this model to anyone. Wikidata does not currently allow building an accurate, scientific model; it is dominated by terminological chaos. For example, it contains homonyms: parent = parents, child = children, chemical = chemicals, sibling = siblings, ancestor = ancestors, first-degree relative = first-degree relatives, etc. "Cell" is also a homonym: 1) the unit of structure of a multicellular organism, 2) a unicellular organism.

lubianat commented 4 years ago

Hello, @cmungall @stuppie and @fractaler ,

I found this issue via the talk page of the Wikidata item cell.

I am planning to work on cleaning up the problems around cell type definitions on Wikidata in the near future, and it is good to see that these issues affect other Wikidata users. I mean, it is good to see that solving the issues may have practical value.

I am specifically focused on the issues about cell types. If you have any suggestions for issues that, if solved on Wikidata, would improve its value for the Cell Ontology / OBO community, that would be great.

Also, if there are people actively working on this in 2020, I would love to join and help.

nlharris commented 4 years ago

What is the status of this?

deepakunni3 commented 4 years ago

Needs more input from @cmungall

sierra-moxon commented 3 years ago

@cmungall - I think I am going to close this for now. Nomi and I traced the PR associated with this case and it got merged (and then WD ids were removed from the model). We can definitely reopen if necessary.