Closed cmungall closed 3 years ago
I'll make a pull request. Docs with notes here: https://docs.google.com/spreadsheets/d/1syH5W4o9uWDApb5LgT5conTAFCiN62yPgzu9FzwbuH0/edit?usp=sharing @cmungall Can you comment on some of the issues in the notes?
cell https://www.wikidata.org/wiki/Q7868 vs cell type https://www.wikidata.org/wiki/Q189118 in wd
It looks like neuron is an instance of cell-type and a subclass of cell. This makes sense. I don't really see the use case for WD to have both, it's unlikely you would have actual cell instances in WD (maybe an instance for the ur-cell, or the ancestral cell of all euks and archaea?).
I found one case of a dual instance/subclass. I made a note here but I don't really know if I should be notifying a bot: https://www.wikidata.org/wiki/Talk:Q2619679
The class/metaclass distinction is quite useful for organism vs taxon though
spoke too soon, here are some instances of cell (should be subclasses):
$ pq-wd "Cell=wd:'Q7868',instance_of(X,Cell),enlabel(X,XN)"
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q5010870,CFU-E
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q3493700,Splenocyte
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q5712474,Hemocyte
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q28000183,Medlar bodies
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q101026,platelet
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q574674,Anti-HBs
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q963397,synoviocyte
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q2619679,Akinete
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q2382063,promyelocyte
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q3108891,Glioblast
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q632518,T helper cell
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q1543282,Granulosa cell
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q2594789,Band cell
http://www.wikidata.org/entity/Q7868,http://www.wikidata.org/entity/Q3270846,myelocyte
And subclasses of cell type:
$ pq-wd "CellType=wd:'Q189118',subclass_of(X,CellType),enlabel(X,XN)"
http://www.wikidata.org/entity/Q189118,http://www.wikidata.org/entity/Q47088881,nucleated cell
http://www.wikidata.org/entity/Q189118,http://www.wikidata.org/entity/Q6810199,Meiocyte
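For anyone without pq-wd installed, the two lookups above can be approximated in plain SPARQL. This is only a minimal sketch, assuming the standard Wikidata properties P31 ("instance of") and P279 ("subclass of"). The helper name `direct_links_query` is hypothetical and not part of pq-wd. The code only builds the query text and does not contact any endpoint.

```python
# Sketch: SPARQL text roughly equivalent to the two pq-wd lookups.
# Assumption: P31 = "instance of", P279 = "subclass of" on Wikidata.
# direct_links_query is a hypothetical helper, not part of pq-wd.

INSTANCE_OF = "P31"
SUBCLASS_OF = "P279"


def direct_links_query(prop: str, target_qid: str) -> str:
    """Build a query listing items directly linked to target_qid via prop, with English labels."""
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:{prop} wd:{target_qid} .\n"
        "  ?item rdfs:label ?itemLabel .\n"
        '  FILTER(LANG(?itemLabel) = "en")\n'
        "}"
    )


# Items asserted as instances of cell (Q7868): candidates to become subclasses.
cell_instances = direct_links_query(INSTANCE_OF, "Q7868")
# Items asserted as subclasses of cell type (Q189118): candidates to become instances.
cell_type_subclasses = direct_links_query(SUBCLASS_OF, "Q189118")

print(cell_instances)
print(cell_type_subclasses)
```

The resulting strings can be pasted into the Wikidata Query Service, where the `wd:`, `wdt:` and `rdfs:` prefixes are predefined.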
Added some notes here https://www.wikidata.org/wiki/Talk:Q7868
symptom in wikidata. First there seems to be some confusion about classes vs instances.
# transitive subclass of symptom
$ pq-wd "isa_symptom(S),enlabel(S,SN)" S-SN | wc
786 2351 53252
# inferred instance of symptom
$ pq-wd "symptom_inf(S),enlabel(S,SN)" S-SN | wc
290 678 17496
all dual instance/subclass
$ pq-wd "symptom_inf(S),isa_symptom(S),enlabel(S,SN)" S-SN
http://www.wikidata.org/entity/Q86,headache
http://www.wikidata.org/entity/Q183425,abdominal pain
http://www.wikidata.org/entity/Q35830,sneeze
http://www.wikidata.org/entity/Q3002092,abdominal cramps
http://www.wikidata.org/entity/Q2673323,malaise
http://www.wikidata.org/entity/Q3589142,epigastric pain
http://www.wikidata.org/entity/Q245455,chromosome 5q deletion syndrome
http://www.wikidata.org/entity/Q186889,nausea
http://www.wikidata.org/entity/Q270421,muscle weakness
http://www.wikidata.org/entity/Q537297,heartburn
http://www.wikidata.org/entity/Q1338684,tension headache
http://www.wikidata.org/entity/Q693058,chest pain
http://www.wikidata.org/entity/Q21109236,urinary burning
http://www.wikidata.org/entity/Q8038367,wrist pain
http://www.wikidata.org/entity/Q21077144,abnormal hematologic indices
http://www.wikidata.org/entity/Q21109840,chest tightness
http://www.wikidata.org/entity/Q21120091,discomfort
http://www.wikidata.org/entity/Q21117104,weak feet
http://www.wikidata.org/entity/Q21117872,deep pain
http://www.wikidata.org/entity/Q21402621,greatly increased nasal secretions and oral secretions
http://www.wikidata.org/entity/Q21120264,precordial pain
http://www.wikidata.org/entity/Q21077147,eye ache
http://www.wikidata.org/entity/Q21110281,dry burning throat
http://www.wikidata.org/entity/Q21119839,eye pain
http://www.wikidata.org/entity/Q21120154,pain on inspiration
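The logic behind the dual instance/subclass check can be sketched on a toy graph. This assumes `isa_symptom` means "transitive subclass of symptom" (root excluded) and `symptom_inf` means "instance of symptom or of any of its subclasses"; that reading of the pq-wd predicates is an assumption. The function names and edges below are made up for illustration and are not Wikidata data.

```python
# Sketch of the dual instance/subclass check on a toy graph.
# All edges are hypothetical illustration data, not taken from Wikidata.

def transitive_subclasses(subclass_edges, root):
    """Return every class reachable from root by following subclass_of edges downward."""
    children = {}
    for child, parent in subclass_edges:
        children.setdefault(parent, set()).add(child)
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def inferred_instances(instance_edges, subclass_edges, root):
    """Return items that are instance_of root or of any transitive subclass of root."""
    classes = transitive_subclasses(subclass_edges, root) | {root}
    return {item for item, cls in instance_edges if cls in classes}


# (child, parent) pairs for subclass_of; (item, class) pairs for instance_of.
subclass_edges = [("pain", "symptom"), ("headache", "pain"), ("nausea", "symptom")]
instance_edges = [("headache", "symptom"), ("sneeze", "symptom"), ("nausea", "pain")]

isa = transitive_subclasses(subclass_edges, "symptom")
inst = inferred_instances(instance_edges, subclass_edges, "symptom")

# Items that are modelled both as a class and as an instance.
dual = sorted(isa & inst)
print(dual)
```

Items that land in `dual` are the ones worth reviewing by hand, since each should probably keep only one of the two relations.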
symptom in WD appears to be general constitutive symptoms?
Then we have terms like https://www.wikidata.org/wiki/Q1100988 micrognathism which is neither isa nor part of
it's under: https://www.wikidata.org/wiki/Q6869195
which is under 'clinical sign' https://www.wikidata.org/wiki/Q1441305
oh no, please not the sign vs symptom distinction...
going up we have https://www.wikidata.org/wiki/Q28807560 clinical finding
which is a https://www.wikidata.org/wiki/Q639907 medical finding
I don't know the distinction between medical and clinical finding, but I think medical-finding is closer to phenotypic feature. And symptom is already a subtype of clinical finding.
So Q639907 may be the best, but this is highly inappropriate for model organism phenotypic features...
We also have https://www.wikidata.org/wiki/Q1921834 feature
another general question is what to do with traits in biolink-model. Do we treat as phenotypes? Or have a separate class?
wd trait subclasses
$ pq-wd "isa_trait(T),enlabel(T,TN)"
http://www.wikidata.org/entity/Q1211967,trait
http://www.wikidata.org/entity/Q7243545,primitive
http://www.wikidata.org/entity/Q23786,eye color
and instances
$ pq-wd "trait_inf(T),enlabel(T,TN)"
http://www.wikidata.org/entity/Q80157,temperament
http://www.wikidata.org/entity/Q17122705,brown
http://www.wikidata.org/entity/Q16939403,blue-green
http://www.wikidata.org/entity/Q27839441,purple
http://www.wikidata.org/entity/Q27777837,yellow
http://www.wikidata.org/entity/Q30069237,dark eyes
http://www.wikidata.org/entity/Q30069240,light eyes
http://www.wikidata.org/entity/Q42845936,blue-gray
http://www.wikidata.org/entity/Q17126729,red
http://www.wikidata.org/entity/Q17122854,green
http://www.wikidata.org/entity/Q17122740,hazel
http://www.wikidata.org/entity/Q17122834,blue
http://www.wikidata.org/entity/Q17244465,black
http://www.wikidata.org/entity/Q17291407,amber
http://www.wikidata.org/entity/Q19359739,Midphalangeal hair
http://www.wikidata.org/entity/Q17245659,grey
http://www.wikidata.org/entity/Q17244894,dark brown
Midphalangeal hair is a phenotypic feature (or trait value). The others are values.
Anyway this may be sorted out with a general alignment of WD to phenotype ontologies, trait ontologies (e.g. OBA) and PATO (primitive attributes and values)
As you may have noticed, these types of entities don't have a widespread and systematic structure... (We (genewiki team) haven't done much specifically with symptoms, findings, phenotypes, cell types, etc.) Being able to make well-structured queries on them will have to wait until they are better organized (i.e. are aligned with some ontology). In the meantime, we can construct custom queries to get some of the more useful parts out...
Here is just one of Wikidata's statements (at this moment), which says something about the quality of its model of the world: the set "cell" is a subset of the set "cellular component" (part of a cell), i.e. "cell is part of a cell", so a set is a subset of its own subset (Russell's paradox?)
Thanks @stuppie - hope to have some time to help with cells and findings in WD. The mappings thus far in your spreadsheet seem good, will work on more later
@fractaler - the statement makes sense to me. cell subClassOf cellular component (which is consistent with GO). CC is a confusing term though, it doesn't mean cell part, it is a designation for things at a certain level of granularity
Right, @fractaler, this aligns with GO. The cellular component is not "a component of a cell", it's the root node for the CC ontology. Cell is a type of cellular component.
I am just taking what Wikidata says. And Wikidata says: cell: the basic structural and functional unit of all organisms. OK, let's take, for example, a one-cell organism: unicellular organism (organism that consists of only one cell). Substitute that value for the variable "organism" and you get: "cell is the basic structural and functional unit of an organism that consists of only one cell". I would not recommend this model to anyone. Wikidata currently does not allow an accurate, scientific model to be built; it is dominated by terminological chaos. For example, it contains homonyms: parent = parents, child = children, chemical = chemicals, sibling = siblings, ancestor = ancestors, first-degree relative = first-degree relatives, etc. "Cell" is also a homonym: 1) a unit of multicellular organism structure, 2) a unicellular organism.
Hello, @cmungall @stuppie and @fractaler ,
I found this issue looking at the discussion page on the talk page of the item cell.
I am planning to work on cleaning the problems around cell type definitions on Wikidata for the near future, and it is good to see that these issues affect other Wikidata users. I mean, good to see that solving the issues may have a practical value.
I am specifically focused on the issues around cell types. If you have any suggestions for issues that, if solved on Wikidata, would improve its value for the Cell Ontology / OBO community, that would be great.
Also, if there are people actively working on this in 2020, I would love to join and help.
What is the status of this?
Needs more input from @cmungall
@cmungall - I think I am going to close this for now. Nomi and I traced the PR associated with this case and it got merged (and then WD ids were removed from the model). We can definitely reopen if necessary.
cc @stuppie