Closed saramsey closed 4 years ago
My response to this request, reproduced here for convenience:
Hi @isbluis OK I can see this being very useful downstream. I'm just hesitating about where to code it up. Isn't the KnowledgeGraph object created by expand? It is just passed to resultify. If we were to have resultify start modifying the KnowledgeGraph, isn't that mixing two somewhat orthogonal operations? That would add a lot of code in resultify to be cycling through KG nodes and modifying them, when presumably expand is already visiting every KG node in the course of generating the KG? @amykglen and @edeutsch what do you think?
Yes - I think this is something for me - all nodes/edges are loaded into the API model in expand
. (So that's where the NodeAttributes/EdgeAttributes are created.)
And yep, I can easily put urls into EdgeAttribute.url!
Great, thanks!
Agreed, expand() is the right place to handle this. (or possibly during the building of KG2 depending on how this information is currently encoded in KG2)
So if an EdgeAttribute
consists of a list of URLs, should that go in the url
property as well?
Ok - I've pushed changes for this to master
- so now urls are put in EdgeAttribute.url
/NodeAttribute.url
rather than EdgeAttribute.value
/NodeAttribute.value
.
(If it's a list of urls, it's still going in value
currently, but that could be changed. I think it's pretty infrequent that a list of URLs is stored in an EdgeAttribute
/NodeAttribute
..)
Hi @amykglen . Do you have an example of an edge instance where its value (url, value,
or otherwise) is a list instead of a scalar/string? (or ideally, a simple query that returns one?)
I currently do not have code to handle any of the elements within edge_attributes
as anything other than single values. Or maybe @edeutsch is handling these via the API before passing the json object to the front-end? I would need to add code to handle lists if that is a requirement.
Thanks!
At the moment the spec says that edge_attribute elements should all be scalars. Technically the value should be a string. but we have reason to lobby to allow that to be float or int. As far as a list, I am hesitant. Is there a good reason to need to have a list there? That is not in the spec and not even a discussion item for addition.
Ah, I didn't realize that about EdgeAttributes/NodeAttributes requiring scalars.
I've been dumping extra KG2 node/edge properties that do not map to existing API object model properties into EdgeAttributes/NodeAttributes (per #630), but perhaps that will need to change if lists aren't allowed.
I think it's actually only NodeAttributes that lists are currently being put into... (specifically the node publications
and synonym
fields from KG2 neo4j). If you run this query:
add_qnode(id=n00, curie=CHEMBL.COMPOUND:CHEMBL1771)
expand(kp=ARAX/KG2, synonym_handling=add_all)
resultify()
Here are some examples of NodeAttributes with lists in the resulting knowledge graph:
For node CHEMBL.COMPOUND:CHEMBL1771
:
{
"name":"publications",
"type":"None",
"url":"None",
"value":[
"PMID:26807542",
"PMID:24164581",
"PMID:24900388",
"PMID:25075638",
"PMID:24393581",
"PMID:23899349",
"PMID:22197392",
"PMID:16248836",
"PMID:24859764",
"PMID:24513044",
"PMID:24387347",
"PMID:19932971",
"PMID:20014752",
"PMID:22055718",
"PMID:22409598",
"PMID:26948801",
"PMID:27996269",
"PMID:24931384",
"PMID:27908762",
"PMID:15646539",
"PMID:22428882",
"PMID:28653847",
"PMID:22194678",
"PMID:21467212",
"PMID:28219047",
"PMID:22232427"
]
}
{
"name":"synonym",
"type":"None",
"url":"None",
"value":[
"clopidogrel",
"PUBCHEM_BIOASSAY:49666423",
"SID49666423",
"GKTWGGQPFAXNFI-HNNXBMFYSA-N",
"None",
"InChI=1S/C16H16ClNO2S/c1-20-16(19)15(12-4-2-3-5-13(12)17)18-8-6-14-11(10-18)7-9-21-14/h2-5,7,9,15H,6,8,10H2,1H3/t15-/m0/s1",
"ASTRAZENECA:1793",
"COC(=O)[C@@H](N1CCc2sccc2C1)c3ccccc3Cl",
"BNF:16755",
"CLOPIDOGREL",
"PUBCHEM_BIOASSAY:49666065",
"ATC:16755",
"DRUGMATRIX:228",
"SID49666065",
"Clopidogrel"
]
}
Also - I think my question about what to do with a list of URLs is actually irrelevant - occasionally the node publications
list contains URLs, but not always. So to my knowledge we actually don't have a need to store a list of (only) URLs in a NodeAttribute/EdgeAttribute.
Well, the spec is here: https://github.com/NCATS-Tangerine/NCATS-ReasonerStdAPI/blob/6a8a86b3b5cba67af9c750f64e738a52bb8a86e8/API/TranslatorReasonersAPI.yaml#L787
But we can lobby for change. I am a fan of providing whatever information you can. I would be willing to lobby for changing this to allow more types.
But Luis may have other ideas.
But I would say that it would be best to change this: "type":"None", to this: "type":"list", in your above examples. The 'type' field is there specifically to tell the reader what type is in the value.
oops, actually 'type' should probably be an ontology concept of "list of publications" for example. The 'type' should probably be the semantic type instead of the data type. So my above comment isn't really right. Although 'list' might still be better than 'None'.
Ah, got it re: setting the type...
Rather than storing these extra KG2 properties in NodeAttributes/EdgeAttributes, is it worth considering adding these as further properties in our API object model? Not to the translator-wide standard, but just to our output? I thought I recall Patrick Wang saying during the hackathon that that's perfectly legal for the API standard (you're allowed to have extra properties).
It is true that it is perfectly legal. But I question whether it is wise. Since we expected that we would want to provide additional information on nodes and edges, we created an explicit mechanism to do so, the node_attribute and edge_attribute. It would seem less desirable to circumvent this mechanism and just add our own custom attributes to node and edge. Also, by using node_attribute and edge_attribute, via the 'type' mechanism, these attributes could in theory be machine interpretable. If the 'type' defined the true semantic type, then smart software could potentially do something useful with it automatically. If it's just an additional attribute, then it can be displayed and humans might recognize it, but it would be difficult for software to automatically recognize it. But that's all just my opinion and not necessarily widely shared.
So after, further reflection, I think this would be best:
{ "name":"publications", "type":"data:1187", "url":"https://pubmed.ncbi.nlm.nih.gov/$value", "value":[ "PMID:26807542", "PMID:24164581", "PMID:24900388", "PMID:25075638", "PMID:24393581", "PMID:23899349", "PMID:22197392", "PMID:16248836", "PMID:24859764", "PMID:24513044", "PMID:24387347", "PMID:19932971", "PMID:20014752", "PMID:22055718", "PMID:22409598", "PMID:26948801", "PMID:27996269", "PMID:24931384", "PMID:27908762", "PMID:15646539", "PMID:22428882", "PMID:28653847", "PMID:22194678", "PMID:21467212", "PMID:28219047", "PMID:22232427" ] }
We call it "publications" but the true semantic type is provided. I suppose the fact that it's a list can be obtained through introspection and does not need to be explicit.
Clearly I've changed my mind 3 times in 3 posts, so maybe it's all wrong. Or maybe one of them is right!
I am also a fan of providing all relevant info; I just need to know what to expect so I can handle it.
I already handle lists in provided_by
and publications
; glad to add more!
And at least from my end, checking the data type directly from the json is equivalent to looking at the value of the type
attribute; so I don't really need it.
On the other hand, I would need to know what the extra properties are so I can deal with them.
or yet one more thought: instead of the silly $value thing, if value is a list, then url should be a list with all the urls of the individual items in the list:
"url":[ "https://pubmed.ncbi.nlm.nih.gov/26807542", "https://pubmed.ncbi.nlm.nih.gov/24164581" ], "value":[ "PMID:26807542", "PMID:24164581" ]
Sounds good! I'm fine with continuing to use NodeAttributes/EdgeAttributes.
I like where you're going with the publications attribute, @edeutsch, although one problem I foresee is that it's not only PMIDs that are put into this list - here are some other examples:
{
"type":"None",
"name":"publications",
"value":[
"",
"PMID:9862332",
"PMID:11513144",
"PMID:15057824",
"DOI:10.1111/j.1399-0039.2007.00930.x",
"DOI:10.1111/j.1399-0039.2006.00567.x",
"PMID:11960306",
"DOI:10.1002/(SICI)1521-4141(199812)28:12<3959::AID-IMMU3959>3.0.CO",
"PMID:17900289",
"PMID:16634863",
"DOI:10.1038/nature02399",
"DOI:10.1034/j.1600-065X.2001.1810119.x",
"DOI:10.1038/sj.gene.6363836",
"2-2"
],
"url":"None"
}
{
"type":"None",
"name":"publications",
"value":[
"ISBN:0198506732"
],
"url":"None"
}
{
"type":"None",
"name":"publications",
"value":[
"https://rarediseases.info.nih.gov/diseases/5878/babesiosis"
],
"url":"None"
}
(I'm not sure what all the different possible things that might appear in this property in KG2 look like... these are just some of the first I came across)
hmm, that make it more complicated. Perhaps the better thing to do is instead of making the node_attribute value a list with N items, instead there should be N node_attributes, each one with the appropriate type. EDAM has a type for DOI which is data:1188 http://bioportal.bioontology.org/ontologies/EDAM/?p=classes&conceptid=http%3A%2F%2Fedamontology.org%2Fdata_1188&jump_to_nav=true etc.
ISBN is data:2634 http://bioportal.bioontology.org/ontologies/EDAM/?p=classes&conceptid=http%3A%2F%2Fedamontology.org%2Fdata_2634&jump_to_nav=true
If it's not a PMID or a DOI or a ISBN, maybe just a generic article: http://bioportal.bioontology.org/ontologies/EDAM/?p=classes&conceptid=http%3A%2F%2Fedamontology.org%2Fdata_0971&jump_to_nav=true
It's not clear if such care will be rewarded, but that's probably the right way to do it.
Ok cool - I just separated this out into a new issue: #756. Will pick up there!
And also created #757 to deal with the missing types.
Ok, so now that I separated out that stuff, I think this issue is complete - all (scalar) Node/EdgeAttributes that are urls are now put in the url
field rather than the value
field. Just needs to be rolled out/confirmed working.
Tested on production - appears to be functioning as desired. This is what these fields are now looking like in 'results' in the UI, fyi:
I suggest actually displaying the URL itself (hyperlinked) because a human can look at the URLs and understand what they signify.
Request posted by @isbluis in #743: