Closed tokebe closed 1 year ago
What appears to be happening is that only one figure_url is kept, instead of each being merged into an array.
And maybe only one figure_title and one pmc_reference are being kept as well...
Basically, I think the "last" Record processed is what is kept (all prior ones are overwritten?), whereas we want to make the values arrays and add elements to them...
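To make the "append instead of overwrite" idea concrete, here is a minimal sketch. It is not the actual BTE code: the function name mergeRecordAttributes and the shape of the records (plain objects mapping attribute name to value) are assumptions for illustration only.

```javascript
// Hypothetical helper, not part of the BTE codebase.
// Assumes each record is a plain object: attribute name -> value.
function mergeRecordAttributes(records) {
  const merged = new Map();
  for (const record of records) {
    for (const [name, value] of Object.entries(record)) {
      if (!merged.has(name)) {
        merged.set(name, []);
      }
      // Append rather than assign, so values from earlier records survive.
      merged.get(name).push(value);
    }
  }
  return Object.fromEntries(merged);
}

// Two records that collapse into a single edge (placeholder values).
const exampleRecords = [
  { figure_url: "url-1", figure_title: "title-1" },
  { figure_url: "url-2", figure_title: "title-2" },
];
const mergedAttributes = mergeRecordAttributes(exampleRecords);
console.log(mergedAttributes.figure_url); // both URLs are kept, in record order
```

With plain assignment in the inner loop, only "url-2" would remain, which matches the behavior described above.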
Something like this is already done for other apis ingested through x-bte annotation, like semmeddb (think pubmed IDs) and MyVariant
Maybe semmeddb would have an equivalent test case to see whether the problem happens there too?
Without going step-by-step through the code, I'm not aware off the top of my head what would be causing this.
I think I've figured out where in the code the issue is being caused. If I'm correct, I can probably make a PR soon.
This is the expected behavior, correct? (figure_url and pmc_reference are also arrays.) The file I am currently editing for my fix is query_graph_handler/graph/kg_edge.js. Also, should these fields be an array when there is only a single value, or should we just keep the single value?
The screenshot looks good. Ask @tokebe whether you're working with the correct file.
For single values, I think it's fine to leave them as single values (not 1-element arrays). But if you see something different in the code (like elsewhere single values are converted into 1-element arrays), let us know...
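A sketch of that convention, assuming the merged attributes are held as arrays internally and only unwrapped at the end. collapseSingletons is a hypothetical name, not something that exists in the BTE code.

```javascript
// Hypothetical helper, not part of the BTE codebase.
// Assumes `attributes` maps attribute name -> array of merged values.
function collapseSingletons(attributes) {
  const result = {};
  for (const [name, values] of Object.entries(attributes)) {
    // A 1-element array is unwrapped to its single value; everything else is left alone.
    result[name] =
      Array.isArray(values) && values.length === 1 ? values[0] : values;
  }
  return result;
}

const collapsed = collapseSingletons({
  figure_title: ["only title"],
  figure_url: ["url-1", "url-2"],
});
console.log(collapsed);
```

One trade-off of this choice is that consumers have to handle both a single value and an array for the same attribute.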
Screenshot looks good, make a PR as soon as you're ready.
on a related note @rjawesome @tokebe I think "sets" (aka unique values only) is more useful than "lists". I've noticed that problem with stuff from MyVariant; look at the civic stuff on edges in the following query.
However, maybe this is something separate enough to be a different issue?
POST to MyVariant specifically: http://localhost:3000/v1/smartapi/09c8782d9f4027712e65b95424adba79/query
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["DBSNP:rs121913521"],
"categories":["biolink:SequenceVariant"]
},
"n1": {
"categories":["biolink:Disease"]
}
},
"edges": {
"e1": {
"subject": "n0",
"object": "n1"
}
}
}
}
}
Example:
{
"attribute_type_id": "civic_clinical_significance",
"value": [
"Sensitivity/Response",
"Sensitivity/Response",
"Sensitivity/Response",
"Sensitivity/Response",
"Resistance",
"Resistance",
"Resistance",
"Sensitivity/Response",
"Sensitivity/Response",
"Sensitivity/Response"
]
},
I can make each attribute into a Set, that should be no problem.
@colleenXu Hmm, your query seems to reveal another issue: sometimes an array is stored as the attribute value in a record. For example, for your query, these two arrays are in the two records that combine to form this edge:
[
[
"Sensitivity/Response",
"Sensitivity/Response",
"Sensitivity/Response",
"Resistance",
"Sensitivity/Response",
"Resistance",
"Sensitivity/Response"
],
[
"Sensitivity/Response",
"Sensitivity/Response",
"Sensitivity/Response",
"Sensitivity/Response",
"Resistance",
"Resistance",
"Resistance",
"Sensitivity/Response",
"Sensitivity/Response",
"Sensitivity/Response"
]
]
So, right now my code is making a set or array of these two arrays. Would we like to flatten them? (We can only do this if we are sure no values are themselves of the array type.) The results you are showing arise because the attributes are only taken from the last record.
@rjawesome I think what's happening is that this DBSNP ID actually corresponds to two hits in MyVariant, and the nested nature of the data is what's leading to this...
I think it'd be nice to flatten these arrays and then run a set operation / unique-values-only. I was hoping for something in the end like ["Sensitivity/Response", "Resistance"] for this particular MyVariant thing.
I currently can't think of cases where we wouldn't want to do this for an edge attribute... but we'd have to test your stuff carefully once it's ready for testing.
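For reference, flatten-then-unique can be sketched as below. This is an illustration rather than the code from the PR; uniqueFlattened is a hypothetical name, and it assumes that no attribute value is meant to stay an array and that values are primitives (a Set compares objects by reference, so identical-looking objects would not be deduplicated).

```javascript
// Hypothetical helper, not part of the BTE codebase.
// Assumes values are primitives and that nesting carries no meaning.
function uniqueFlattened(values) {
  // Wrap in an array so a bare single value is handled too,
  // flatten every level of nesting, then keep first occurrences only.
  return [...new Set([values].flat(Infinity))];
}

// The two arrays from the two records in the MyVariant example.
const fromRecords = [
  [
    "Sensitivity/Response", "Sensitivity/Response", "Sensitivity/Response",
    "Resistance", "Sensitivity/Response", "Resistance", "Sensitivity/Response",
  ],
  [
    "Sensitivity/Response", "Sensitivity/Response", "Sensitivity/Response",
    "Sensitivity/Response", "Resistance", "Resistance", "Resistance",
    "Sensitivity/Response", "Sensitivity/Response", "Sensitivity/Response",
  ],
];
const uniqueSignificance = uniqueFlattened(fromRecords);
// uniqueSignificance is ["Sensitivity/Response", "Resistance"]
console.log(uniqueSignificance);
```

A Set keeps insertion order, so the result lists values in the order they were first seen across the records.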
@colleenXu Should be ready for testing, flattening of arrays/usage of set has been implemented.
@rjawesome @tokebe If I try to run the example query from the first post, I get a status 500 response.
Ah I must have made a typo when copying the code over to the other PR... Fix pushed.
Deployed to prod 🚀
When a single edge is derived from multiple Records, the edge attributes appear to be merged improperly, at least in the case of the attributes figure_url, figure_title, and pmc_reference from PFOCR. What appears to be happening is that only one figure_url is kept, instead of each being merged into an array.
To replicate
In the workspace:
main branch and updated (npm run git checkout main & npm run git pull)
npm run smartapi_sync
Query:
URL:
http://localhost:3000/v1/smartapi/edeb26858bd27d0322af93e7a9e08761/query
Body:
This query should be returning 9 figures instead of the 1 that appears (attributes of the only edge in message.knowledge_graph.edges).
What has been confirmed
mappedResponse object with correct attributes.
TODO
mappedResponse is turned into edge.attributes
Tagging @colleenXu @ariutta for additional details.