RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License
33 stars 21 forks source link

In KG2 Neo4j, node publications and synonyms (and relationship publications) are not Neo4j lists? can we fix that? #998

Closed saramsey closed 4 years ago

saramsey commented 4 years ago

I note that in Neo4j (kg2endpoint.rtx.ai:7474), for whatever reason, the synonyms property of a node is a Neo4j string and not a Neo4j list:

Screen Shot 2020-08-04 at 11 44 24 AM

The same goes for the publications property of a node:

Screen Shot 2020-08-04 at 11 45 38 AM

In contrast, the provided_by relationship property is an actual, honest-to-goodness Neo4j list:

Screen Shot 2020-08-04 at 11 46 18 AM

(but note again, on the example relationship record shown, the publications property is a Neo4j string and not a Neo4j list).

I would like to fix this in kg_json_to_tsv.py. I'm not sure if this would also require some tweaks to tsv-to-neo4j.sh.

But.... I'm mindful that this may cause some code changes in the Node Synonymizer and elsewhere in the code base. So I guess we should start with a conversation about the cost/benefit.

ecwood commented 4 years ago

It would not be difficult to fix in kg_json_to_tsv.py and tsv-to-neo4j.sh is already set up to work with the fix, due to the provided_by edge property.

amykglen commented 4 years ago

I'm in favor of making these actual lists; I don't think it will disturb ARAX code. expand currently converts these fields to actual lists when populating nodes/edges in the API model. (and I believe even the conversion step wouldn't break if these fields were actual lists to begin with.) so if any other modules are accessing publications/synonym on nodes/edges in Message.KnowledgeGraph, they're already using those fields in their list form.

I also do the same thing when using synonym in the script I wrote for the NodeSynonymizer (convert it to an actual list using the same method). and I don't believe the NodeSynonymizer otherwise touches these properties.

so I think it'd be great to change these to actual lists; I'd happily go into the ARAX code and remove my conversion steps.

edeutsch commented 4 years ago

Switching to real lists seems like a good thing from my point of view, but I am minimally affected if at all.

saramsey commented 4 years ago

It would not be difficult to fix in kg_json_to_tsv.py and tsv-to-neo4j.sh is already set up to work with the fix, due to the provided_by edge property.

@ericawood great, can you please make the change? Let's make this change in the kg2-curie-refactoring branch.

saramsey commented 4 years ago

@ericawood has generously agreed to take the lead on this issue

ecwood commented 4 years ago

I tested this on an older kg2-simplified.json and it appears to be working. (However, now if there are no synonyms/publications for a node, those fields do not show up). Below are some examples from Neo4j:

Edge with publications image

Edge without publications image

Node with publications and synonyms image

Node without publications or synonyms image

Node without publications but with synonyms image

Node without synonyms but with publications image

ecwood commented 4 years ago

This is fixed in the latest build by Lindsey (which is currently in Neo4j on kg2erica and I'm 90% sure the TSV files are in the S3 bucket). Can I close this out?

saramsey commented 4 years ago

Looks good, thank you @ericawood.