Closed saramsey closed 4 years ago
It would not be difficult to fix in `kg_json_to_tsv.py`, and `tsv-to-neo4j.sh` is already set up to work with the fix, due to the `provided_by` edge property.
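For illustration only, here is a minimal sketch of how a JSON-to-TSV step could emit list-valued properties so that a Neo4j bulk import reads them as arrays. It is not the actual `kg_json_to_tsv.py` code. The function names, the column set, and the `;` array delimiter are assumptions; the only Neo4j-specific piece is the `:string[]` header type, which is how the bulk CSV import format marks array properties.

```python
import csv
import io

# Assumed delimiter; it must match whatever the import step is configured to use.
ARRAY_DELIMITER = ";"


def list_to_tsv_cell(values):
    """Join a list property into one TSV cell; a missing or empty list gives an empty cell."""
    return ARRAY_DELIMITER.join(str(v) for v in (values or []))


def write_nodes_tsv(nodes, out):
    """Write hypothetical node dicts to `out` as TSV with array-typed header columns."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # ":string[]" tells the Neo4j bulk importer to split the cell into a list of strings.
    writer.writerow(["id:ID", "name", "synonym:string[]", "publications:string[]"])
    for node in nodes:
        writer.writerow([
            node["id"],
            node.get("name", ""),
            list_to_tsv_cell(node.get("synonym")),
            list_to_tsv_cell(node.get("publications")),
        ])


# Made-up example nodes, not real KG2 content.
example_nodes = [
    {"id": "EX:1", "name": "example node", "synonym": ["alias a", "alias b"],
     "publications": ["PMID:1", "PMID:2"]},
    {"id": "EX:2", "name": "bare node"},
]

buffer = io.StringIO()
write_nodes_tsv(example_nodes, buffer)
print(buffer.getvalue())
```

One caveat with this approach: a value that itself contains the array delimiter would be split on import, so such values would need escaping or a different delimiter.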
I'm in favor of making these actual lists; I don't think it will disturb ARAX code. `expand` currently converts these fields to actual lists when populating nodes/edges in the API model (and I believe even the conversion step wouldn't break if these fields were actual lists to begin with). So if any other modules are accessing `publications`/`synonym` on nodes/edges in `Message.KnowledgeGraph`, they're already using those fields in their list form.
I also do the same thing when using `synonym` in the script I wrote for the `NodeSynonymizer` (convert it to an actual list using the same method), and I don't believe the `NodeSynonymizer` otherwise touches these properties.
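As a rough sketch of the kind of conversion step described above, the helper below accepts either a stringified list or a real list and always returns a list, so it keeps working before and after the change. This is not the ARAX or NodeSynonymizer code; the function name `as_list` is hypothetical, and it assumes the stringified form looks like a Python list literal, which the thread does not confirm.

```python
import ast


def as_list(value):
    """Return `value` as a list, whether it arrives as a real list or a stringified one."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        # Already a real list: nothing to convert.
        return list(value)
    text = str(value).strip()
    if not text:
        return []
    try:
        # Assumption: the string form is a Python list literal such as "['a', 'b']".
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a literal at all; treat the whole string as a single item.
        return [text]
    if isinstance(parsed, (list, tuple)):
        return list(parsed)
    return [text]


print(as_list("['PMID:1', 'PMID:2']"))
print(as_list(["PMID:1", "PMID:2"]))
```

Because a real list passes through unchanged, a conversion step written this way can stay in place during the transition and be removed later.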
so I think it'd be great to change these to actual lists; I'd happily go into the ARAX code and remove my conversion steps.
Switching to real lists seems like a good thing from my point of view, but I am minimally affected if at all.
@ericawood great, can you please make the change? Let's make this change in the `kg2-curie-refactoring` branch.
@ericawood has generously agreed to take the lead on this issue
I tested this on an older `kg2-simplified.json` and it appears to be working. (However, if a node now has no synonyms or publications, those fields do not show up at all.) Below are some examples from Neo4j:
Edge with publications
Edge without publications
Node with publications and synonyms
Node without publications or synonyms
Node without publications but with synonyms
Node without synonyms but with publications
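Since these fields are now omitted when they are empty, code that reads them may want to treat a missing property as an empty list. A minimal sketch of that, with hypothetical names and made-up records (the property keys `synonym` and `publications` are assumed from the discussion above):

```python
def get_list_property(record, key):
    """Return a list property from a node/edge dict; a missing key yields an empty list."""
    value = record.get(key)
    if value is None:
        return []
    if isinstance(value, list):
        return value
    # Defensive fallback: wrap a lone scalar instead of iterating over its characters.
    return [value]


# Made-up records mirroring the cases listed above.
node_with_both = {"id": "EX:1", "synonym": ["alias a"], "publications": ["PMID:1"]}
node_with_neither = {"id": "EX:2"}
node_synonyms_only = {"id": "EX:3", "synonym": ["alias b"]}

for node in (node_with_both, node_with_neither, node_synonyms_only):
    print(node["id"],
          get_list_property(node, "synonym"),
          get_list_property(node, "publications"))
```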
This is fixed in the latest build by Lindsey (which is currently in Neo4j on kg2erica and I'm 90% sure the TSV files are in the S3 bucket). Can I close this out?
Looks good, thank you @ericawood.
I note that in Neo4j (`kg2endpoint.rtx.ai:7474`), for whatever reason, the `synonyms` property of a node is a Neo4j string and not a Neo4j list:

The same goes for the `publications` property of a node:

In contrast, the `provided_by` relationship property is an actual, honest-to-goodness Neo4j list:

(but note again, on the example relationship record shown, the `publications` property is a Neo4j string and not a Neo4j list).

I would like to fix this in `kg_json_to_tsv.py`. I'm not sure if this would also require some tweaks to `tsv-to-neo4j.sh`.

But.... I'm mindful that this may cause some code changes in the Node Synonymizer and elsewhere in the code base. So I guess we should start with a conversation about the cost/benefit.