Open cbizon opened 3 years ago
i can confirm that the number of nodes/edges loaded into neo4j is the same as the number in the kgx files. that is reported on every kgx graph import result.
we mostly rely on the node/edge normalization processes to return valid predicates and labels. that seems to be the best place to ensure biolink compatibility.
Note: Deepak got back to me today informing me that he has installed a "graph summary" report in kgx that we may be able to leverage. i also have another question in to Deepak about the biolink validation questions noted above.
Deepak indicates that there is some basic validation that may be leveraged in KGX.
Just double checking:
> i can confirm that the number of nodes/edges loaded into neo4j is the same as the number in the kgx files.
You mean that you can write code that confirms this at load time?
> we mostly rely on the node/edge normalization processes to return valid predicates and labels. that seems to be the best place to ensure biolink compatibility

This will ensure some forms of compatibility (categories and predicates), but it will not help at all with checking domains and ranges.
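A rough sketch of what a domain/range check could look like, to make the gap concrete. Everything here is an assumption for illustration: the `CONSTRAINTS` table is hand-written rather than read from the Biolink Model, the edge dictionaries are assumed to carry `subject`, `predicate`, and `object` keys, and `node_categories` is assumed to already include ancestor categories for each node.

```python
# Illustrative constraints only; a real check would load these from the Biolink Model.
CONSTRAINTS = {
    "biolink:gene_associated_with_condition": {
        "domain": "biolink:Gene",
        "range": "biolink:DiseaseOrPhenotypicFeature",
    },
}


def check_domain_range(edges, node_categories, constraints=CONSTRAINTS):
    """Return (edge id, problem) pairs for edges that violate a constraint.

    node_categories maps a node id to the set of categories on that node,
    assumed to already include ancestor categories.
    """
    problems = []
    for edge in edges:
        rule = constraints.get(edge["predicate"])
        if rule is None:
            continue  # no constraint known for this predicate
        # The subject must satisfy the domain, the object must satisfy the range.
        for end, key in (("subject", "domain"), ("object", "range")):
            categories = node_categories.get(edge[end], set())
            if rule[key] not in categories:
                problems.append(
                    (edge.get("id"), f"{end} {edge[end]} is not a {rule[key]}")
                )
    return problems
```

An empty return value means no violations were found among the predicates that have a known constraint; predicates missing from the table are skipped, not approved.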
the old KGX prints out the number of nodes/edges it inserts into the graph based on the data that comes in. that does not necessarily indicate that everything that came in made it into the graph.
i plan on looking at the 2 enhancements Deepak mentioned today on biolink validation and enhanced reporting. my hope is that these may have some actionable output we can use programmatically.
also note that the load manager has some metadata about the data services raw data parse that will give us some better insight into the quality of the parsing from that perspective.
OK, but just to be as clear as possible: we want automated verification of all aspects that are amenable to such. Printing stuff out is not the same.
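For the node/edge count case, an automated check might look something like the sketch below, which fails loudly instead of printing. It assumes the KGX files are JSON Lines (one node or edge per line) and that the loaded counts are fetched separately from the graph, for example with `MATCH (n) RETURN count(n)` and `MATCH ()-[r]->() RETURN count(r)`. The function names are made up for illustration.

```python
from pathlib import Path


def count_jsonl_records(path):
    """Count non-blank lines in a JSON Lines file (one node or edge per line)."""
    with Path(path).open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def verify_load_counts(nodes_file, edges_file, loaded_nodes, loaded_edges):
    """Raise ValueError if the database counts differ from the file counts.

    loaded_nodes and loaded_edges are assumed to come from the graph itself.
    """
    expected = {
        "nodes": count_jsonl_records(nodes_file),
        "edges": count_jsonl_records(edges_file),
    }
    actual = {"nodes": loaded_nodes, "edges": loaded_edges}
    mismatches = {
        kind: (expected[kind], actual[kind])
        for kind in expected
        if expected[kind] != actual[kind]
    }
    if mismatches:
        raise ValueError(f"load count mismatch (expected, actual): {mismatches}")
    return expected
```

Raising an exception lets the load pipeline stop or flag the build, which a printed count cannot do.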
i understand. i will see about loading in some smaller datasets into kgx tomorrow to see what we can pull out.
One aspect of this is doing biolink validation a la KGX
Another is checking provenance sources #97
Another is checking that each edge has the appropriate validation properties #105
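The per-edge property checks in the last two items could share one small helper along these lines. This is only a sketch: the list of required properties is passed in by the caller because the actual names are defined in #97 and #105, not here, and `provenance_source` in the usage example is a placeholder, not a real property name.

```python
def edges_missing_properties(edges, required):
    """Map edge id -> sorted list of required properties that are absent or empty."""
    report = {}
    for edge in edges:
        # Treat None, empty strings, and empty containers as missing.
        missing = sorted(
            prop for prop in required if edge.get(prop) in (None, "", [], {})
        )
        if missing:
            report[edge.get("id")] = missing
    return report


# Usage with a placeholder property name:
edges = [
    {"id": "e1", "provenance_source": "infores:example"},
    {"id": "e2", "provenance_source": ""},
]
report = edges_missing_properties(edges, ["provenance_source"])
```

An empty report means every edge carries every required property, so the result can be asserted on directly in a test or a build step.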
What are the elements of a graph that we can automatically validate?
Does KGX have biolink validation?