Open XComp opened 6 years ago
Do you have an example GraphML that demonstrates the issue? Are you loading the file with graph.io(graphml()).readGraph("mygraph.xml")
or some other way?
The GraphML files from TinkerPop found under the data
directory all load fine.
I created a simple example with the tinkerpop-modern.xml found in the janusgraph repository using the following code:
import org.apache.tinkerpop.gremlin.structure.io.IoCore;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import java.io.IOException;
public class TinkerpopGraphMLImportTest {
public static void main(String[] args) throws IOException {
JanusGraphFactory.Builder configBuilder = JanusGraphFactory.build();
configBuilder
.set("gremlin.graph", "org.janusgraph.core.JanusGraphFactory")
.set("storage.backend", "inmemory")
.set("graph.set-vertex-id", "true");
try (JanusGraph graph = configBuilder.open()) {
System.out.println("[Pre-import] Node count: " + graph.traversal().V().count().next());
graph.io(IoCore.graphml()).readGraph("core/src/test/resources/tinkerpop-modern.xml");
System.out.println("[Post-import] Node count: " + graph.traversal().V().count().next());
}
}
}
The problem seems to be the graph.set-vertex-id
. If disabled, the import succeeds and the expected number of vertices (6) is printed. If enabled, I get the same IllegalArgumentException
being thrown having the following stacktrace:
Exception in thread "main" java.lang.IllegalArgumentException: Must provide vertex id
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:135)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.addVertex(StandardJanusGraphTx.java:515)
at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsTransaction.addVertex(JanusGraphBlueprintsTransaction.java:123)
at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph.addVertex(JanusGraphBlueprintsGraph.java:129)
at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph.addVertex(JanusGraphBlueprintsGraph.java:47)
at org.apache.tinkerpop.gremlin.structure.io.graphml.GraphMLReader.findOrCreate(GraphMLReader.java:310)
at org.apache.tinkerpop.gremlin.structure.io.graphml.GraphMLReader.readGraph(GraphMLReader.java:177)
at org.apache.tinkerpop.gremlin.structure.io.graphml.GraphMLIo.readGraph(GraphMLIo.java:92)
at TinkerpopGraphMLImportTest.main(TinkerpopGraphMLImportTest.java:16)
This underlines the observed behavior of this issue. Probably, the tests on GraphML loads is not tested on custom vertex IDs?
I also grepped for the XML files in the JanusGraph repository. Non of the XML files seem to be loaded anywhere:
$ git clone git@github.com:JanusGraph/janusgraph.git
$ cd janusgraph
$ grep -R 'tinkerpop-[a-z]*.xml' .
Binary file ./.git/index matches
Is this an indication that the GraphML is not tested at all? Or am I misled here?
Thanks for the full test case. I wouldn't be surprised if there aren't any test cases in JanusGraph specifically around GraphML loading, with or without the graph.set-vertex-id
property.
The GraphML functionality is coming from TinkerPop. The code in GraphMLReader assumes that the vertex id is a String
.
You could consider using GraphSON if you want to do vertex id assignment.
I wanted to load GraphML file with given vertex IDs (the set IDs are in the valid JanusGraph format). A node element in this GraphML file looks like this.
I always get a
IllegalArgumentException
sayingMust provide vertex id
. I investigated the case and figured out that there's some wrong (?) behavior happening whenorg.apache.tinkerpop.gremlin.structure.io.graphml.GraphMLReader.findOrCreate(..)
is called while reading the graph and processing a END_ELEMENT fornode
sub-elements (seeorg.apache.tinkerpop.gremlin.structure.io.graphml.GraphMLReader.readGraph(..)
:The
currentVertexId
is a String value in the code above which was extracted from the start element's attribute list (respectively theid
attribute) of the currently processed sub-element.findOrCreate
expects the vertex ID to be an Object, then. WithinfindOrCreate
the passed vertex features are then used to check whether the passed vertex ID is valid which is never the case since the StandardJanusGraph implementation expects the vertex ID to be along
value.Is this a conceptual bug in the StandardJanusGraph implementation? Or is this kind of expected behavior?