JanusGraph / janusgraph

JanusGraph: an open-source, distributed graph database
https://janusgraph.org

Caused by: java.lang.IllegalStateException: Multiple properties exist for the provided key, use Vertex.properties(lang) #1587

Open zhentaowang opened 5 years ago

zhentaowang commented 5 years ago

When I bulk load JSON data into JanusGraph with BulkLoaderVertexProgram (using IncrementalBulkLoader), I run into this exception. I want to know why it is thrown when a property key holds more than one value.

Tool versions: janusgraph-0.3.1, hadoop-gremlin-3.3.0, hbase-1.1.3, es-6.2.4
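For context: the message comes from TinkerPop's default Vertex.property(String) method, which refuses to choose when a key already holds more than one property. A minimal sketch of that behaviour, assuming plain TinkerGraph rather than the bulk-load path:

import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.structure.VertexProperty;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class MultiPropertyRepro {
    public static void main(String[] args) {
        TinkerGraph graph = TinkerGraph.open();
        Vertex v = graph.addVertex("person");
        // two values under the same key, as LIST/SET cardinality allows
        v.property(VertexProperty.Cardinality.list, "name", "marko");
        v.property(VertexProperty.Cardinality.list, "name", "okram");
        // properties(key) iterates all values and works fine
        v.properties("name").forEachRemaining(p -> System.out.println(p.value()));
        // property(key) throws: "Multiple properties exist for the provided key, use Vertex.properties(name)"
        v.property("name");
    }
}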

JanusGraph schema:

JanusGraphManagement m = janusGraph.openManagement();
VertexLabel person = m.makeVertexLabel("person").make();
VertexLabel software = m.makeVertexLabel("software").make();
PropertyKey blid = m.makePropertyKey("bulkLoader.vertex.id").dataType(Long.class).make();
PropertyKey age = m.makePropertyKey("age").dataType(Integer.class).cardinality(Cardinality.LIST).make();
PropertyKey name = m.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SET).make();
PropertyKey lang = m.makePropertyKey("lang").dataType(String.class).cardinality(Cardinality.LIST).make();
m.addProperties(person, blid, name, age);
m.addProperties(software, blid, name, lang);
EdgeLabel created = m.makeEdgeLabel("created").make();
EdgeLabel knows = m.makeEdgeLabel("knows").make();
PropertyKey weight = m.makePropertyKey("weight").dataType(Float.class).make();
m.addProperties(created, weight);
m.addProperties(knows, weight);
// index; uniqueness checks are not supported, "search" is the "search" from index.search.backend
JanusGraphIndex index = m.buildIndex("mixedIndex", Vertex.class).addKey(name, Mapping.TEXT.asParameter()).buildMixedIndex("search");
// when importing with IncrementalBulkLoader, uncomment the following line
JanusGraphIndex bidIndex = m.buildIndex("byBulkLoaderVertexId", Vertex.class).addKey(blid).indexOnly(person).buildMixedIndex("search");
m.commit();
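What the SET/LIST cardinalities above imply, as a small sketch against the standard JanusGraph transaction API (values are made up): LIST keeps every value including duplicates, SET keeps distinct values, and in both cases one vertex can carry several properties for the same key, which is exactly the situation Vertex.property(key) rejects.

JanusGraphTransaction tx = janusGraph.newTransaction();
JanusGraphVertex v = tx.addVertex("software");
v.property("lang", "java");  // LIST: both values are stored
v.property("lang", "scala");
v.property("name", "lop");   // SET: duplicate values would be collapsed
v.properties("lang").forEachRemaining(p -> System.out.println(p.value())); // java, scala
// v.property("lang") would now throw: multiple properties exist for "lang"
tx.commit();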

My code:

// define the schema shown above
JanusgraphSchema.defineGratefulDeadSchema(janusGraph);
// open the Hadoop graph that reads the GraphSON input
Graph graph = GraphFactory.open("config/hadoop-graphson.properties");
// bulk-load into the JanusGraph instance described by janusgraph-hbase-es.properties
BulkLoaderVertexProgram blvp = BulkLoaderVertexProgram.build().bulkLoader(IncrementalBulkLoader.class).writeGraph("config/janusgraph-hbase-es.properties").create(graph);
graph.compute(SparkGraphComputer.class).program(blvp).submit().get();
graph.close();
// read back what was loaded
GraphTraversalSource g = janusGraph.traversal();
List<Map<String, Object>> vertexList = g.V().valueMap().toList();
System.out.println("data1: " + vertexList);
janusGraph.close();
farodin91 commented 5 years ago

@zhentaowang Could you add both configuration files (janusgraph-hbase-es.properties and hadoop-graphson.properties) and the full stack trace of this exception?

zhentaowang commented 5 years ago

@farodin91 This is my janusgraph-hbase-es.properties configuration:

storage.backend=hbase
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.hostname=ip
storage.hbase.table=hadoop-test-3
storage.batch-loading=true
schema.default=none
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
index.search.backend=elasticsearch
index.search.hostname=ip
index.search.index-name=hadoop_test_3

The following is hadoop-graphson.properties:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.inputLocation=data/tinkerpop-modern.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

giraph.minWorkers=2
giraph.maxWorkers=2
giraph.useOutOfCoreGraph=true
giraph.useOutOfCoreMessages=true
mapred.map.child.java.opts=-Xmx1024m
mapred.reduce.child.java.opts=-Xmx1024m
giraph.numInputThreads=4
giraph.numComputeThreads=4
giraph.maxMessagesInMemory=100000

spark.master=local[*]
spark.serializer=org.apache.spark.serializer.KryoSerializer

The full stack trace of the exception:

Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.lang.IllegalStateException: Multiple properties exist for the provided key, use Vertex.properties(name)
    at org.apache.tinkerpop.gremlin.structure.Vertex$Exceptions.multiplePropertiesExistForProvidedKey(Vertex.java:179)
    at org.apache.tinkerpop.gremlin.structure.Vertex.property(Vertex.java:74)
    at org.apache.tinkerpop.gremlin.process.computer.bulkloading.IncrementalBulkLoader.getOrCreateVertexProperty(IncrementalBulkLoader.java:85)
    at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.lambda$executeInternal$2(BulkLoaderVertexProgram.java:216)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.executeInternal(BulkLoaderVertexProgram.java:216)
    at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.execute(BulkLoaderVertexProgram.java:197)
    at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$4(SparkExecutor.java:118)
    at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
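The failing frame is IncrementalBulkLoader.getOrCreateVertexProperty (line 85), which resolves existing properties via Vertex.property(key) and therefore breaks as soon as a LIST/SET key carries more than one value. A possible workaround, sketched here and untested: subclass IncrementalBulkLoader and match on key and value instead (MultiPropertyBulkLoader is a made-up name; a production version should look up each key's real cardinality from the schema rather than hard-coding list).

import org.apache.tinkerpop.gremlin.process.computer.bulkloading.IncrementalBulkLoader;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.structure.VertexProperty;
import org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils;

public class MultiPropertyBulkLoader extends IncrementalBulkLoader {
    @Override
    public VertexProperty getOrCreateVertexProperty(final VertexProperty<?> property, final Vertex vertex,
                                                    final Graph graph, final GraphTraversalSource g) {
        // Match on key AND value instead of vertex.property(key), which throws
        // as soon as a LIST/SET key holds more than one value.
        final VertexProperty<?> existing = IteratorUtils.stream(vertex.properties(property.key()))
                .filter(p -> p.value().equals(property.value()))
                .findFirst().orElse(null);
        if (existing != null) return existing;
        // Write with list cardinality so repeated keys are accepted; a real
        // implementation should derive the cardinality from the schema.
        return vertex.property(VertexProperty.Cardinality.list, property.key(), property.value());
    }
}

The custom loader would then be passed as .bulkLoader(MultiPropertyBulkLoader.class) in the driver code above, and the class must be on the Spark executors' classpath.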