JanusGraph / janusgraph

JanusGraph: an open-source, distributed graph database
https://janusgraph.org

What does ids.num-partitions do? #120

Open jfstephe opened 7 years ago

jfstephe commented 7 years ago

I know there's documentation, but it's unclear how ids.num-partitions relates to ids.block-size. Can someone add more colour to the documentation please?

How each instance uses ids.block-size to reserve blocks of ids could be better explained too. Maybe detail somewhere what Janus does at startup, etc.? I'm not sure I've seen that anywhere (although I could have missed it).

porunov commented 6 years ago

Same here. I tried to figure out how the id allocation process works but haven't found any information about it. These questions are still unclear to me:

  1. How does JanusGraph allocate id blocks in eventually consistent databases (Cassandra / ScyllaDB)? I mean, is there any possibility that two different JanusGraph instances would allocate the same id block?
  2. What happens to an id block after a JanusGraph instance crashes? Is that block lost with the crash, or will it be reassigned to the same instance after the instance recovers? If the same block is reassigned after recovery, then how does the instance know which id was the last one used?
  3. How many blocks does each JanusGraph instance allocate when we have several partitions (let's say 32)? As I understand it, each JanusGraph instance will allocate 32 id blocks (one for each partition), but I may be wrong.

I would appreciate seeing answers for any of those questions.

Best regards

YevIgn commented 6 years ago

Also, it is currently not clear how cluster.max-partitions and ids.num-partitions are related, and this topic is not covered by the docs.
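
For orientation, the three options raised in this thread would sit side by side in a graph properties file roughly as below. The values are illustrative assumptions only, not recommendations and not confirmed defaults, and the comments just restate how the options are described in this thread:

```properties
# Illustrative values only (assumptions, not recommendations)

# Size of the id blocks that each instance reserves
ids.block-size=10000

# Number of partitions used for id placement; its relation to
# cluster.max-partitions is the open question in this issue
ids.num-partitions=10

# Number of partitions in the graph
cluster.max-partitions=32
```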

cjxqhhh commented 4 years ago

I have the same questions. When will the docs be updated? Thanks.

steve-todorov commented 4 years ago

Any updates on this? :)

pete-gillin-privitar commented 1 year ago

I'm not an expert, I'm just a random user who happened to go diving through the code, but... It looks like cluster.max-partitions controls the number of partitions in the graph, while SimpleBulkPlacementStrategy randomly picks ids.num-partitions of them and then randomly assigns each vertex to one of that subset. There's a page which gives advice on setting cluster.max-partitions, but I've seen no advice on how to pick ids.num-partitions.
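
To make that reading of the code concrete, here is a minimal Python sketch of the behaviour as described in this comment. It is not JanusGraph code: the function names `pick_partition_subset` and `assign_vertices` are hypothetical, and it assumes the chosen partitions are distinct and that ids.num-partitions does not exceed cluster.max-partitions, neither of which is confirmed here.

```python
import random


def pick_partition_subset(max_partitions, num_partitions, rng):
    """Randomly choose num_partitions distinct partition ids.

    max_partitions stands in for cluster.max-partitions and
    num_partitions for ids.num-partitions (assumed <= max_partitions).
    """
    if not 1 <= num_partitions <= max_partitions:
        raise ValueError("num_partitions must be in 1..max_partitions")
    return rng.sample(range(max_partitions), num_partitions)


def assign_vertices(vertex_ids, subset, rng):
    """Place each vertex on a partition drawn at random from the subset."""
    return {v: rng.choice(subset) for v in vertex_ids}


# Illustrative values only: 32 partitions in the graph, 10 used for placement.
rng = random.Random(42)
subset = pick_partition_subset(32, 10, rng)
placement = assign_vertices(range(1000), subset, rng)
used = set(placement.values())
print(f"subset size: {len(subset)}, partitions actually used: {len(used)}")
```

If this reading is right, a single instance only ever writes new vertices to at most ids.num-partitions of the cluster.max-partitions partitions, which is why the two settings would need to be chosen together.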