Open adrian-arapiles opened 2 years ago
Thanks for the report. I don't see how this can possibly be related to the use of the Java client, as the client will just send a standard bulk request in nd-json format.
Is this reproducible with different or even random document identifiers? Could this be an unfortunate coincidence, where the ids of the documents that are bulk-indexed with the Java client all happen to route to the same shard?
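For reference, an nd-json bulk body is one action line followed by one source line per document, each terminated by a newline. The following is a minimal sketch of such a body built by hand, not the Java client's own serializer. The class name, the `phash` field, and its placeholder values are hypothetical; only the index name `multimedia-phash` comes from this report.

```java
import java.util.List;

class BulkBodySketch {
    // Builds an NDJSON bulk body: an "index" action line plus a source line per id.
    // The "phash" field and its values are placeholders, not the reporter's real data.
    static String bulkBody(String index, List<String> ids) {
        StringBuilder sb = new StringBuilder();
        for (String id : ids) {
            // Action/metadata line: no explicit routing, so the _id is used for routing.
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_id\":\"").append(id).append("\"}}\n");
            // Source line for the document.
            sb.append("{\"phash\":\"placeholder-").append(id).append("\"}\n");
        }
        // The bulk API requires the body to end with a newline, which the loop guarantees.
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(bulkBody("multimedia-phash", List.of("1", "2", "3")));
    }
}
```

Pasting the printed body after `POST _bulk` in the Kibana console is one way to compare what the console sends with what the client sends.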
Hi. In that example, all identifiers were consecutive numeric ids read from an auto-increment column in a database. I thought it could be a coincidence, but IMO with more than 100k documents it is very unlikely that the same shard was chosen every time. I also tried to replicate this behavior from the Kibana console with the same bulk request and the same documents, and the behavior was as expected: everything worked well. I'll try with the latest versions to check whether it was a bug in Elasticsearch or something else. I'll write here when I have run the new test.
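As a rough way to reason about the coincidence question, the documented routing formula is `shard_num = (hash(_routing) % num_routing_shards) / routing_factor`, with `routing_factor = num_routing_shards / num_primary_shards` and `_routing` defaulting to the document `_id`. The sketch below simulates that formula for consecutive numeric ids. It rests on assumptions that are not taken from this thread: Murmur3 (x86, 32-bit, seed 0) over the id's UTF-16 little-endian bytes as an approximation of Elasticsearch's internal hash, and 768 routing shards for a 3-shard index. All class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoutingSketch {
    // Standard Murmur3 x86 32-bit with seed 0.
    static int murmur3(byte[] data) {
        int h = 0;
        int i = 0;
        for (; i + 4 <= data.length; i += 4) {
            // Read one little-endian 32-bit block.
            int k = (data[i] & 0xff)
                    | (data[i + 1] & 0xff) << 8
                    | (data[i + 2] & 0xff) << 16
                    | (data[i + 3] & 0xff) << 24;
            h ^= mixBlock(k);
            h = Integer.rotateLeft(h, 13) * 5 + 0xe6546b64;
        }
        // Remaining 0 to 3 bytes, also little-endian.
        int tail = 0;
        for (int j = data.length - 1; j >= i; j--) {
            tail = (tail << 8) | (data[j] & 0xff);
        }
        h ^= mixBlock(tail);
        // Finalization mix.
        h ^= data.length;
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    static int mixBlock(int k) {
        k *= 0xcc9e2d51;
        k = Integer.rotateLeft(k, 15);
        k *= 0x1b873593;
        return k;
    }

    // Applies the documented formula; routingShards is an assumed setting value.
    static int shardFor(String routing, int primaryShards, int routingShards) {
        int hash = murmur3(routing.getBytes(StandardCharsets.UTF_16LE));
        int routingFactor = routingShards / primaryShards;
        return Math.floorMod(hash, routingShards) / routingFactor;
    }

    // Counts how many of the ids 1..docCount land on each primary shard.
    static int[] distribute(int docCount, int primaryShards, int routingShards) {
        int[] counts = new int[primaryShards];
        for (int id = 1; id <= docCount; id++) {
            counts[shardFor(Integer.toString(id), primaryShards, routingShards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(distribute(100_000, 3, 768)));
    }
}
```

If consecutive ids spread across shards in this simulation, a single-shard result in a real index would suggest checking which routing value actually reaches the server, though the sketch alone cannot confirm that.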
Hi,
I found a weird behavior with bulk requests. When you have an index with, for example, 3 shards, all documents go to the same shard. If you index into an index with 6 shards, all documents go to 2 shards.
When I set a custom routing on the bulk request, documents are spread across all shards. I think it is an issue/bug with routing on bulk requests, but I don't know what it could be. I tried to reproduce it without the client, from the Kibana console, but I couldn't reproduce the same behavior, so I think it is a client issue.
The code is:
And the Elasticsearch _cat/shards/multimedia-phash output:
The code with the workaround is:
And the Elasticsearch _cat/shards/multimedia-phash output:
Versions:
Elasticsearch: 8.0.0
co.elastic.clients.elasticsearch-java: 8.0.0
If you need any more info, please ask me.
Thanks in advance, Adrian.