elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
68.59k stars 24.36k forks source link

Optimize BytesArray#indexOf? #94547

Open jpountz opened 1 year ago

jpountz commented 1 year ago

Description

BytesArray#indexOf is used to split multi-search and bulk indexing requests around line breaks. It currently does a naive loop that iterates over bytes until it finds the expected one. It may be possible to do slightly better by using VarHandles to read 8 bytes at once into a long, and then using a bit twiddling hack to check if this word has any line break in it.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-core-infra (Team:Core/Infra)

thecoop commented 1 year ago

Rather than bit twiddles, it's worth exploring using Vector more generally, still in incubation as of JDK 20

jpountz commented 1 year ago

Agreed, the vector API would be a much better solution.

chengzhou9 commented 1 year ago

@jpountz Hi, is anyone currently working on this? I'd like to try and pick this up as a first time contributor.

thecoop commented 1 year ago

@chengzhou9 Vector is currently in incubator, which means it is not suitable for including in production ES code at the moment. Also the package name is going to change. However, feel free to test out some prototypes & check the performance using rally!

AkisPanagiotopoulos commented 1 year ago

Hello, is anyone working on this issue at at the moment? I am a first time contributor and i would like to work on this. If anyone has worked on this and needs help or can give me some additional info of progress on this issue, feel free to do so! :)

Toavina23 commented 1 year ago

I have actually done an implementation that passes all tests except the MemorySizeSettingsTests/testCircuitBreakerSettings, and I am actually a bit confused about how this is related to my changes (I am pretty new to the codebase so please don't roast me for that questionning)

Toavina23 commented 1 year ago

So the test did not pass even with the old implementation of it so it is clearly not related to my changes, I would like to submit a PR for reviewing on this

alemendoza-v commented 1 year ago

Has this issue been solved?

XxBrezxX commented 1 year ago

Is this issue solved?

jpountz commented 1 year ago

Not that I know of. However, since this issue was opened, @ChrisHegarty introduced support for taking advantage of the Panama vector API in Lucene, so vectorizing byte comparisons thanks to the vector API now looks like a lower hanging fruit and a much better option than my initial proposal. I removed the good first issue label, as it's likely something we'd rather take internally given the connection with the build.