Ipv6 parser speed - Githubissues

elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine

https://www.elastic.co/products/elasticsearch

Other

1k stars 24.82k forks source link

Ipv6 parser speed #97118

Open GrandFisher opened 1 year ago

GrandFisher commented 1 year ago

Description

similar to speed up ip v4 parser,We found the performance overhead of String#split in ipv6 parse was too large.

We try to reduce redundant call in this pr Avoid IpStringToBytes Called Twice#97115

When we try to optimize String#split in ipv6 parse, we tried several existing code implementations. However, the best implementation has a GPLv2 license. Although we have made changes to suit es , we still think that contributing to the community will violate the relevant license and cause unnecessary troubles. Can you give some suggestions?

DaveCTurner commented 1 year ago

I think that's right, GPLv2 is incompatible with the licenses we use.

Could you give a little more context? Why is this causing you problems? You've only shown the last few calls in your image, but really we need to see the whole stack.

GrandFisher commented 1 year ago

I think that's right, GPLv2 is incompatible with the licenses we use.

Could you give a little more context? Why is this causing you problems? You've only shown the last few calls in your image, but really we need to see the whole stack.

When we replace machines that node running on, we need exclude nodes first, this will last for a while . So, if there has any other metadata change task triggered, moveShards overhead will be large due to FilterAllocationDecider as shown in the flame graph. In our scenario, there will be a large number of metadata change operations in a short period of time (create or delete index by business side and so on). Some task in Pendingtasks will wait too long to timeout(external setting).

DaveCTurner commented 1 year ago

Thanks, that's helpful. Rather than trying to micro-optimise matchByIp I wonder if we can introduce some caching to avoid recomputing these filters each time.

Note also that from 8.6 onwards most of this computation no longer happens on the master service thread.

DaveCTurner commented 1 year ago

Alternatively, could you impose node filters by name or node ID rather than by IP address?

GrandFisher commented 1 year ago

Alternatively, could you impose node filters by name or node ID rather than by IP address?

It's seem hard for us, more than one es instance running on one node. Filter by name or id will need SRE manually find one by one, it increase labor costs.

GrandFisher commented 1 year ago

Thanks, that's helpful. Rather than trying to micro-optimise matchByIp I wonder if we can introduce some caching to avoid recomputing these filters each time.

Note also that from 8.6 onwards most of this computation no longer happens on the master service thread.

We have another solution: es resolves the ip when updating the clustersetting, and directly uses the result for filtering.

DaveCTurner commented 1 year ago

Rather than trying to micro-optimise matchByIp

Having said that, this code does look really rather inefficient to me. InetAddresses#getIpOrHost is basically normalising the IP address, but it does a lot of allocations and copying along the way. I'm sure there's a way to do this without so many allocations, but I wonder if we could cheaply detect that the address is already properly normalised up-front and just skip all the work.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-distributed (Team:Distributed)