apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.67k stars 1.04k forks source link

Learned sorting algorithm for Lucene #12463

Open josefschiefer27 opened 1 year ago

josefschiefer27 commented 1 year ago

Description

I found this article about an interesting SIGMOD paper about a learned sorting algorithm which outperforms Radixsort by the factor 1.49. While the implementation doesn't look trivial, it might be an opportunity to significantly speed up the sorting in Lucene.

edit: There is an updated version (LearnedSort 2.0) which can deal better with duplicates.

cc: @jpountz @bruno-roustant

jpountz commented 1 year ago

This is certainly interesting and possibly applicable to Lucene as indexing involves a lot of sorting, but also looks complicated to integrate. Contributions are welcome. :)

msokolov commented 1 year ago

https://github.com/anikristo/LearnedSort/ is GPLv3 licensed

bruno-roustant commented 1 year ago

Thank you for creating this issue. This is indeed a subject I'm interested in, although currently I'm on another learned algorithm to search sorted keys. This learned sort would be the next step! Did you start some experimental implementation in Java?

Frostfire25 commented 1 year ago

Hey, very interested in assisting with the implementation of this algorithm.

josefschiefer27 commented 1 year ago

Here a link to the actual SIGMOD paper.