apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.55k stars 1k forks source link

Could Lucene's default Directory (`FSDirectory.open`) somehow preload `.vec` files? #13551

Open mikemccand opened 1 month ago

mikemccand commented 1 month ago

Description

This is really a "discussion" issue. I'm not sure at all that the idea is feasible:

I've been testing luceneutil with heavy KNN indexing (Cohere wikipedia en 768 dimension vectors) and one dismal part of the experience is the swap storm caused by HNSW's random access to the raw vectors stored in the .vec files for each segment on "cold start" searching.

Even when the box has plenty of RAM to hold all .vec files, the swap storm takes minutes on the nightly benchy box, even with a fast NVMe SSD holding the index. Even if the index is freshly built, the OS doesn't seem to cache the .vec files since they appear to be "write once", until the searching starts up, and then the swap storm begins. This was with FLOAT32 vectors ... I suspect the problem is less severe with int4 or int8 compressed vectors (haven't tested).

At Amazon (customer facing product search) we also see this swap storm when cold starting the searching process even after NRT segment replication has just copied the files locally: they don't stay "hot" in the OS in that case either (looks like "write once" to the OS).

Lucene already has an awesome feature to "fix" this:MMapDirectory.setPreload. It will pre-touch all pages associated with that file on open, so the OS caches them "once" on startup, much more efficiently than the random access HNSW. But this only makes sense for applications/users that know they have enough RAM (we will test this at Amazon to see if it helps our cold start problems). For my luceneutil tests, simply cat /path/to/index/*.vec >> /dev/null (basically the same as .setPreload I think) fixed/sidestepped the swap storm.

(The Linux kernel's less aggressive default readahead for memory-mapped IO vs traditional NIO is likely not helping either? Is this really a thing? I have not tested NIOFSDirectory to see if the swap storm is lessened).

Longer term we have discussed using AKNN algorithms that are more disk-friendly (e.g. DiskANN), but shorter t erm, I'm wondering if we could somehow help users with a default Directory (from FSDirectory.open) that somehow / sometimes preloads .vec files? It's not easy to do -- you wouldn't know up front that the application will do KNN searching at all. And, maybe only certain vectors in the .vec will ever be accessed and so you need to pay that long random access price to find and cache just those ones.

rmuir commented 1 month ago

It's not easy to do -- you wouldn't know up front that the application will do KNN searching at all. And, maybe only certain vectors in the .vec will ever be accessed and so you need to pay that long random access price to find and cache just those ones.

I'm trying to understand this sentence:

jpountz commented 1 month ago

It wouldn't solve the issue, only mitigate it, but hopefully cold start performance gets better when we start leveraging IndexInput#prefetch to load multiple vectors from disk concurrently (https://github.com/apache/lucene/issues/13179).

The Linux kernel's less aggressive default readahead for memory-mapped IO vs traditional NIO

FWIW, it's not that the read-ahead is less agressive for mmap than traditional I/O, it's that since recently MMapDirectory explicitly tells the OS to not perform readahead for vectors by passing MADV_RANDOM to madvise.

mikemccand commented 1 month ago

Oh sorry I used the wrong term (thank you @rmuir for clarifying!): it's not a swap storm I'm seeing, it's a page storm. The OS has plenty of free ram (reported by free), and that goes down and buff/cache goes up as the OS pulls and caches pages in for the .vec file. I don't think I'm running too many ram hogging crapplications ;)

  • it should be a one-liner using setPreload to preload "*.vec" if we wanted to do it either from FSDirectory.open or by default in MMapDirectory

+1 -- it would be a simple change. But I worry if it would do more harm than good in some cases, e.g. if there are truly cold HNSW cases where the application plans to suffer through paging to identify the subset of hot vectors? I don't know if that is even a thing -- a bunch of dark matter vectors that never get visited? I guess we do know that HNSW can and does sometimes produce disconnected graphs, but I think those dark matter islands are "typically" smallish.

  • if the application isn't doing KNN searching then they won't have .vec? I struggle to imagine someone indexing a ton of "extra" vectors that isnt "using" them and hasn't noticed big performance impact

+1! Especially because indexing them is quite costly!

It wouldn't solve the issue, only mitigate it, but hopefully cold start performance gets better when we start leveraging IndexInput#prefetch to load multiple vectors from disk concurrently (#13179).

+1 -- that would be a big help especially when paging in from fast SSDs since these devices have high IO concurrency.

The Linux kernel's less aggressive default readahead for memory-mapped IO vs traditional NIO

FWIW, it's not that the read-ahead is less agressive for mmap than traditional I/O, it's that since recently MMapDirectory explicitly tells the OS to not perform readahead for vectors by passing MADV_RANDOM to madvise.

Ahhh, that makes sense! And I think it is still correct to madvise in this way for our .vec files: it really likely (is there any locality to HNSW's access patterns?) is a random access pattern from HNSW. It does make me wonder if the scalar compression will actually help or not. I guess it might still help if that compression means multiple vectors fit into a single IO page.

uschindler commented 1 month ago

I don't think we should change anything here in MMapDirectory. It is all available and easy to do for one that wants to do this. Elasticserach is doing this for some files, but we have no default on preloading anything.

For preloading you can pass MMapDirectory#setPreload and give it a lamda expression to select on which file extensions you want preloading: https://lucene.apache.org/core/9_11_0/core/org/apache/lucene/store/MMapDirectory.html#setPreload(java.util.function.BiPredicate)

The default is: https://lucene.apache.org/core/9_11_0/core/org/apache/lucene/store/MMapDirectory.html#NO_FILES

uschindler commented 1 month ago
  • it should be a one-liner using setPreload to preload "*.vec" if we wanted to do it either from FSDirectory.open or by default in MMapDirectory

It is trivial:

mmapDir.setPreload((name,ctx) -> name.endsWith(".vec"));

So no change needed in the API or behaviour of Lucene.