Closed jpountz closed 5 months ago
I am not sure about this. We eagerly load the nodesByLevel
on heap for every field. This means we effectively load the upper-levels of graph on heap. While upper levels scale down in size logarithmically, they can still be thousands of nodes.
That's a good call out @benwtrent. But I was wondering, how much size is considered as big size for tracking purpose? For example, let's say there are 1 million nodes in upper levels and if we are creating array of 1 million length, total memory will be 4 MB per million nodes (4 bytes * 1 MM). So is it fine or not fine to not track this size?
It's true that these things can still use amounts of heap that are not completely negligible, but I wonder if it's a good reason to preserve this interface. Doc values too can use non-negligible amounts of heap if you have many fields that are multi-valued and/or use dictionary compression.
From a quick look, none of Elasticsearch, Solr, OpenSearch and luceneserver take advantage of this interface on KnnVectorsReader
. (I searched for calls to CodecReader#getVectorReader
, which are necessary to get an object that you can call #ramBytesUsed()
on.)
@benwtrent Do you have a strong opinion on keeping the Accountable
interface? I remember keeping it in 9.0 because vectors were new and wanting to be careful, but I'm keen on cleaning things up now and aligning vectors with other file formats that don't implement this interface.
@jpountz I do not have a strong opinion :). Its cool to remove it.
Description
In Lucene 9.0, we removed the
Accountable
interface on our index readers and file formats, because heap usage had become so low that this metric had become less relevant, and also much less accurate as there were no longer a couple things that would account for a majority of the RAM usage, but rather many small objects that don't account for much memory on their own and that were too hard to estimate properly.We still kept
Accountable
onKnnVectorsReader
because it used to load lots of things in heap at the time. Since vectors have become much more reasonable since then, I would suggest now removingAccountable
onKnnVectorsReader
like we did for other file formats in 9.0.