apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Ability to turn off the store for an index [LUCENE-2025] #3100

Open asfimport opened 15 years ago

asfimport commented 15 years ago

It would be really good in combination with parallel indexing if the Lucene store could be turned off entirely for an index.

The reason is that part of the store is the FieldIndex (.fdx file), which contains an 8-byte pointer for each document in a segment, even if a document does not contain any stored fields.

With parallel indexing we will want to rewrite certain parallel indexes to update them, and if such an update affects only a small number of documents, it is wasteful to rewrite the whole .fdx file every time.
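
To put a rough number on that cost, here is a small back-of-the-envelope sketch. It assumes only the 8-bytes-per-document pointer layout described above and ignores any file header. The class name, segment size, and update size are hypothetical values chosen for illustration, not measurements.

```java
// Rough estimate of the per-document pointer overhead in the .fdx file,
// assuming 8 bytes per document as described in this issue.
// Header bytes are ignored; all concrete numbers below are hypothetical.
final class FdxOverhead {
    static final long POINTER_BYTES = 8L; // one pointer per document in the segment

    /** Bytes of per-document pointers for a segment holding maxDoc documents. */
    static long indexBytes(long maxDoc) {
        if (maxDoc < 0) {
            throw new IllegalArgumentException("maxDoc must be >= 0");
        }
        return Math.multiplyExact(maxDoc, POINTER_BYTES);
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L; // hypothetical segment size
        long updatedDocs = 100L;   // hypothetical number of documents touched by an update
        long rewritten = indexBytes(maxDoc);
        System.out.println("pointer bytes rewritten per update: " + rewritten);
        System.out.println("pointer bytes per updated document: " + rewritten / updatedDocs);
    }
}
```

The pointer bytes depend on the segment size rather than on the number of documents the update touches, which is why a parallel index without stored fields would avoid the cost entirely.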

So in the case where you only want to update a data structure in the inverted index it makes sense to separate your index into multiple parallel indexes, where the ones you want to update don't contain any stored fields.

It would also be great not only to allow turning off the store but to make it customizable, similar to what flexible indexing wants to achieve for the inverted index.

As a start I'd be happy with the ability to simply turn off the store and to add more flexibility later.


Migrated from LUCENE-2025 by Michael Busch, 1 vote, updated May 09 2016

asfimport commented 13 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Simon, watch out for INFRA-3517 – we have to be careful, when labeling, to not use the label with a trailing comma stuck on!

Ie this issue now has two such labels: 'gosc2011,' and 'mentor,'

asfimport commented 13 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

> Ie this issue now has two such labels: 'gosc2011,' and 'mentor,'

Thanks Mike, I changed them back to have no commas.

asfimport commented 12 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

Moving this over to 4.1; this won't happen in 4.0 anymore.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

One simple way to do this today is to just use a codec that has a NoStoredFieldsImpl, which throws an exception in its writer impl if you ask it to actually write any stored fields (e.g. startDocument(n) is called where n > 0), and does nothing in its reader impl.

I think this is fairly uncommon in the typical case. I looked into whether we could optimize this case for Lucene40's impl, but it introduces a lot of scary situations for things like bulk merge.

So for now I really think this is the simple, safe way: if someone wants to turn the store off, they just set this as their codec on IndexWriter.
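
The behaviour described above can be modelled in a few lines. This is a minimal, self-contained sketch and deliberately does not use Lucene's classes: `StoredFieldsSink`, `StoredFieldsSource`, and the two `NoStoredFields*` implementations are hypothetical stand-ins for the writer and reader sides of a codec's stored-fields implementation. Only the `startDocument(n)` check with `n > 0` is taken from the comment itself.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for the writer side of a stored-fields impl.
// This is NOT Lucene's API; it only mirrors the startDocument(n) idea above.
interface StoredFieldsSink {
    void startDocument(int numStoredFields);
}

// Hypothetical stand-in for the reader side.
interface StoredFieldsSource {
    void visitDocument(int docId, Consumer<String> visitor);
}

// Writer: accepts documents without stored fields, rejects everything else.
final class NoStoredFieldsSink implements StoredFieldsSink {
    private int docCount = 0;

    @Override
    public void startDocument(int numStoredFields) {
        if (numStoredFields > 0) {
            throw new UnsupportedOperationException(
                "stored fields are turned off, but document " + docCount
                    + " has " + numStoredFields + " stored field(s)");
        }
        docCount++; // nothing is written; we only count the document
    }

    int docCount() {
        return docCount;
    }
}

// Reader: there is nothing on disk, so visiting a document is a no-op.
final class NoStoredFieldsSource implements StoredFieldsSource {
    @Override
    public void visitDocument(int docId, Consumer<String> visitor) {
        // intentionally empty: the visitor is never called
    }
}

final class NoStoredFieldsDemo {
    public static void main(String[] args) {
        NoStoredFieldsSink sink = new NoStoredFieldsSink();
        sink.startDocument(0); // accepted: no stored fields
        try {
            sink.startDocument(2); // rejected: the store is off
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing loudly in the writer rather than silently dropping fields means an accidental stored field is noticed at index time rather than surfacing later as missing data.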

asfimport commented 11 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

Bulk move 4.4 issues to 4.5 and 5.0

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Move issue to Lucene 4.9.