apache / lucene


Simplify StoredFieldsVisitor [LUCENE-5870] #6932

Open asfimport opened 10 years ago

asfimport commented 10 years ago

StoredFieldVisitor has a separate visitor method for each of four numeric types: int, long, float and double. We should remove this specialization and just have a single method that takes a java.lang.Number.
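To make the proposal concrete, here is a minimal sketch of the two shapes being compared. The interfaces below are hypothetical, simplified stand-ins, not the actual Lucene classes: the names `TypedVisitor`, `NumberVisitor`, `numericField` and `CollectingVisitor` are invented for illustration, and the field is identified by a plain `String` as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class NumberVisitorSketch {
    // Hypothetical stand-in for the current shape: one callback per numeric type.
    interface TypedVisitor {
        void intField(String name, int value);
        void longField(String name, long value);
        void floatField(String name, float value);
        void doubleField(String name, double value);
    }

    // Hypothetical shape after the proposed change: one callback taking java.lang.Number.
    interface NumberVisitor {
        void numericField(String name, Number value);
    }

    // Collects every numeric value it is shown, whatever its concrete type.
    static class CollectingVisitor implements NumberVisitor {
        final List<Number> values = new ArrayList<>();

        @Override
        public void numericField(String name, Number value) {
            values.add(value);
        }
    }

    public static void main(String[] args) {
        CollectingVisitor visitor = new CollectingVisitor();
        // Primitive arguments are autoboxed to Integer, Long, Float and Double.
        visitor.numericField("count", 42);
        visitor.numericField("timestamp", 1400000000000L);
        visitor.numericField("score", 1.5f);
        visitor.numericField("ratio", 0.25d);
        for (Number n : visitor.values) {
            System.out.println(n.getClass().getSimpleName() + " " + n);
        }
    }
}
```

A consumer that only needs the numeric value implements one method instead of four; one that needs the exact type has to inspect the `Number` it receives.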


Migrated from LUCENE-5870 by Adrien Grand (@jpountz), updated May 09 2016. Attachments: LUCENE-5870.patch

asfimport commented 10 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Here is a patch.

asfimport commented 10 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

To give more context, a consequence of this change is that stored fields could store both ints and longs using a zlong without having to record whether it was an int or a long.
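As a rough illustration, and assuming "zlong" here means a zig-zag encoded long, the sketch below shows why an int and a long holding the same value become indistinguishable once encoded. The class and method names are hypothetical, and the variable-length byte writing that would normally follow the zig-zag step is omitted.

```java
public class ZigZagSketch {
    // Zig-zag encoding maps small-magnitude signed values to small non-negative values.
    static long zigZagEncode(long value) {
        return (value << 1) ^ (value >> 63);
    }

    // Inverse of zigZagEncode.
    static long zigZagDecode(long encoded) {
        return (encoded >>> 1) ^ -(encoded & 1);
    }

    public static void main(String[] args) {
        int asInt = -3;
        long asLong = -3L;

        // The int is widened to long before encoding, so both produce the same bits.
        long encodedInt = zigZagEncode(asInt);
        long encodedLong = zigZagEncode(asLong);
        System.out.println(encodedInt == encodedLong);

        // Decoding always yields a long; nothing records that the first value was an int.
        Number decoded = zigZagDecode(encodedInt);
        System.out.println(decoded.getClass().getSimpleName() + " " + decoded);
    }
}
```

With a single `Number`-based callback the reader can hand back the decoded long directly, which is what makes dropping the int/long marker possible.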

asfimport commented 10 years ago

Ryan Ernst (@rjernst) (migrated from JIRA)

+1

asfimport commented 10 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

+1

I would use the methods of Double/Float/... that directly return an instance, like Double.valueOf(), instead of autoboxing the result of Double.parseDouble() and so on.
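A small sketch of the difference, assuming the value starts out as a String (the thread does not say where the values come from); the class and method names are hypothetical:

```java
public class BoxingSketch {
    // Returns a Double instance directly.
    static Number viaValueOf(String raw) {
        return Double.valueOf(raw);
    }

    // Produces a primitive double that the compiler then autoboxes to Double.
    static Number viaParse(String raw) {
        return Double.parseDouble(raw);
    }

    public static void main(String[] args) {
        Number a = viaValueOf("3.5");
        Number b = viaParse("3.5");
        // Both are Double instances holding the same value.
        System.out.println(a.getClass() == b.getClass());
        System.out.println(a.equals(b));
    }
}
```

The two forms give equal results; the first simply states the intent to obtain an object without relying on implicit boxing.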

In 4.x we may still need some special case, because we have a backwards layer for early 3.x indexes there (like 3.2 or so).

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I am a little concerned about this, since it results in loss of information.

It's similar to when we removed the TOKENIZED bit from stored fields but still kept StringField. That caused a lot of confusion for users.

Today, the Analyzer doesn't have the "full picture" because StringField/IntField/FloatField and company "bypass it". This causes a lot of pain: for example, you cannot even do a simple numeric range query with Lucene without subclassing things with your "own additional schema".

In my opinion this stuff makes Lucene too hard to use, because it's too hard to reconstruct the doc from stored fields to, e.g., perform an update to it and pass it back to IndexWriter. Instead it tries to force people either to write and maintain a separate schema and subclass many things, or to use some server that does this, which should not be necessary.

An alternative would be to remove StringField/IntField/LongField etc. and have these instead be just KeywordAnalyzer/IntAnalyzer or whatever in the analysis chain. Then queryparser could form range queries without subclassing, queries on string fields would just work, and the "schema" needed to search would be implicit and all in one place (the user's Analyzer), making Lucene a lot easier to use.