Open jpountz opened 1 year ago
I want the additional type-safety of things like the Field classes that users use, but I think that at a high level the Document/Field API is a good and intuitive model for end users.
So I'm not sure about the value of introducing different APIs or more separation. You can already avoid using Document if you don't want to use it.
I don't agree with all the methods on Document today that pretend to offer map-like access but are really linear-time searches through a list, though; I would support deprecating all of those.
As for what gets constructed by the "default" / "easy" stored fields visitor, I do think that one should be a Map and not conflated with the Document used for indexing.
Honestly, I think a big challenge is naming. Last time we tried this, the name was StoredDocument, but I think that name is confusing. Too bad we created a class named StoredFields in #11998, as I almost think that would be perfect for what you should "get back from the index".
> I'm not sure about the value of introducing different apis or more separation.
The issue in my mind is that IndexableField has something called storedValue for indexing, which is never populated when retrieving stored fields, and it also has a numericValue() for retrieval that can be any number type but that IndexingChain always treats as a long.
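To make the numericValue() concern concrete, here is a minimal sketch in plain Java. It assumes the consumer reads every numeric value through Number.longValue(); the class and method names (NumericAsLong, consume) are hypothetical and are not Lucene's actual IndexingChain code.

```java
public class NumericAsLong {
  // Assumption: the consumer reads every numeric value via Number.longValue(),
  // regardless of the concrete Number subtype it was given.
  static long consume(Number value) {
    return value.longValue();
  }

  public static void main(String[] args) {
    // A double handed over as-is silently loses its fractional part.
    Number price = 3.75d;
    System.out.println(consume(price)); // prints 3

    // A field that wants to keep the double has to pre-encode it as a long
    // and decode it again on the way out.
    long bits = Double.doubleToLongBits(3.75d);
    System.out.println(Double.longBitsToDouble(consume(bits))); // prints 3.75
  }
}
```

Under that assumption, the same accessor means "the user's number" on one side and "an already-encoded long" on the other, which is the kind of guessing the split is meant to remove.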
> I don't agree with all the methods on Document today that pretend to offer map-like access but are really linear time searches through a list though, I would support deprecating all those.
+1
Yeah, and just to clarify at a high level:
Indexing time: Field.Store.YES like people are accustomed to).
Retrieval time:
I feel that this discussion is close to my comment: https://github.com/apache/lucene/issues/10374#issuecomment-1666612060
Description
As @rmuir managed to make me look into reducing the amount of guessing we're doing in our document API, I think that a requirement for doing it right will be to split our index and store document APIs. Currently, Document and IndexableField are trying to cover both, which creates a confusing API.
For the store API, I wonder if we still need a Document abstraction. Maybe we could return a simple List<StoredValue> and push the work on users to convert it into a map if they want to be able to access fields by name. This is something that is easy to do with Java streams nowadays.
For the index API, I'd like to model IndexableField according to how it's getting consumed by IndexingChain. Something like:
I haven't thought much about it but I'm curious to hear thoughts.
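For the store side, the list-to-map conversion mentioned above could look roughly like the sketch below. NamedValue is a hypothetical stand-in for whatever a store API would return per field; it is not Lucene's StoredValue class, and the sketch assumes each returned value carries its field name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StoredValuesToMap {
  // Hypothetical per-field value; assumes the field name travels with the value.
  record NamedValue(String name, Object value) {}

  // Group values by field name, keeping first-seen field order and
  // allowing multi-valued fields.
  static Map<String, List<Object>> byName(List<NamedValue> stored) {
    return stored.stream()
        .collect(Collectors.groupingBy(
            NamedValue::name,
            LinkedHashMap::new,
            Collectors.mapping(NamedValue::value, Collectors.toList())));
  }

  public static void main(String[] args) {
    List<NamedValue> stored = List.of(
        new NamedValue("title", "Lucene in Action"),
        new NamedValue("tag", "search"),
        new NamedValue("tag", "java"),
        new NamedValue("year", 2010));

    Map<String, List<Object>> fields = byName(stored);
    System.out.println(fields.get("tag")); // prints [search, java]
  }
}
```

Grouping into lists rather than single values is a deliberate choice here, since a document can store several values under one field name; callers that know a field is single-valued can take the first element.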