fergiemcdowall / search-index

A persistent, network resilient, full text search library for the browser and Node.js
MIT License

Optimizing the document format for internal field management. #576

Closed: jakobsa closed this issue 2 years ago

jakobsa commented 2 years ago

Thank you for your work and this useful library.

After reading through the docs, FAQs and API, I have an open question regarding the documents that are indexed.

When looking at the runtime behavior of the lib as a black box, I asked myself: how do query results differ between arrays and objects when they are indexed?

Reading into the code my conclusions are:

As of now, my documents follow a JSON Schema that defines, at several levels, objects with arbitrary property names. These arbitrary names are unique across hundreds of documents and are relevant search terms themselves. That leads me to assume that:

Without converting my documents into another format, the resulting search index would not be very useful: thousands of fields would be created, each with only one result candidate, and the field names would not be processed as search terms.
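
To make the concern concrete, here is a minimal sketch in plain JavaScript. The document shape, the `parts` property and the `fieldPaths` helper are all hypothetical, and the sketch assumes nested objects end up as path-like field names, which this thread implies but does not spell out. It only counts the distinct fields that two such documents would produce.

```javascript
// Hypothetical documents: the property names under `parts` are arbitrary
// and unique to each document.
const docs = [
  { _id: 1, parts: { gearbox_a17: { material: 'steel' } } },
  { _id: 2, parts: { rotor_b42: { material: 'steel' } } }
]

// Collect dotted paths to every leaf value (assumed field naming, see above).
function fieldPaths (value, prefix = '') {
  if (value === null || typeof value !== 'object') return [prefix]
  return Object.entries(value).flatMap(([key, child]) =>
    fieldPaths(child, prefix ? `${prefix}.${key}` : key)
  )
}

// Distinct field names across all documents.
const fields = new Set(docs.flatMap(doc => fieldPaths(doc)))
console.log([...fields])
```

Under that assumption, each arbitrary name yields its own field holding a single document, while the name itself never appears as a value that could be searched.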

It would be great if you could correct me if any of those assumptions are false and I have overlooked or misunderstood something about the API.

fergiemcdowall commented 2 years ago

Yes, correct on all 3 assumptions 🙂👍

To make field names searchable, you would have to write a custom tokenisation stage, or manipulate the objects before indexing.
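
The second option, manipulating the objects before indexing, could look roughly like the sketch below. It is plain JavaScript and does not call the library at all. The `keysToEntries` helper, the `name` field and the sample document are hypothetical names chosen for illustration, not part of search-index.

```javascript
// Hypothetical helper: turn an object keyed by arbitrary names into an array
// of entries, so each former key becomes a value under the fixed field `name`.
function keysToEntries (obj) {
  return Object.entries(obj).map(([name, value]) => {
    const isPlainObject =
      value !== null && typeof value === 'object' && !Array.isArray(value)
    // `name` is written last, so it wins if the value has its own `name` property.
    return isPlainObject ? { ...value, name } : { value, name }
  })
}

// Hypothetical source document with arbitrary property names under `parts`.
const original = {
  _id: 1,
  parts: {
    gearbox_a17: { material: 'steel' },
    rotor_b42: { material: 'aluminium' }
  }
}

// Same document, with `parts` converted to an array of uniform entries.
const indexable = { ...original, parts: keysToEntries(original.parts) }
console.log(JSON.stringify(indexable, null, 2))
```

After this transformation every document shares the same small set of fields, and the former property names are ordinary values. How a name such as `gearbox_a17` is then split into terms still depends on the tokenisation settings in use.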