alexandria-org / alexandria

Full text search engine powering Alexandria.org - the open search engine.
https://alexandria.org

Update Ranking and Index Documentation #19

Open xeoncross opened 2 years ago

xeoncross commented 2 years ago

The Index File Format and Search Result Ranking documents do not explain these topics very clearly.

Could these two critical documents be expanded? For example, the domain IDF-TF score calculation is shown, but it's not clear whether that score is part of the index file format or stored elsewhere. It appears that each data record might be grouped by domain, with the IDF-TF score stored in the record header, but that doesn't make sense because the score is supposed to be per term.
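One way a per-domain record can still carry a per-term score is to key the score by both term and domain. The sketch below assumes textbook TF-IDF (tf = term count / document length, idf = log(N / df)) and a hypothetical aggregation that sums each document's score into its domain's bucket. `tf_idf`, `domain_scores`, and the input shape are illustrative names, not Alexandria's code, and its real formula may differ.

```python
import math
from collections import Counter, defaultdict

def tf_idf(term, doc_tokens, corpus):
    """Textbook TF-IDF of `term` in one document (an assumption, not Alexandria's formula).

    tf  = occurrences of term / number of tokens in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

def domain_scores(term, docs):
    """Hypothetical per-domain aggregate for a single term.

    `docs` maps URL -> (domain, tokens). The result is still a score for
    one term: each document's TF-IDF for `term` is summed into its domain.
    """
    corpus = [tokens for _, tokens in docs.values()]
    scores = defaultdict(float)
    for domain, tokens in docs.values():
        scores[domain] += tf_idf(term, tokens, corpus)
    return dict(scores)

docs = {
    "a.com/1": ("a.com", ["search", "engine", "index"]),
    "a.com/2": ("a.com", ["search", "ranking"]),
    "b.org/1": ("b.org", ["open", "source"]),
}
print(domain_scores("search", docs))
```

Under this reading, a record grouped by domain would need one such score per term it contains, which is exactly the ambiguity the documentation leaves open.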

Likewise, it's not clear whether the keys (8 * n bytes) represent the document terms indexed at each position in the data record. Since the index header contains int64 pointers to locations inside the data record, I assume there must be many index files: they appear to be immutable, which requires knowing the contents of every data record when the file is constructed.
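The immutability argument can be illustrated with a hypothetical layout built from the two pieces the issue mentions: n keys of 8 bytes each and n int64 offsets into the data section. The exact arrangement (a leading u64 count, sorted keys, little-endian, record length inferred from the next offset) is an assumption for illustration, not Alexandria's documented format; `build_index` and `lookup` are made-up names.

```python
import bisect
import struct

# Hypothetical layout (assumption, not Alexandria's documented format):
#   [u64 n][n x u64 keys, sorted][n x i64 offsets][data section]
# offsets[i] is where key i's record starts, relative to the data section.

def build_index(records):
    """Serialize {key: bytes} into one immutable blob.

    Every record must be known up front: each offset depends on the sizes
    of all records written before it, so adding one later means rewriting.
    """
    keys = sorted(records)
    offsets, data = [], bytearray()
    for key in keys:
        offsets.append(len(data))
        data += records[key]
    n = len(keys)
    header = struct.pack(f"<Q{n}Q{n}q", n, *keys, *offsets)
    return header + bytes(data)

def lookup(blob, key):
    """Binary-search the header for `key` and return its record, or None."""
    (n,) = struct.unpack_from("<Q", blob, 0)
    keys = struct.unpack_from(f"<{n}Q", blob, 8)
    offsets = struct.unpack_from(f"<{n}q", blob, 8 + 8 * n)
    data_start = 8 + 16 * n
    i = bisect.bisect_left(keys, key)
    if i == n or keys[i] != key:
        return None
    start = data_start + offsets[i]
    # A record ends where the next one starts, or at the end of the blob.
    end = data_start + offsets[i + 1] if i + 1 < n else len(blob)
    return blob[start:end]

blob = build_index({42: b"doc-a", 7: b"xyz", 1000: b""})
print(lookup(blob, 42))
```

If the real format works like this, each file would be written once from a complete batch of records, which is consistent with the guess that there are many index files.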

joscul commented 2 years ago

Yes, the documentation for the index file format and result ranking is not accurate. I will go through them.