The Index File Format and Search Result Ranking documents are not defined very clearly.
Could these two critical documents be expanded? For example, the domain IDF-TF score calculation is shown, but it's not clear whether that score is part of the index file format or stored elsewhere. It appears that data records might be grouped by domain, with the IDF-TF score stored in each group's header, but that doesn't make sense because it's supposed to be a score for each term.
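To illustrate why a single per-domain score confuses me: under the textbook definition, TF-IDF produces one value per (term, document) pair, not one value per document or domain. Below is a minimal sketch of that textbook definition only. I am not claiming it matches the formula in the ranking document, and the `tf_idf` function name and the toy corpus are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document (textbook TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            # Term frequency times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical toy corpus: two tokenized documents.
docs = [["index", "file", "format"], ["index", "ranking"]]
scores = tf_idf(docs)
```

Each document ends up with a separate score for every term it contains, which is why I would expect the score to live next to the term entries rather than in a per-domain header.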
Likewise, it's not clear whether the `8 * n bytes` keys represent the document terms that are indexed at each position in the data record. Since the index header contains these 64-bit integer pointers to locations inside the data record, I assume there must be many index files: they look immutable, because constructing one requires knowing the contents of all its data records in advance.
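To make my reading concrete, here is a minimal sketch of the layout I am assuming. This is my guess, not the documented format: the field order (count, then keys, then offsets), the little-endian encoding, the length-prefixed records, and the names `build_index` and `lookup` are all hypothetical.

```python
import bisect
import struct

def build_index(records):
    """Build an index blob from {int key: bytes payload}.

    Assumed layout: [n: int64][n keys: int64 each][n offsets: int64 each][records].
    """
    keys = sorted(records)
    n = len(keys)
    header_size = 8 + 8 * n + 8 * n  # count + keys + offsets
    offsets, body, pos = [], b"", header_size
    for key in keys:
        offsets.append(pos)
        payload = records[key]
        # Each record is stored as an int64 length followed by the payload.
        chunk = struct.pack("<q", len(payload)) + payload
        body += chunk
        pos += len(chunk)
    header = struct.pack(f"<q{n}q{n}q", n, *keys, *offsets)
    return header + body

def lookup(blob, key):
    """Binary-search the sorted keys, then follow the offset into the data."""
    (n,) = struct.unpack_from("<q", blob, 0)
    keys = struct.unpack_from(f"<{n}q", blob, 8)
    offsets = struct.unpack_from(f"<{n}q", blob, 8 + 8 * n)
    i = bisect.bisect_left(keys, key)
    if i == n or keys[i] != key:
        return None
    (length,) = struct.unpack_from("<q", blob, offsets[i])
    start = offsets[i] + 8
    return blob[start:start + length]

# Hypothetical keys (e.g. term hashes) mapped to opaque record payloads.
blob = build_index({42: b"hello", 7: b"world"})
```

In this sketch every offset depends on the size of all preceding records, so the header cannot be written until every record is known. That is what leads me to think the real files are immutable and that there must be several of them.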