Closed Stefan4472 closed 2 years ago
Ideal behavior: upon index_file() or index_string(), the user provides an overwrite flag, which overwrites the existing document if it is already in the database. This, however, requires a mechanism by which we can remove or modify the Inverted Indexes. That's an issue for a later day.
For now, the workaround will probably be for the caller to simply not re-index a file that is already in the index.
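A rough sketch of what that caller-side check could look like. TinyIndex and index_if_absent are hypothetical names made up for illustration, and the membership test is an assumption; the real SearchEngine may expose this differently.

```python
class TinyIndex:
    """Hypothetical stand-in for the real SearchEngine (not its actual API)."""

    def __init__(self):
        self.docs = {}  # slug -> raw text

    def __contains__(self, slug):
        return slug in self.docs

    def index_string(self, text, slug):
        self.docs[slug] = text


def index_if_absent(engine, text, slug):
    """Index `text` under `slug` unless that slug is already indexed.

    Returns True if the document was indexed, False if it was skipped.
    """
    if slug in engine:
        return False
    engine.index_string(text, slug)
    return True
```

With this, a second call for the same slug is skipped and the original document stays untouched, which avoids duplicates but does not pick up changed content.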
Also: see Python's bisect module, which we can use to better implement searching for doc_id in an InvertedList: https://www.tutorialspoint.com/python-inserting-item-in-sorted-list-maintaining-order
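To make the bisect idea concrete, here is a minimal sketch. It assumes an InvertedList keeps its doc_ids in a sorted Python list; the class below is a simplified stand-in with hypothetical find/add methods, not the project's actual InvertedList.

```python
import bisect


class InvertedList:
    """Simplified, hypothetical posting list that keeps doc_ids sorted."""

    def __init__(self):
        self.doc_ids = []

    def find(self, doc_id):
        """Return the position of doc_id via binary search, or -1 if absent."""
        i = bisect.bisect_left(self.doc_ids, doc_id)
        if i < len(self.doc_ids) and self.doc_ids[i] == doc_id:
            return i
        return -1

    def add(self, doc_id):
        """Insert doc_id at its sorted position; return False if already present."""
        i = bisect.bisect_left(self.doc_ids, doc_id)
        if i < len(self.doc_ids) and self.doc_ids[i] == doc_id:
            return False
        self.doc_ids.insert(i, doc_id)
        return True
```

bisect_left finds the position in O(log n) comparisons; the list insert itself is still O(n) because later elements have to shift.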
The probable next step will be to support removing a document from the index. On indexing a duplicate, if overwrite=True, the SearchEngine can delete the file from the index, then re-add it.
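A toy sketch of the delete-then-re-add flow, under some assumptions: MiniEngine is a hypothetical engine invented for illustration, the flag is spelled overwrite as in the first comment, and removal is made cheap by keeping a forward map from each slug to its terms. The real SearchEngine will need its own removal mechanism.

```python
from collections import defaultdict


class MiniEngine:
    """Hypothetical toy engine; not the real SearchEngine."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> set of slugs
        self.doc_terms = {}            # slug -> set of terms (used for removal)

    def remove(self, slug):
        """Drop every posting that belongs to `slug`."""
        for term in self.doc_terms.pop(slug, set()):
            self.index[term].discard(slug)
            if not self.index[term]:
                del self.index[term]

    def index_string(self, text, slug, overwrite=False):
        """Index `text` under `slug`; return False if skipped as a duplicate."""
        if slug in self.doc_terms:
            if not overwrite:
                return False
            self.remove(slug)  # delete the old version before re-adding
        terms = set(text.lower().split())
        self.doc_terms[slug] = terms
        for term in terms:
            self.index[term].add(slug)
        return True

    def search(self, term):
        """Return the sorted slugs whose text contains `term`."""
        return sorted(self.index.get(term.lower(), set()))
```

Re-indexing a slug with overwrite=True then replaces the old postings instead of adding a second copy, so a search returns the slug at most once.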
Opened #21 to address this
And created #22 with the bisect idea
Something I noticed when doing migrations on Stefans-Blog: when I re-index a file under the same slug, it appears that I get duplicated search results. Adding the same slug a second time should overwrite the existing document.