alan-turing-institute / reginald

Reginald repository for REG Hack Week 23

nb for updating index #61

Closed lannelin closed 1 year ago

lannelin commented 1 year ago

This notebook assumes that we want to update an existing index but are unable to isolate the new documents up front.

To do this we:

Load the existing index
Load the new data file (which includes data we've already indexed as well as some new docs :( )
Work out overlap and isolate new documents
Add new documents to index
Save new index
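
The steps above can be sketched roughly as follows. This is only an illustration of the overlap logic, not the notebook's actual code: the index is stood in for by a plain JSON file mapping a content hash to document text, and `doc_key`, `load_index`, `isolate_new_documents` and `update_index` are hypothetical names introduced here. It also assumes that two documents count as "the same" when their whitespace-normalised text matches.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def doc_key(text: str) -> str:
    """Stable key for a document: SHA-256 of its whitespace-normalised text."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def load_index(path: Path) -> dict[str, str]:
    """Load the stand-in index (key -> text); an absent file means an empty index."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def isolate_new_documents(index: dict[str, str], documents: list[str]) -> list[str]:
    """Return documents whose key is not in the index, dropping repeats in the input."""
    seen = set(index)
    new_docs = []
    for text in documents:
        key = doc_key(text)
        if key not in seen:
            seen.add(key)
            new_docs.append(text)
    return new_docs


def update_index(index_path: Path, documents: list[str]) -> list[str]:
    """Load the index, add only the unseen documents, save it, and return what was added."""
    index = load_index(index_path)
    new_docs = isolate_new_documents(index, documents)
    for text in new_docs:
        index[doc_key(text)] = text
    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return new_docs
```

Keying on a content hash means an edited page shows up as a new document while its old version stays in the index, which is why identifying outdated pages is a separate problem.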

TODO (not covered by this PR)

Extend to identify data to delete (outdated pages)
Auto-identify CHUNK_SIZE_LIMIT if possible