This assumes we want to update an existing index but cannot isolate the new documents up front.
To do this we:
Load the existing index
Load the new data file (which includes data we've already indexed as well as some new docs :( )
Work out overlap and isolate new documents
Add new documents to index
Save new index
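The five steps above can be sketched roughly as follows. This is only an illustration, not the code in this PR: it assumes the index is a JSON object mapping a content hash to document text and that the data file is a JSON list of strings. The names `doc_key`, `load_index`, `isolate_new_documents` and `update_index` are hypothetical, and overlap is detected by hashing whitespace-normalised text, which may differ from how the real index identifies documents.

```python
import hashlib
import json
from pathlib import Path


def doc_key(text: str) -> str:
    """Hypothetical document identifier: SHA-256 of whitespace-normalised text."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def load_index(path: Path) -> dict:
    """Load the existing index (assumed format: JSON object of key -> text)."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def isolate_new_documents(index: dict, docs: list) -> dict:
    """Return only the documents whose key is not already in the index."""
    new_docs = {}
    for text in docs:
        key = doc_key(text)
        # Skip documents already indexed and duplicates within the data file.
        if key not in index and key not in new_docs:
            new_docs[key] = text
    return new_docs


def update_index(index_path: Path, data_path: Path) -> int:
    """Run the five steps and return the number of documents added."""
    index = load_index(index_path)                              # load the existing index
    docs = json.loads(data_path.read_text(encoding="utf-8"))    # load the new data file
    new_docs = isolate_new_documents(index, docs)               # work out overlap
    index.update(new_docs)                                      # add new documents
    index_path.write_text(json.dumps(index), encoding="utf-8")  # save new index
    return len(new_docs)
```

Running `update_index` a second time on the same data file adds nothing, since every document is then already present in the index.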
TODO (not covered by this PR):
Extend to identify data to delete (outdated pages)
Automatically determine CHUNK_SIZE_LIMIT if possible