Swirrl / ook

Structural search engine
https://search-prototype.gss-data.org.uk/
Eclipse Public License 1.0

Observation-graph loading should be transactional #120

Open Robsteranium opened 2 years ago

Robsteranium commented 2 years ago

#115 introduces a graph-index (for #17) which is dropped and re-inserted before the observation-pipeline runs. If the ETL process is interrupted while loading observations, it leaves the graph and observation indices in an inconsistent state. Re-running the ETL process then does nothing, because it finds the graph-index to be up to date (even though the observation index is out of date and could include partially-loaded graphs).

To recover from this sort of interruption we need to manually delete the observation and graph indices before restarting, e.g.

curl -X DELETE http://localhost:9200/observation
curl -X DELETE http://localhost:9200/graph
sudo systemctl start etl

It'd be nice if loading were a bit more transactional, or at least didn't leave the indices in an inconsistent state after an interruption. For example, we could update the graph index one doc at a time, and only after all the observations for that graph are loaded. That way, even if the observation-pipeline were interrupted mid-graph, it'd redo that graph on the next run (and retain any completed ones).
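To make the suggested ordering concrete, here's a rough sketch in Python. It isn't ook's actual pipeline code: the two indices are plain dicts standing in for Elasticsearch, and `load_graphs`, `Interrupted`, `fail_on` and the `modified` stamp are all hypothetical names made up for illustration. The only point is the order of writes: a graph's doc goes into the graph index after all of its observations are loaded, and any leftovers from a partial load are cleared before that graph is retried.

```python
class Interrupted(Exception):
    """Stands in for the ETL process being killed mid-load (hypothetical)."""


def load_graphs(source, observation_index, graph_index, fail_on=None):
    """Load each graph's observations, then record that graph as done.

    source:            {graph_uri: (modified, [observation, ...])}
    observation_index: {graph_uri: [observation, ...]}  stand-in for the observation index
    graph_index:       {graph_uri: modified}            stand-in for the graph index
    fail_on:           graph_uri at which to simulate an interruption
    """
    for graph_uri, (modified, observations) in source.items():
        if graph_index.get(graph_uri) == modified:
            continue  # fully loaded on an earlier run, keep it

        # Clear anything left behind by a partial load of this graph.
        observation_index[graph_uri] = []
        for i, obs in enumerate(observations):
            if graph_uri == fail_on and i == 1:
                raise Interrupted(graph_uri)  # simulate dying mid-graph
            observation_index[graph_uri].append(obs)

        # Only now mark the graph as up to date.
        graph_index[graph_uri] = modified


source = {
    "graph/a": ("2021-01-01", ["a1", "a2"]),
    "graph/b": ("2021-01-02", ["b1", "b2"]),
}
observations, graphs = {}, {}

# First run is interrupted part-way through graph/b.
try:
    load_graphs(source, observations, graphs, fail_on="graph/b")
except Interrupted:
    pass
graphs_after_interrupt = dict(graphs)
partial_b = list(observations["graph/b"])

# Second run skips graph/a and redoes graph/b from scratch.
load_graphs(source, observations, graphs)
```

After the interrupted run the graph index only lists `graph/a`, so the re-run knows `graph/b` still needs doing, and no manual index deletion is required. Against a real cluster the "clear leftovers" step would need some way to delete one graph's observations, which this sketch glosses over.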