Open wojciechwojcik opened 7 years ago
I think there are a few options to explore for reindexing:
Or we could even provide a couple of the above options by adding new enum values to SchemaAction (keeping the current one for backwards compatibility), e.g.:
SchemaAction.REINDEX, SchemaAction.REINDEX_DROP, SchemaAction.REINDEX_SYNC
etc.
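To make the difference between such modes concrete, here is a minimal, self-contained sketch. It is not JanusGraph code: `ReindexMode` is a hypothetical stand-in for the proposed `SchemaAction` values, plain sets of vertex ids stand in for the storage and index backends, and the semantics given to `REINDEX_DROP` (clear, then rebuild) and `REINDEX_SYNC` (remove stale entries, then add missing ones) are only one possible reading of the names.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the proposed SchemaAction values; not the JanusGraph enum.
enum ReindexMode {
    // Assumed to model the current behaviour: re-add everything in storage, remove nothing.
    REINDEX {
        void apply(Set<Long> storage, Set<Long> index) {
            index.addAll(storage);
        }
    },
    // Assumed semantics: clear the index first, then rebuild it from storage.
    REINDEX_DROP {
        void apply(Set<Long> storage, Set<Long> index) {
            index.clear();
            index.addAll(storage);
        }
    },
    // Assumed semantics: remove entries missing from storage, then add missing ones.
    REINDEX_SYNC {
        void apply(Set<Long> storage, Set<Long> index) {
            index.retainAll(storage);
            index.addAll(storage);
        }
    };

    abstract void apply(Set<Long> storage, Set<Long> index);
}

class ReindexModeDemo {
    // Starts from an index holding a stale id (2) and applies the given mode.
    static Set<Long> run(ReindexMode mode) {
        Set<Long> storage = new TreeSet<>(Arrays.asList(1L, 3L));
        Set<Long> index = new TreeSet<>(Arrays.asList(1L, 2L, 3L));
        mode.apply(storage, index);
        return index;
    }

    public static void main(String[] args) {
        for (ReindexMode mode : ReindexMode.values()) {
            System.out.println(mode + " -> " + run(mode));
        }
    }
}
```

In this toy model both new modes end with the same index contents; the difference is that the sync variant never empties the index, so it could stay queryable while the reindex runs.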
According to: http://docs.janusgraph.org/0.1.0/indexes.html
When you do:
You'd expect the index data to fully represent what is currently stored in the graph database. However, the database scan works only one way: all existing vertices from the storage backend are re-added to the index backend.
Sometimes, when a vertex deletion does not get propagated to the index (e.g. see #329), the index can contain vertices that are no longer in the graph.
This causes the following issues: 1) Queries that use the index still return such deleted vertices, with their ids, without performing any checks or logging any errors. 2) The reindexing action does not fix this.
The only workaround is to drop/clear the index manually before re-indexing. This is often time-consuming and leaves the index non-operational until re-indexing has completely finished.
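The one-way scan can be illustrated with a toy model. This is only a sketch, not JanusGraph code: sets of vertex ids stand in for the storage backend and the index backend, and `StaleIndexDemo` and `reindexOneWay` are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: sets of vertex ids stand in for the storage and index backends.
class StaleIndexDemo {
    // Models the one-way scan: every vertex in storage is re-added to the index,
    // and nothing is ever removed from the index.
    static void reindexOneWay(Set<Long> storage, Set<Long> index) {
        index.addAll(storage);
    }

    static Set<Long> demo() {
        Set<Long> storage = new TreeSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> index = new TreeSet<>(storage);

        // Vertex 2 is deleted from storage, but the deletion never reaches the index.
        storage.remove(2L);

        reindexOneWay(storage, index);
        return index;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 2, 3]: the stale id 2 survives the reindex
    }
}
```

Because the scan only adds entries, the stale id stays in the index no matter how often the reindex is repeated, which is why clearing the index first is currently needed.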
Reindexing action could be improved to:
Quick and dirty way to reproduce: create a mixed index, fill in some data, drop the Cassandra storage backend (or some part of it) to simulate a failure, restart JanusGraph, and run some queries using the mixed index.