Closed robertdale closed 1 year ago
+1 on this.
I have a consistent way of defining the unique identifier property of any vertex or edge by simply adding '_id' to its label.
About a week or so ago, my caffeine supply was insufficient and I ended up accidentally creating a property 'emails_id' with data type integer instead of string.
I realised I had two options:
1.) Forget about that property key and any associated indexes and create a new property, e.g. 'email_id', thus breaking my pattern.
2.) Drop the entire graph, schema, et al., create a new correct schema and repopulate.
I went with option 2, which was a non-trivial undertaking for a few million elements on commodity hardware.
In short, I think this is a great suggestion by @robertdale, and I'd even go so far as to suggest it be extended beyond indexes to defined but unused properties.
Hi, @robertdale. Do I need to reindex a global index after I import data from one Cassandra cluster to another Cassandra cluster? I currently get an error if I don't reindex.
This issue is rather old, but considering that JanusGraph's index management capabilities fall short under quite a few other circumstances (see #3046 and #2500 for example), I would like to use this one as an entry point to revise the index lifecycle. The main shortcomings I noticed are:
- Once disabled, an index cannot be re-enabled.
- An index cannot be entirely removed, as at least its schema vertex always remains.
- Due to the two points above, index names are used up once assigned and cannot be reused.
In the current index lifecycle, I identified two key points which cause the issues above:
- The `DISABLED` state resembles more of a `PAUSED` state, because all data is still present. From this state, `ENABLE_INDEX` and `REINDEX` should work just the same as in the `REGISTERED` state.
- The `REMOVE_INDEX` action resembles more of a `CLEAR_INDEX` action. It removes all indexed data but leaves the index technically intact. Therefore, it would make more sense to rename this action and have it lead to the state `REGISTERED`.
If we fix these two remarks, the index lifecycle could look like this, with added or updated schema actions marked in blue.
I renamed `REMOVE_INDEX` to `CLEAR_INDEX` here and introduced a new `REMOVE_INDEX` action which actually deletes the index's schema vertex. This action is only available in the `REGISTERED` state, i.e. for an index which is inactive and holds no indexed elements. I am aware that JanusGraph is currently not capable of clearing mixed indexes.
This proposal introduces breaking changes, but considering the impact, I think it would be worth implementing.
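To make the proposed lifecycle easier to discuss, here is a minimal sketch of it as a transition table. It is a toy model, not JanusGraph code: the `REMOVED` marker, the `apply_action` helper, and the exact set of transitions are assumptions based on my reading of the description above (in particular, only the transitions mentioned in this thread are modelled, and `REINDEX` is assumed to end in `ENABLED`).

```python
from enum import Enum


class State(Enum):
    INSTALLED = "INSTALLED"
    REGISTERED = "REGISTERED"
    ENABLED = "ENABLED"
    DISABLED = "DISABLED"
    REMOVED = "REMOVED"  # hypothetical marker: schema vertex deleted, name free again


# (current state, schema action) -> resulting state.
# Assumption: only transitions discussed in this thread are listed.
TRANSITIONS = {
    (State.INSTALLED, "REGISTER_INDEX"): State.REGISTERED,
    (State.REGISTERED, "ENABLE_INDEX"): State.ENABLED,
    (State.REGISTERED, "REINDEX"): State.ENABLED,
    (State.ENABLED, "DISABLE_INDEX"): State.DISABLED,
    # Proposed: DISABLED behaves like PAUSED, so it can be re-enabled.
    (State.DISABLED, "ENABLE_INDEX"): State.ENABLED,
    (State.DISABLED, "REINDEX"): State.ENABLED,
    # Proposed: the old REMOVE_INDEX becomes CLEAR_INDEX and leads to REGISTERED.
    (State.DISABLED, "CLEAR_INDEX"): State.REGISTERED,
    # Proposed: the new REMOVE_INDEX deletes the schema vertex, REGISTERED only.
    (State.REGISTERED, "REMOVE_INDEX"): State.REMOVED,
}


def apply_action(state, action):
    """Return the state reached by applying a schema action, or raise ValueError."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action} is not allowed in state {state.name}") from None
```

With this table, a disabled index can either go back to `ENABLED` directly or be cleared and then removed, at which point its name would be available again.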
I like the proposed changes. Also, I don't fully understand the reason for not being able to remove a mixed index, because it is just a small API call. For example, in Elasticsearch a simple API call will clear and remove the index: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html
So, it's not fully clear why we can't simply remove the data in ES via an API call and then remove the schema vertex of the associated index. Maybe I'm missing something, but it feels like from the DISABLED state we could remove a mixed index completely via a single API call.
I took a quick look at it yesterday and found out there even is a `clearStorage` function in our index backend interface. It looks to me like every index implements it, except Solr which only supports `clearStorage` for Solr Cloud.
I haven't looked at the implementation yet, but I guess we could implement mixed index removal using their APIs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html
https://solr.apache.org/guide/8_11/collection-management.html#delete
Lucene probably has a removal capability as well (I haven't looked yet). Nevertheless, if some index backend doesn't support the removal operation, we can always throw an exception for that backend.
I would prefer having the ability to remove mixed indexes completely (i.e. the data from the index backend and the schema vertex for that index), so that the name could be reused later for new indexes.
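A rough sketch of the ordering I have in mind: drop the data in the index backend first, and only delete the schema entry once that succeeded, so a backend without removal support fails loudly and leaves the schema untouched. Everything here is hypothetical (`FakeIndexBackend`, `drop_index`, `remove_mixed_index` and the exception type are illustration names, not JanusGraph or backend APIs).

```python
class UnsupportedIndexOperation(Exception):
    """Raised when an index backend cannot drop an index."""


class FakeIndexBackend:
    """Hypothetical stand-in for a mixed index backend (not a JanusGraph class)."""

    def __init__(self, supports_removal=True):
        self.supports_removal = supports_removal
        self.stores = {}  # index name -> indexed documents

    def drop_index(self, name):
        if not self.supports_removal:
            raise UnsupportedIndexOperation(f"backend cannot remove index {name!r}")
        self.stores.pop(name, None)


def remove_mixed_index(schema_names, backend, name):
    """Drop backend data first, then the schema entry, so the name is reusable."""
    if name not in schema_names:
        raise KeyError(name)
    backend.drop_index(name)   # may raise; the schema entry is then left untouched
    schema_names.remove(name)  # reached only if the backend removal succeeded
```

Here `schema_names` is just a set standing in for the schema vertices. Once the name is gone from it, a new index with the same name could be created.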
Currently, disabling an index is only a precursor to removing it. However, a user may want to turn on/off indexes for various reasons - performance testing, maintenance, etc.
While an index could be removed and recreated, the docs imply that it cannot have the same name. The docs state:
"Index removal deletes everything associated with the index except its schema definition and its DISABLED state. This schema stub for the index remains even after deletion."
This is not a good alternative.
There should be a way to re-enable a disabled index and, alternatively, to remove every last remnant of the index so that it could be recreated with the same name.