JanusGraph / janusgraph

JanusGraph: an open-source, distributed graph database
https://janusgraph.org

Re-enable a disabled index #857

Closed: robertdale closed this issue 1 year ago

robertdale commented 6 years ago

Currently, disabling an index is only a precursor to removing it. However, a user may want to turn indexes on and off for various reasons: performance testing, maintenance, etc.

While an index could be removed and recreated, the docs imply that the new index cannot have the same name. The docs state: "Index removal deletes everything associated with the index except its schema definition and its DISABLED state. This schema stub for the index remains even after deletion." This is not a good alternative.

There should be a way to re-enable a disabled index and, alternatively, to remove every last remnant of the index so that it can be recreated with the same name.
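The naming problem can be made concrete with a minimal Python sketch of the behaviour quoted from the docs. `IndexRegistry` and its methods are hypothetical names, not JanusGraph's API. The sketch only shows that a removed index leaves a DISABLED stub behind, so creating an index with the same name fails.

```python
class IndexRegistry:
    """Hypothetical registry illustrating the 'schema stub' problem.

    Not JanusGraph's API: the class and method names are made up.
    """

    def __init__(self):
        self.status = {}   # index name -> "ENABLED" or "DISABLED"
        self.entries = {}  # index name -> set of indexed element ids

    def create(self, name):
        # A name stays taken for as long as any schema entry exists for it.
        if name in self.status:
            raise ValueError(f"index name {name!r} is already in use "
                             f"(status: {self.status[name]})")
        self.status[name] = "ENABLED"
        self.entries[name] = set()

    def disable(self, name):
        self.status[name] = "DISABLED"

    def remove(self, name):
        # As the quoted docs describe: indexed data is deleted, but the
        # schema definition and its DISABLED state remain as a stub.
        if self.status.get(name) != "DISABLED":
            raise ValueError("only a DISABLED index can be removed")
        self.entries.pop(name, None)


reg = IndexRegistry()
reg.create("byEmail")
reg.disable("byEmail")
reg.remove("byEmail")
try:
    reg.create("byEmail")
except ValueError as err:
    print(err)  # index name 'byEmail' is already in use (status: DISABLED)
```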

The-Don-Himself commented 6 years ago

+1 on this.

I have a consistent way of defining the unique identifier property of any vertex or edge: I simply append '_id' to its label.

About a week or so ago, my caffeine supply was insufficient and I ended up accidentally creating a property 'emails_id' with data type integer instead of string.

I realised I had two options:

1.) Forget about that property key and any associated indexes and create a new property, e.g. 'email_id', thus breaking my pattern.

2.) Drop the entire graph, schema and all, create a new, correct schema and repopulate.

I went with option 2, which was a non-trivial undertaking for a few million elements on commodity hardware.

In short, I think this is a great suggestion by @robertdale, and I'd even go so far as to suggest extending it beyond indexes, perhaps to properties that are defined but unused.

CJSoldier commented 6 years ago

Hi, @robertdale. Do I need to reindex a global index after I import data from one Cassandra cluster to another? I get an error if I don't reindex.

rngcntr commented 1 year ago

This issue is rather old, but considering that JanusGraph's index management capabilities fall short in quite a few other circumstances (see #3046 and #2500, for example), I would like to use this one as an entry point to revise the index lifecycle. The main shortcomings I noticed are:

  1. Once disabled, an index cannot be re-enabled.
  2. An index cannot be entirely removed, as at least its schema vertex always remains.
  3. Due to 1. and 2., index names are used up once assigned and cannot be reused.

[Figure: old lifecycle]
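The old lifecycle can be approximated by a small transition table. This is a simplified Python model, not JanusGraph code. It includes only the states and schema actions discussed in this thread, and the real applicability rules of JanusGraph's `SchemaAction` may allow more transitions. `apply` and `reachable` are hypothetical helpers. The model shows shortcoming 1 directly: no state reachable from DISABLED leads back to ENABLED.

```python
# Simplified model of the *current* index lifecycle as described in this
# thread. Hypothetical: only the transitions discussed here are included.
OLD_TRANSITIONS = {
    ("INSTALLED", "REGISTER_INDEX"): "REGISTERED",
    ("REGISTERED", "REINDEX"): "ENABLED",
    ("REGISTERED", "ENABLE_INDEX"): "ENABLED",
    ("ENABLED", "DISABLE_INDEX"): "DISABLED",
    # REMOVE_INDEX clears indexed data, but the index stays DISABLED.
    ("DISABLED", "REMOVE_INDEX"): "DISABLED",
}


def apply(transitions, status, action):
    """Return the status after `action`, or raise if it is not allowed."""
    try:
        return transitions[(status, action)]
    except KeyError:
        raise ValueError(f"{action} is not allowed in state {status}") from None


def reachable(transitions, start):
    """Every state reachable from `start` by some sequence of actions."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for (src, _action), dst in transitions.items():
            if src == state and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


print(sorted(reachable(OLD_TRANSITIONS, "DISABLED")))  # ['DISABLED']
```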

In the current index lifecycle, I identified two key points which cause the issues above:

  1. The DISABLED state more closely resembles a PAUSED state, because all data is still present. From this state, ENABLE_INDEX and REINDEX should work just as they do in the REGISTERED state.
  2. The REMOVE_INDEX action more closely resembles a CLEAR_INDEX action: it removes all indexed data but leaves the index itself technically intact. It would therefore make more sense to rename this action and have it lead to the REGISTERED state.

If we address these two points, the index lifecycle could look like this, with added or updated schema actions marked in blue.

[Figure: new lifecycle]

Here I renamed REMOVE_INDEX to CLEAR_INDEX and introduced a new REMOVE_INDEX action which actually deletes the index's schema vertex. This action is only available in the REGISTERED state, i.e. for an index which is inactive and holds no indexed elements. I am aware that JanusGraph is currently not capable of clearing mixed indexes.
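The proposed lifecycle can be sketched in Python as a transition table plus a tiny schema object. This is a hypothetical model, not JanusGraph code, and `Schema` and `REMOVED` are made-up names. The transitions are one reading of the proposal. CLEAR_INDEX is modelled only from DISABLED, since that is where the old REMOVE_INDEX applied. The usage at the end walks an index through re-enabling, clearing and removal, and then reuses its name.

```python
# Hypothetical sketch of the proposed lifecycle; not JanusGraph code.
REMOVED = None  # marker: the index's schema vertex is deleted

NEW_TRANSITIONS = {
    ("INSTALLED", "REGISTER_INDEX"): "REGISTERED",
    ("REGISTERED", "REINDEX"): "ENABLED",
    ("REGISTERED", "ENABLE_INDEX"): "ENABLED",
    ("ENABLED", "DISABLE_INDEX"): "DISABLED",
    # Point 1: DISABLED behaves like REGISTERED for these two actions.
    ("DISABLED", "REINDEX"): "ENABLED",
    ("DISABLED", "ENABLE_INDEX"): "ENABLED",
    # Point 2: the old REMOVE_INDEX, renamed, now leads back to REGISTERED.
    ("DISABLED", "CLEAR_INDEX"): "REGISTERED",
    # New REMOVE_INDEX: only for an inactive index holding no data.
    ("REGISTERED", "REMOVE_INDEX"): REMOVED,
}


class Schema:
    """Hypothetical schema holding one status per index name."""

    def __init__(self):
        self.indexes = {}

    def create(self, name):
        if name in self.indexes:
            raise ValueError(f"index name {name!r} is already in use")
        self.indexes[name] = "INSTALLED"

    def update(self, name, action):
        status = self.indexes[name]
        if (status, action) not in NEW_TRANSITIONS:
            raise ValueError(f"{action} is not allowed in state {status}")
        nxt = NEW_TRANSITIONS[(status, action)]
        if nxt is REMOVED:
            del self.indexes[name]  # the name becomes available again
        else:
            self.indexes[name] = nxt


schema = Schema()
schema.create("byEmail")
for action in ["REGISTER_INDEX", "ENABLE_INDEX", "DISABLE_INDEX",
               "ENABLE_INDEX",  # re-enabling a disabled index now works
               "DISABLE_INDEX", "CLEAR_INDEX", "REMOVE_INDEX"]:
    schema.update("byEmail", action)
schema.create("byEmail")  # the name can be reused
print(schema.indexes)  # {'byEmail': 'INSTALLED'}
```

Keeping REMOVE_INDEX restricted to REGISTERED means removal never has to deal with indexed data. Clearing the data is a separate, explicit step.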

This proposal introduces breaking changes, but considering the impact, I think it would be worth implementing.

porunov commented 1 year ago

I like the proposed changes. Also, I don't fully understand why we can't remove a mixed index, since it's just a small API call. For example, in Elasticsearch a single API call clears and removes the index: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html

So it's not clear to me why we can't simply delete the data in ES via an API call and then remove the schema vertex of the associated index. Maybe I'm missing something, but it seems that from the DISABLED state we could remove a mixed index completely with a single API call.

rngcntr commented 1 year ago

I took a quick look at it yesterday and found out there is even a clearStorage function in our index backend interface. It looks to me like every index backend implements it, except Solr, which only supports clearStorage for SolrCloud.

porunov commented 1 year ago

I haven't looked at the implementation yet, but I guess we could implement mixed index removal using the backends' own delete APIs:

Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html

Solr: https://solr.apache.org/guide/8_11/collection-management.html#delete

Lucene probably has a way to remove an index as well (I haven't looked yet). In any case, if some index backend doesn't support the removal operation, we can always throw an exception for that backend.

I would prefer having the ability to remove mixed indexes completely (i.e. both the data in the index backend and the schema vertex for that index), so that the name could be reused later for new indexes.
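The removal porunov describes has three parts: delete the backend data, then drop the schema vertex, and throw for backends that cannot delete. It might be sketched in Python as follows. All names are hypothetical. `IndexBackend.delete_index` only loosely stands in for backend calls such as Elasticsearch's delete-index API or the clearStorage function mentioned above. It is not JanusGraph's Java interface, and the schema is modelled as a plain dict.

```python
from abc import ABC, abstractmethod


class IndexBackend(ABC):
    """Hypothetical stand-in for an external index backend."""

    @abstractmethod
    def delete_index(self, name):
        """Delete all data of index `name` in the backend."""


class DeletableBackend(IndexBackend):
    """Models a backend with a delete-index API (e.g. Elasticsearch)."""

    def __init__(self, names):
        self.names = set(names)

    def delete_index(self, name):
        self.names.discard(name)


class NonDeletableBackend(IndexBackend):
    """Models a backend that cannot remove an index."""

    def delete_index(self, name):
        raise NotImplementedError("this index backend cannot remove indexes")


def remove_mixed_index(schema, backend, name):
    """Completely remove a DISABLED mixed index so its name can be reused.

    `schema` is a plain dict of index name -> status (hypothetical).
    """
    if schema.get(name) != "DISABLED":
        raise ValueError(f"index {name!r} must be DISABLED before removal")
    # Delete backend data first: if this raises, the schema entry survives
    # and the index simply stays DISABLED.
    backend.delete_index(name)
    del schema[name]  # drop the schema entry (the 'schema vertex')


schema = {"byEmail": "DISABLED"}
backend = DeletableBackend({"byEmail"})
remove_mixed_index(schema, backend, "byEmail")
print(schema, backend.names)  # {} set()
```

The order matters. The backend data is deleted before the schema entry, so a backend that throws leaves the index DISABLED and intact. The user can then retry or clean it up manually.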