gentics / mesh-incubator

Project which is home for planned enhancements for Gentics Mesh

Bucket Sync Optimization #249

Open Jotschi opened 4 years ago

Jotschi commented 4 years ago

The bucket sync could be optimized.

Current

A bucket sync via the NodeIndexHandler scans the graph elements multiple times, once for each language / branch combination. This results in multiple bucket sync runs for the same handler:

16:35:09.793 [] INFO  [OkHttp http://elasticsearch-master:9200/...] [c.g.m.s.i.n.NodeIndexHandler] - Handling sync of {Bucket [0/1132] for elements [0/1897069]}

This happens because diffAndSync invokes bucketSync once for every language / branch setting, and each invocation iterates over buckets computed from the full element count.

Desired

We could potentially improve the situation by using a graph index which only returns the elements for a specific language / branch combination. Each lookup would then return a smaller total element count and thus reduce the number of generated buckets.

Example: In the end this may result in a single bucket being generated for lang fr (since lang fr may only contain one node), and many buckets for lang en (since most of the contents have been entered in en).
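To make the expected saving concrete, here is a rough sketch of the bucket arithmetic. It is not the Mesh implementation: the class `BucketEstimate`, its method names, the fixed bucket size and the element counts are all hypothetical, and it assumes a simplified model in which buckets are fixed-size chunks of the element count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Gentics Mesh. It only models the arithmetic.
class BucketEstimate {

    // Buckets needed to cover elementCount elements (ceiling division, at least one bucket).
    static long bucketCount(long elementCount, long bucketSize) {
        if (bucketSize <= 0) {
            throw new IllegalArgumentException("bucketSize must be positive");
        }
        return Math.max(1, (elementCount + bucketSize - 1) / bucketSize);
    }

    // Current behaviour as described in this issue: every language / branch
    // combination iterates over buckets computed from the full element count.
    static long bucketRunsUnfiltered(int combinations, long totalElements, long bucketSize) {
        return combinations * bucketCount(totalElements, bucketSize);
    }

    // Proposed behaviour: each combination only buckets the elements that the
    // (assumed) language / branch graph index returns for it.
    static long bucketRunsFiltered(Map<String, Long> elementsPerCombination, long bucketSize) {
        return elementsPerCombination.values().stream()
                .mapToLong(count -> bucketCount(count, bucketSize))
                .sum();
    }

    public static void main(String[] args) {
        long bucketSize = 1_000L; // assumed value, chosen for readability

        // Made-up counts: most content in en, a single node in fr.
        Map<String, Long> perCombination = new LinkedHashMap<>();
        perCombination.put("en/master", 9_999L);
        perCombination.put("fr/master", 1L);

        // Assumed total number of scanned graph elements for the unfiltered case.
        long totalElements = 10_000L;

        System.out.println("unfiltered bucket runs: "
                + bucketRunsUnfiltered(perCombination.size(), totalElements, bucketSize));
        System.out.println("filtered bucket runs:   "
                + bucketRunsFiltered(perCombination, bucketSize));
    }
}
```

With these made-up numbers the unfiltered variant performs 20 bucket runs (2 combinations times 10 buckets), while the filtered variant performs 11 (10 for en, 1 for fr). The gap grows with every additional sparsely populated language or branch.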

In combination with the index-per-language feature this would also reduce the load on ES, since fewer indices would need to be searched to determine the delta.