Open ersin-erdal opened 3 days ago
Haven't reviewed the code yet, but I did take it for a spin.
Notes:
- If the write block is still on and Kibana is restarted, messages like this are logged: `Task ML:saved-objects-sync-task: Error running task: ML:saved-objects-sync-task, index [.kibana_task_manager_9.0.0_001] blocked by: [FORBIDDEN/8/index write (api)];: cluster_block_exception`
  Guessing this is probably OK, but why would we be trying to write a task that presumably already exists? Is that the way "ensureScheduled" (or whatever) works w/TM? Not clear if it's all the tasks or just some. Not sure it's worth doing anything about this; if anything, it's a great signal that the TM index is write-blocked :-)
- When using the update-by-query claimer, there's a long, filled-with-JSON error logged every 3s: `Failed to poll for work: { big JSON wad here }`. Seems like we should try to not log that every 3s, but perhaps the number of folks using that claimer, by the time we're in version 8.last, will be almost or literally none.
Other than that, seems to work as described. Looks like it's logging the Discovery service message ~1/minute, and then you can see errors updating task claims, etc, as expected. When the block is removed, everything comes back to normal.
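For the repeated `Failed to poll for work` message noted above, one option would be to rate-limit the log call. The following is only a sketch: `createThrottledLogger` is a hypothetical helper, not part of Kibana's logging API, and the 60s window is an assumed value.

```typescript
// Hypothetical helper, for illustration only; not an existing Kibana API.
// Wraps a log function so that at most one message is emitted per window.
function createThrottledLogger(
  log: (msg: string) => void,
  minIntervalMs: number,
  now: () => number = Date.now // injectable clock, to make the sketch testable
): (msg: string) => boolean {
  let lastLoggedAt = Number.NEGATIVE_INFINITY;
  return (msg: string): boolean => {
    const t = now();
    if (t - lastLoggedAt < minIntervalMs) {
      return false; // suppressed: still inside the throttle window
    }
    lastLoggedAt = t;
    log(msg);
    return true; // message was emitted
  };
}

// Usage: log the poll failure at most once per minute (assumed window).
const warnPollFailure = createThrottledLogger((msg) => console.warn(msg), 60_000);
warnPollFailure('Failed to poll for work: ...');
```

The trade-off is that distinct errors arriving inside the window are also dropped, so a real change would probably want to key the throttle on the error type.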
Resolves: https://github.com/elastic/response-ops-team/issues/249
This PR increases the task claiming interval in case of `cluster_block_exception` to avoid generating too many errors during TM index reindexing.

To verify:
- `kibana_system` and `kibana_admin` roles
- `PUT /.kibana_task_manager_9.0.0_001/_block/write`
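The interval increase described in this PR could be sketched roughly as follows. This is not the actual Task Manager code: the function names, the error shape, and both interval values are assumptions chosen for illustration (3s matches the polling cadence mentioned in the review notes; the 60s blocked interval is a guess).

```typescript
// Illustrative sketch only; names and values are assumptions, not the PR's code.
const DEFAULT_POLL_INTERVAL_MS = 3_000; // assumed normal claiming cadence
const BLOCKED_POLL_INTERVAL_MS = 60_000; // assumed slower cadence while the index is write-blocked

// Minimal shape covering the two places the exception type might appear.
interface EsErrorLike {
  message?: string;
  body?: { error?: { type?: string } };
}

// Returns true when the error looks like an Elasticsearch cluster_block_exception.
function isClusterBlockException(err: unknown): boolean {
  const e = err as EsErrorLike | null | undefined;
  if (!e) return false;
  return (
    e.body?.error?.type === 'cluster_block_exception' ||
    (typeof e.message === 'string' && e.message.includes('cluster_block_exception'))
  );
}

// Picks the delay before the next claim attempt based on the last poll error.
function nextPollInterval(lastError: unknown): number {
  return isClusterBlockException(lastError)
    ? BLOCKED_POLL_INTERVAL_MS
    : DEFAULT_POLL_INTERVAL_MS;
}
```

With this shape, clearing the write block means the next successful poll passes no error, so the interval drops back to the default on its own.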