Thanks for reaching out, Antoine.
Do you see any crashes or errors in the logs?
QuickJS can build indexes a bit faster, but that could mean putting more pressure on the disk or other resources. Do you see increased CPU, disk I/O (perhaps an I/O quota being hit?), or memory usage during that time?
If your design documents or your documents are large, it's sometimes possible to hit the maximum stack size in the JS engine. For QuickJS that can be adjusted with memory_limit_bytes, though in that case I'd expect to see repeated crashes in the logs with a segfault or a memory limit error.
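For reference, a minimal sketch of what bumping that setting might look like in local.ini — I'm assuming it lives under a [quickjs] section (check the docs for your exact version), and the value here is purely illustrative:

```ini
; local.ini — illustrative value only
[quickjs]
; raise the engine memory limit for large docs / design docs (512 MiB here)
memory_limit_bytes = 536870912
```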
Before killing the process, try to gather some stats if you can remsh in:
erlang:process_info(Pid).                        % general info about the stuck process
recon:proc_window(reductions, 3, 5000).          % top 3 processes by work done over a 5s window
recon:proc_window(message_queue_len, 3, 5000).   % top 3 by message queue length over a 5s window
recon:proc_window(memory, 3, 5000).              % top 3 by memory usage over a 5s window
Another option: if you see that any node has reached os_process_limit with the number of couchjs processes, you could consider increasing that a bit.
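That limit lives in the [query_server_config] section; a sketch, with an illustrative value (the default is 100):

```ini
; local.ini
[query_server_config]
; max number of couchjs OS processes per node
os_process_limit = 200
```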
Description
On a cluster (3.4.2) of 6 nodes with a fairly large number of databases (~1500, each between 1 GB and 150 GB), we have recently added or modified about 20 design docs per database (roughly 30,000 design docs in total). We have set up Ken with a concurrency of 5 to drive the indexing.
About every 10 minutes, we see one indexer process that stops being updated. It basically stays stuck forever (we let a few linger for 24 hours). Killing all couchjs_mainjs processes has no impact on the stuck indexer. The only way to get rid of it is to issue exit(Pid, kill). in remsh, where Pid is the pid field from /_active_tasks, not indexer_pid.
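For clarity, the kill we issue looks like this in remsh (the pid string below is a placeholder; the real one is copied from the pid field of GET /_active_tasks):

```erlang
%% remsh on the node that owns the stuck indexer
Pid = list_to_pid("<0.1234.0>").  % placeholder pid from /_active_tasks
exit(Pid, kill).
```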
Steps to Reproduce
Create a lot of databases with a lot of data, add a few design documents per database, and start Ken.
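A rough sketch of the setup step, with placeholder names, credentials, and a trivial view (the real databases are large and the design docs more complex):

```sh
# create databases and put a minimal design doc in each (placeholders throughout)
for i in $(seq 1 1500); do
  curl -s -X PUT "http://admin:pass@localhost:5984/db_$i"
  curl -s -X PUT "http://admin:pass@localhost:5984/db_$i/_design/d1" \
       -H 'Content-Type: application/json' \
       -d '{"views":{"v1":{"map":"function(doc){ emit(doc._id, null); }"}}}'
done
```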
Expected Behaviour
The indexers shouldn't get stuck.
Your Environment
Additional Context
We can give you access to the infrastructure that causes the problem to happen if needed.