apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

Ken-started QuickJS-backed indexer processes freeze #5342

Open aduchate opened 7 hours ago

aduchate commented 7 hours ago

Description

On a 6-node cluster (3.4.2) with a fairly large number of databases (~1500, each between 1GB and 150GB), we recently added and modified about 20 design docs per database (about 30,000 design docs in total). We have set up Ken with a concurrency of 5 to let the indexing happen.

About every 10 minutes, one indexer process stops being updated. It then stays stuck indefinitely (we let a few linger for 24 hours). Killing all couchjs_mainjs processes has no effect on the stuck indexer. The only way to get rid of it is to run exit(Pid, kill). in remsh, where Pid is the pid field from /_active_tasks, not indexer_pid.
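For anyone trying to spot the stuck tasks, here is a minimal sketch that filters the /_active_tasks output for indexer tasks whose last update is older than a threshold. It assumes each task entry carries `type`, `pid`, and a Unix-timestamp `updated_on` field. The helper names, the base URL, the credentials, and the 600-second threshold are hypothetical choices for illustration and are not part of CouchDB.

```python
import base64
import json
import urllib.request


def find_stalled_indexers(tasks, now, max_idle_seconds=600):
    """Return indexer tasks not updated within max_idle_seconds.

    Assumes each task is a dict with "type" and a Unix-timestamp
    "updated_on" field; tasks lacking "updated_on" are skipped.
    """
    stalled = []
    for task in tasks:
        if task.get("type") != "indexer":
            continue
        updated_on = task.get("updated_on")
        if updated_on is None:
            continue
        if now - updated_on > max_idle_seconds:
            stalled.append(task)
    return stalled


def fetch_active_tasks(base_url, user, password):
    """Fetch the task list from /_active_tasks using HTTP basic auth.

    base_url, user and password are placeholders supplied by the caller.
    """
    request = urllib.request.Request(base_url.rstrip("/") + "/_active_tasks")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Hypothetical usage against a local node:
#   import time
#   tasks = fetch_active_tasks("http://localhost:5984", "admin", "secret")
#   for task in find_stalled_indexers(tasks, now=time.time()):
#       print(task.get("node"), task.get("pid"), task.get("database"))
```

The `pid` value printed by the usage snippet is the one to pass to exit(Pid, kill) in remsh, not `indexer_pid`.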

Steps to Reproduce

Create many databases holding a lot of data, add a few design documents to each database, then start Ken.

Expected Behaviour

The indexers shouldn't get stuck.

Your Environment

FROM ubuntu:22.04

# Create app directory
WORKDIR /root

# Install dependencies
RUN apt-get update
RUN ln -fs /usr/share/zoneinfo/Europe/Brussels /etc/localtime
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends tzdata
RUN dpkg-reconfigure --frontend noninteractive tzdata
RUN apt-get install -y gnupg2 wget curl vim git build-essential pkg-config libicu-dev libmozjs-78-dev libcurl4-openssl-dev libncurses-dev node-gyp npm libssl-dev help2man openjdk-21-jdk-headless
RUN git clone https://github.com/erlang/otp otp_src_27.1.2
WORKDIR /root/otp_src_27.1.2
RUN git checkout -b 27.1.2 44ffe8811dfcf3d2fe04d530c6e8fac5ca384e02
RUN bash -c 'export ERL_TOP=`pwd`; export LANG=C; ./configure; make; make release_tests; cd release/tests/test_server; /root/otp_src_27.1.2/bin/erl -s ts install -s ts smoke_test batch -s init stop; cd ..; tar zcvf otp-tests.tgz test_server; cd /root/otp_src_27.1.2; make install'
WORKDIR /root
RUN git clone https://github.com/apache/couchdb.git #9
WORKDIR /root/couchdb
RUN git checkout -b 3.4.2 6e5ad2a5c5479cb09722b4a7d13b3d59b7bb2a23
RUN bash -c 'curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash ; . /root/.nvm/nvm.sh; nvm install 18'
RUN ./configure  --disable-docs --spidermonkey-version 78 chdir=/media/data/src/couchdb
RUN bash -c '. /root/.nvm/nvm.sh; nvm use 18; make release'
WORKDIR /root/couchdb/rel
RUN tar zcvf couchdb.jammy-jellyfish.3.4.2.tgz couchdb

Additional Context

If needed, we can give you access to the infrastructure where the problem occurs.

nickva commented 6 hours ago

Thanks for reaching out, Antoine.