apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

CouchDB instance without search node crashes when calling a search request #3603

Closed mojito317 closed 1 week ago

mojito317 commented 3 years ago

Description

CouchDB crashes if I bombard it with search requests when no search node exists. See this gist file for the logs.

Steps to Reproduce

I could only reproduce the issue when I ran unit tests.

  1. Start a plain CouchDB instance that does not include the _search functionality. I used the latest Docker image.
  2. Run the CloudantDatabaseTests in the tests/unit/database.tests.py file from the cloudant/python-cloudant repo. I tried to skip as many tests as possible; if you keep only these tests, you can probably reproduce the crash:

Expected Behaviour

CouchDB should not crash.

Your Environment

Additional Context

You have to set the following env vars when starting the nosetest:

DB_USER={uname};DB_PASSWORD={pwd};DB_URL=http://127.0.0.1:5984;RUN_CLOUDANT_TESTS=false
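
The semicolon-separated line above is the form an IDE run configuration accepts. In a plain shell, the same settings could be exported as in the sketch below; `admin` and `password` are placeholder values standing in for {uname} and {pwd}, not values taken from this report.

```shell
# Placeholder credentials: substitute the admin user of your CouchDB instance.
export DB_USER="admin"
export DB_PASSWORD="password"
# Values as given in the report above.
export DB_URL="http://127.0.0.1:5984"
export RUN_CLOUDANT_TESTS=false
```

With these exported, the test runner started from the same shell inherits them.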

eiri commented 3 years ago

All right, so the strange part is that I can reproduce this on the couchdb:3 Docker image with the following ddoc, but I can't reproduce it when building from the repo's 3.x branch (nor from tag 3.1.1) with dev/run's .ini configs changed to match Docker's.

File: alpha.json

{
    "indexes": {
        "searchindex001": {
            "index": "function(doc) { index(\"default\", doc._id); }"
        }
    }
}

Against docker:

$ curl -q -K .curlrc http://127.0.0.1:15984/koi -X PUT
{"ok":true}

$ curl -q -K .curlrc http://127.0.0.1:15984/koi -X POST -d '{"name": "Alice", "number": 42}'
{"ok":true,"id":"d9d564521ee139ec83134600f4000e0f","rev":"1-7b54ca479c9043f8cc4cd83777ce6b75"}

$ curl -q -K .curlrc http://127.0.0.1:15984/koi/_design/alpha -X PUT --data @alpha.json
{"ok":true,"id":"_design/alpha","rev":"1-b445a33cf17da10d2f2502f68f58462b"}

$ curl -q -K .curlrc http://127.0.0.1:15984/koi/_design/alpha/_search/searchindex001 -X POST -d '{"query": "name:Alice*"}'
{"error":"{badarg,[{erlang,monitor,[process,{main,'clouseau@127.0.0.1'}],[]},\n         {ioq,submit_request,2,[{file,\"src/ioq.erl\"},{line,187}]},\n         {ioq,maybe_submit_request,1,[{file,\"src/ioq.erl\"},{line,150}]},\n         {ioq,handle_info,2,[{file,\"src/ioq.erl\"},{line,123}]},\n         {gen_server,try_dispatch,4,[{file,\"gen_server.erl\"},{line,616}]},\n         {gen_server,handle_msg,6,[{file,\"gen_server.erl\"},{line,686}]},\n         {proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,247}]}]}","reason":"{gen_server,call,\n            [ioq,\n             {request,<0.286.0>,{pread_iolist,4490},other,<0.435.0>,undefined},\n             infinity]}","ref":2383229562}

Against repo:

$ curl -q -K .curlrc http://127.0.0.1:15984/koi -X PUT
{"ok":true}

$ curl -q -K .curlrc http://127.0.0.1:15984/koi -X POST -d '{"name": "Alice", "number": 42}'
{"ok":true,"id":"d9d564521ee139ec83134600f4000e0f","rev":"1-7b54ca479c9043f8cc4cd83777ce6b75"}

$ curl -q -K .curlrc http://127.0.0.1:15984/koi/_design/alpha -X PUT --data @alpha.json
{"ok":true,"id":"_design/alpha","rev":"1-b445a33cf17da10d2f2502f68f58462b"}

$ curl -q -K .curlrc http://127.0.0.1:15984/koi/_design/alpha/_search/searchindex001 -X POST -d '{"query": "name:Alice*"}'
{"error":"ou_est_clouseau","reason":"Could not connect to the Clouseau Java service at clouseau@127.0.0.1"}

Since HEAD behaves as expected and returns the ou_est_clouseau error, I suspect this is some kind of configuration issue, but I can't figure out which values in the ini config would reproduce it.

If anyone manages to induce this locally in a dev environment, please leave the steps in a comment.

kocolosk commented 3 years ago

Intriguing! I think I figured it out. The difference is that the couchdb:3 container is configured without any node name and is not running in distributed mode. In that mode trying to monitor the remote Clouseau process triggers a badarg error:

# /opt/couchdb/erts-9.3.3.14/bin/erl -boot /opt/couchdb/releases/3.1.1/start_clean
Erlang/OTP 20 [erts-9.3.3.14] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V9.3.3.14  (abort with ^G)
1> erlang:monitor(process, {main, 'clouseau@127.0.0.1'}).
** exception error: bad argument
     in function  monitor/2
        called as monitor(process,{main,'clouseau@127.0.0.1'})

whereas in dev mode we are running a CouchDB Erlang node and the behavior changes to deliver a 'DOWN' message to the mailbox:

# /opt/couchdb/erts-9.3.3.14/bin/erl -boot /opt/couchdb/releases/3.1.1/start_clean -name test@127.0.0.1
Erlang/OTP 20 [erts-9.3.3.14] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V9.3.3.14  (abort with ^G)
(test@127.0.0.1)1> erlang:monitor(process, {main, 'clouseau@127.0.0.1'}).
#Ref<0.2127152124.2865758209.248364>
(test@127.0.0.1)2> receive M -> M end.
{'DOWN',#Ref<0.2127152124.2865758209.248364>,process,
        {main,'clouseau@127.0.0.1'},
        noconnection}
(test@127.0.0.1)3> 

Not sure offhand what the right fix is here. Seems like we ought to be able to run gracefully without the Erlang distribution, although it's news to me that the Docker image is configured that way.

wohali commented 3 years ago

From the Docker container README:

CouchDB also uses /opt/couchdb/etc/vm.args to store Erlang runtime-specific changes. Changing these values is less common. If you need to change the epmd port, for instance, you will want to bind mount this file as well. (Note: files cannot be bind-mounted on Windows hosts.)

and

NODENAME will set the name of the CouchDB node inside the container to couchdb@${NODENAME}, in the file /opt/couchdb/etc/vm.args. This is used for clustering purposes and can be ignored for single-node setups.

Try setting NODENAME?

kocolosk commented 3 years ago

Hi @wohali yes, I can confirm that starting the container with a NODENAME specified restores the intended behavior. It just seems to me that we'd want to be better behaved inside CouchDB itself when running in non-distributed mode, but I'm not sure about the best way to effect that change.
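
A minimal sketch of the workaround confirmed above, for anyone hitting this on the Docker image. The container name, credentials, and host port are placeholders, and `has_node_name` is a hypothetical helper (not part of CouchDB or the image) that only checks whether a vm.args file contains a `-name` or `-sname` line. It does not address the underlying behavior in non-distributed mode.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given vm.args file sets an Erlang
# node name with -name or -sname, i.e. the VM would start distributed.
has_node_name() {
    grep -Eq '^[[:space:]]*-s?name[[:space:]]+[^[:space:]]+' "$1"
}

# Workaround: start the container with NODENAME set, so that the node is
# named couchdb@127.0.0.1. Credentials, port, and name are placeholders.
start_named_couchdb() {
    docker run -d --name couchdb-named \
        -e NODENAME=127.0.0.1 \
        -e COUCHDB_USER=admin \
        -e COUCHDB_PASSWORD=password \
        -p 15984:5984 \
        couchdb:3
}
```

After `start_named_couchdb`, copying /opt/couchdb/etc/vm.args out of the container (for example with `docker cp`) and passing the copy to `has_node_name` gives a quick check that the node name was actually written.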

rnewson commented 1 year ago

fixed by https://github.com/apache/couchdb/pull/4404