apache / couchdb-docker

Semi-official Apache CouchDB Docker images
https://github.com/apache/couchdb-docker
Apache License 2.0

Gradual memory consumption increase (in GBs) after running the _bulk_docs endpoint daily with ~2700 docs (~500kb) and _views in a three-node cluster #252

Closed robanstha closed 9 months ago

robanstha commented 10 months ago

Using CouchDB 3.2.2 in Kubernetes pods with a three-node CouchDB cluster. I am posting documents to a database using the _bulk_docs endpoint, and memory usage in the Docker container gradually climbs by gigabytes while the total database size is about 10MB. I post ~2700 documents (~500kb in total) twice every day until CouchDB crashes from high memory usage.

Also, after the pods are back up, the memory consumption goes down.

OTP 25 is reported to have a memory leak issue, but CouchDB 3.3.3 is using OTP 24:

> User-Agent: curl/8.4.0
> Accept: */*
> Content-Type: application/json
> Connection: close
> Content-Length: 260959
>
* We are completely uploaded and fine
< HTTP/1.1 201 Created
< Cache-Control: must-revalidate
< Connection: close
< Content-Length: 233056
< Content-Type: application/json
< Date: Wed, 24 Jan 2024 21:09:09 GMT
< Server: CouchDB/3.3.3 (Erlang OTP/24)
< X-Couch-Request-ID: 2b52ff2219
< X-CouchDB-Body-Time: 2

Example of all databases and their sizes, created using curl: (image)

Gradual spike in memory consumption after running _bulk_docs of ~500kb (2700 docs) once every day: (image)

Expected Behavior

Memory should not grow gradually by gigabytes while the total size of all databases is about 10MB.

Current Behavior

Memory spikes up by about 1GB for every 500kb of documents posted via _bulk_docs.

Steps to Reproduce (for bugs)

  1. Create a three node CouchDB cluster and deploy in kubernetes.
  2. Create mock_documents.json with ~2700 documents
  3. Create new database using curl.
  4. POST the documents using _bulk_docs and PUT a design doc with a list function that returns the documents: "function(head, req) { start({ headers: { 'Content-Type': 'application/json' } }); var result = []; var row; while (row = getRow()) { result.push(row.value); } send(JSON.stringify(result)); }"
  5. Repeat steps 3 and 4 multiple times with a new database name each time.
  6. Check the memory consumption of the container. Example:
    curl -X PUT  http://test:test@localhost:5984/testbuild1
    curl -X POST -H "Content-Type: application/json"  http://test:test@localhost:5984/testbuild1/_bulk_docs -d @mock_documents.json
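The steps above can be scripted end to end. This is a minimal sketch: the document shape, the _design/list name, and the map function are assumptions for illustration; the database name testbuild1 and the test:test credentials come from the curl examples in step 6.

```shell
#!/bin/sh
# Step 2: generate mock_documents.json in _bulk_docs format (~2700 docs).
# The per-document shape ("_id", "payload") is an assumed example.
N=2700
{
  printf '{"docs":['
  i=1
  while [ "$i" -le "$N" ]; do
    printf '{"_id":"doc-%d","payload":"sample"}' "$i"
    [ "$i" -lt "$N" ] && printf ','
    i=$((i + 1))
  done
  printf ']}'
} > mock_documents.json

# Design doc with the list function from step 4, plus a backing view --
# a list function needs a view to run against (view name "all" is assumed).
cat > list_design.json <<'EOF'
{
  "_id": "_design/list",
  "views": { "all": { "map": "function(doc) { emit(doc._id, doc); }" } },
  "lists": { "json": "function(head, req) { start({ headers: { 'Content-Type': 'application/json' } }); var result = []; var row; while (row = getRow()) { result.push(row.value); } send(JSON.stringify(result)); }" }
}
EOF

# Steps 3-4: create the database and post everything (assumes a cluster
# reachable at localhost:5984 with admin test:test; uncomment to run).
# curl -X PUT  http://test:test@localhost:5984/testbuild1
# curl -X POST -H "Content-Type: application/json" \
#      http://test:test@localhost:5984/testbuild1/_bulk_docs -d @mock_documents.json
# curl -X PUT -H "Content-Type: application/json" \
#      http://test:test@localhost:5984/testbuild1/_design/list -d @list_design.json
```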

Context

We have a process that creates 2700 docs (~500kb total) in a new database every day. Memory consumption grows each day, and the container stops and restarts once it reaches its peak.

Your Environment

Local Docker container and Kubernetes pods (reproducible on both with a 3-node cluster, but NOT on a single node).

robanstha commented 10 months ago

Apparently, the high memory consumption is caused by the design document views. It just seems to grow over time. Even if the database is deleted, the views are probably still cached, which takes up memory.
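One way to confirm that a view index is what is holding the memory/disk is the design-doc info endpoint, which reports the index's sizes and state. A minimal sketch, where the host, database name testbuild1, and design doc _design/list are assumptions carried over from the repro steps:

```shell
#!/bin/sh
# GET /{db}/_design/{ddoc}/_info returns the view index's "sizes" and state,
# which shows whether an index is still around after its docs were deleted.
# URL components below are assumptions for illustration.
URL=http://test:test@localhost:5984/testbuild1/_design/list/_info
curl -s "$URL" || echo "CouchDB not reachable at $URL"
```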

robanstha commented 9 months ago

Found a workaround for the issue:

  1. Delete the views before deleting the database.
  2. Run database compact to remove the deleted views permanently.
  3. Delete the database.
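The three steps can be sketched as a shell sequence. This dry-run prints the commands rather than executing them, and all names (testbuild1, _design/list, test:test) are assumptions, not taken from the issue:

```shell
#!/bin/sh
# Dry-run sketch of the workaround: delete views, compact, then delete the DB.
# All names (testbuild1, _design/list, test:test) are assumed for illustration.
BASE=http://test:test@localhost:5984
DB=testbuild1

cleanup_db() {
  # 1. Delete the design doc first (requires looking up its current _rev).
  echo "curl -X DELETE $BASE/$DB/_design/list?rev=<current-rev>"
  # 2. Compact so the orphaned view index is removed for good.
  echo "curl -X POST -H 'Content-Type: application/json' $BASE/$DB/_compact"
  # 3. Only then delete the database itself.
  echo "curl -X DELETE $BASE/$DB"
}

cleanup_db
```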

This freed up all the memory that was consumed by the views in the case above: (image) From docker stats, you can see that memory usage dropped from 1.xGB to 1xxMB. If the views were not deleted and the DB wasn't compacted, it would have stayed at 1.xGB even after deleting the database.