apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

CouchDB stopped writing to the journald logs #5022

Open tahirshak opened 5 months ago

tahirshak commented 5 months ago

Description

We are using CouchDB v3.3.3 with the following log settings:

  "log": {
    "level": "error",
    "writer": "journald"
  },

We noticed that CouchDB stopped writing to the journald logs, and we have to restart the couchdb service for it to start writing to the journal again. This has happened a few times and has limited our ability to troubleshoot or look at the logs.

We are on Ubuntu 22.04.3 LTS and all the journald settings are the defaults:

cat /etc/systemd/journald.conf

[Journal]
#Storage=auto
#Compress=yes
#Seal=yes
#SplitMode=uid
#SyncIntervalSec=5m
#RateLimitIntervalSec=30s
#RateLimitBurst=10000
#SystemMaxUse=
#SystemKeepFree=
#SystemMaxFileSize=
#SystemMaxFiles=100
#RuntimeMaxUse=
#RuntimeKeepFree=
#RuntimeMaxFileSize=
#RuntimeMaxFiles=100
#MaxRetentionSec=
#MaxFileSec=1month
#ForwardToSyslog=yes
#ForwardToKMsg=no
#ForwardToConsole=no
#ForwardToWall=yes
#TTYPath=/dev/console
#MaxLevelStore=debug
#MaxLevelSyslog=debug
#MaxLevelKMsg=notice
#MaxLevelConsole=info
#MaxLevelWall=emerg
#LineMax=48K
#ReadKMsg=yes
#Audit=no

We also noticed that restarting the systemd-journald service does not help; only restarting the couchdb service makes it start writing to the logs again.

Please let me know if we are facing any bug or have to tweak any settings.

Thanks,

Steps to Reproduce

Expected Behaviour

Your Environment

Additional Context

nickva commented 5 months ago

Thank you for your report @tahirshak.

Is there any log emitted right before that?

After that happens does couchdb still work? Can you use the Fauxton interface or make API calls to it?

Does its resource usage seem to change during that time (does it use more or less CPU/memory)?

tahirshak commented 5 months ago

Hi @nickva, please see the responses to your questions below.

Q: Is there any log emitted right before that?

Looking at the logs leading up to when CouchDB stopped writing to the journal, it emitted about 45k log entries in just 5 minutes. Most of those messages are about the indexer, such as "Starting index update for db: shards" and "Index update finished for db: shards".

Q: After that happens does couchdb still work? Can you use the Fauxton interface or make API calls to it?

CouchDB still works and is functioning; we can log in to Fauxton and run curl commands.

Q: Does its resource usage seem to change during that time (does it use more or less CPU/memory)?

Yes, the CPU spiked a little bit, to 40%, and the memory spiked as well, to 15%.

nickva commented 5 months ago

@tahirshak thank you for responding. That's interesting about the CPU spiking a bit afterwards. I still don't quite have a clue what might be going on.

In production I only have experience with the syslog writer. That one never seems to behave this way. The journald log writer just seems to write to standard_error https://github.com/apache/couchdb/commit/c2ff7da78e3be137f6790b5d64aaf157348b1e4b but I can't think why it would stop. Wonder if a log gets full or can't be rotated then standard_error would lock up?

Would it be possible for you to try the syslog writer, with the system rsyslog package installed, to see if it behaves the same way?

nickva commented 5 months ago

Perhaps it's similar to https://github.com/coreos/bugs/issues/990, a systemd issue?

If it's triggered by log volume, you can try lowering the log level to error: https://docs.couchdb.org/en/stable/config/logging.html#log/level

I don't know if you want to dive in and debug journald/systemd, but I would try to bypass systemd altogether and either log to a file, or use rsyslog with the syslog backend: https://docs.couchdb.org/en/stable/config/logging.html#log/syslog_host
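
As a rough sketch of the syslog route, assuming a local rsyslog that accepts UDP on port 514, the [log] section (for example in local.ini) could look something like the following. The host, port, app id and facility values are illustrative assumptions based on the documented defaults, not settings taken from this thread:

```ini
[log]
; keep the level already used in the report
level = error
; send log messages to a syslog daemon instead of journald
writer = syslog
; assumption: rsyslog runs on the same host and listens on UDP 514
syslog_host = localhost
syslog_port = 514
; identifier and facility under which messages appear in syslog
syslog_appid = couchdb
syslog_facility = local2
```

rsyslog may need its UDP input enabled before it receives these messages. To log to a file instead, set writer = file and point the file option at a path the couchdb user can write to.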