coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

System Journal Bad Message #2587

Open mariusgrigoriu opened 5 years ago

mariusgrigoriu commented 5 years ago

Issue Report

Bug

Container Linux Version

2079.4.0

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=2079.4.0
VERSION_ID=2079.4.0
BUILD_ID=2019-05-15-0808
PRETTY_NAME="Container Linux by CoreOS 2079.4.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

AWS m4.10xlarge

Expected Behavior

journalctl --verify completes successfully

Actual Behavior

journalctl --verify intermittently fails with "Bad message" errors. I believe this problem is causing issues in other applications (fluentd in this case).

core@ip-172-16-206-45 ~ $ journalctl --verify
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c296ff-00058a33ef1cd0e2.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c2f854-00058a3410aec671.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c26684-00058a33de8b24ec.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c38cd7-00058a34427369b3.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c32a25-00058a34210cbeb6.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c23643-00058a33cdfaca1a.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c2c71f-00058a340034861c.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c35ad9-00058a343229e029.journal
core@ip-172-16-206-45 ~ $ journalctl --verify
0b6c88: Data object references invalid entry at 49b408
File corruption detected at /var/log/journal/297a5731602942e08575ed349460392a/system.journal:49b240 (of 8388608 bytes, 57%).
FAIL: /var/log/journal/297a5731602942e08575ed349460392a/system.journal (Bad message)
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c296ff-00058a33ef1cd0e2.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c2f854-00058a3410aec671.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c26684-00058a33de8b24ec.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c38cd7-00058a34427369b3.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c32a25-00058a34210cbeb6.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c23643-00058a33cdfaca1a.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c2c71f-00058a340034861c.journal
^C██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  76%
core@ip-172-16-206-45 ~ $ journalctl --verify
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c296ff-00058a33ef1cd0e2.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c2f854-00058a3410aec671.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c26684-00058a33de8b24ec.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c38cd7-00058a34427369b3.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c32a25-00058a34210cbeb6.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c23643-00058a33cdfaca1a.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c2c71f-00058a340034861c.journal
PASS: /var/log/journal/297a5731602942e08575ed349460392a/system@7f34b4890bb843599ec5cb99816ffa7b-0000000000c35ad9-00058a343229e029.journal

Reproduction Steps

Unknown. It appears to be random whether journalctl --verify completes successfully.
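
One way to quantify how often the check fails is to run it in a loop and count the failing runs. The following is only a minimal sketch and not part of the original report: count_failed_runs is a hypothetical helper name, and it assumes a failing run either exits nonzero or prints a line starting with "FAIL:", as in the output above.

```shell
# Hypothetical helper: run a command N times and print how many runs failed.
# Usage: count_failed_runs RUNS CMD [ARGS...]
# The body is a subshell "( ... )" so its variables do not leak to the caller.
count_failed_runs() (
  runs=$1
  shift
  failed=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Capture stdout and stderr together. A run counts as failed if the
    # command exits nonzero or if any output line starts with "FAIL:".
    if ! output=$("$@" 2>&1) || printf '%s\n' "$output" | grep -q '^FAIL:'; then
      failed=$((failed + 1))
    fi
    i=$((i + 1))
  done
  echo "$failed"
)
```

For example, count_failed_runs 20 journalctl --verify would print the number of runs, out of 20, in which verification reported a failure.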

Other Information

ajeddeloh commented 5 years ago

Judging by the number of journals, I'm guessing that machine has been running for a while. Can you reproduce this on a fresh machine?

mariusgrigoriu commented 5 years ago

The errors I pasted are from our production systems, which have been up for a while, but I have also seen this on fresh machines.