Closed: spacecowboy closed this issue 6 years ago
It seems the default maximum size of a systemd journal field is 64K: https://github.com/ledbettj/systemd-journal/blob/master/lib/systemd/journal.rb#L175
Seems I need to dig a little deeper ... I am going to try and reproduce ...
@spacecowboy what exactly is the behaviour you are seeing: is it truncation of a particular systemd field at 2K, or truncation of the fluentd output messages?
so this is the fluentd config we have added (other config files are default)
# All logs from systemd
<source>
  @type systemd
  path /run/log/journal
  pos_file /var/log/gcp-journald-all.pos
  read_from_head true
  tag systemd
  logName systemd
</source>

# Retag worker logs
<match systemd>
  @type rewrite_tag_filter
  rewriterule1 _SYSTEMD_UNIT neo4j-backup-worker systemd.neo4j-backup-worker
</match>

# Parse JSON logs
<filter systemd.neo4j-backup-worker>
  @type parser
  format json
  key_name MESSAGE
  reserve_data false
</filter>

# Attach environment property to everything
<filter **>
  @type record_transformer
  <record>
    neo4jcloud.environment "#{ENV['ENVIRONMENT']}"
    neo4jcloud.role "#{ENV['ROLE']}"
    neo4jcloud.shard "#{ENV['SHARD']}"
  </record>
</filter>

# Rewrite the tag on systemd messages to include the unit name
<filter systemd.**>
  @type record_transformer
  enable_ruby true
  tag ${(record["_SYSTEMD_UNIT"] || "journald").gsub("@","_at_").gsub(".service","")}
  logName ${(record["_SYSTEMD_UNIT"] || "journald").gsub("@","_at_").gsub(".service","")}
</filter>
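To illustrate what that last record_transformer filter does to the tag and logName, here is a rough Python equivalent of the inline Ruby expression (the real plugin evaluates the Ruby directly; this function is just a sketch for illustration):

```python
def unit_to_tag(record):
    """Derive a tag from _SYSTEMD_UNIT, falling back to 'journald'.

    Mirrors: (record["_SYSTEMD_UNIT"] || "journald")
               .gsub("@", "_at_").gsub(".service", "")
    """
    unit = record.get("_SYSTEMD_UNIT") or "journald"
    return unit.replace("@", "_at_").replace(".service", "")

print(unit_to_tag({"_SYSTEMD_UNIT": "neo4j-backup-worker.service"}))
# -> neo4j-backup-worker
```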
Each split message then gets processed through that pipeline separately. The jsonPayload in Stackdriver looks like this:
jsonPayload: {
MESSAGE: "{"asctime": "2018-03-08 11:14:17,281", "name": "root", "pathname": "/usr/local/miniconda/lib/python3.6/site-packages/neo4jcloud_adminapi/shell.py", "lineno": 35, "message": "Doing full backup...\n2018-03-08 11:14:12.459+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.nodestore.db.labels\n2018-03-08 11:14:12.542+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.nodestore.db.labels 8.00 kB\n2018-03-08 11:14:12.542+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.nodestore.db\n2018-03-08 11:14:12.611+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.nodestore.db 8.00 kB\n2018-03-08 11:14:12.611+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.index.keys\n2018-03-08 11:14:12.617+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.index.keys 15.96 kB\n2018-03-08 11:14:12.617+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.index\n2018-03-08 11:14:12.622+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.index 8.00 kB\n2018-03-08 11:14:12.622+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.strings\n2018-03-08 11:14:12.627+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.strings 8.00 kB\n2018-03-08 11:14:12.627+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.arrays\n2018-03-08 11:14:12.634+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.arrays 24.00 kB\n2018-03-08 11:14:12.635+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db\n2018-03-08 11:14:12.639+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db 15.94 kB\n2018-03-08 11:14:12.640+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.relationshipstore.db\n2018-03-08 11:14:12.645+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.relationshipstore.db 15.94 kB\n2018-03-08 11:14:12.645+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.relationshiptypestore.db.names\n2018-03-08 11:14:12.653+0000 INFO 
[o.n.c.s.StoreCopyClient] Copied neostore.relationshiptypes"
PRIORITY: "6"
SYSLOG_FACILITY: "3"
SYSLOG_IDENTIFIER: "neo4jcloud-async-worker"
_BOOT_ID: "e8a9f63a27b7400fa34905db6cff5910"
_CAP_EFFECTIVE: "0"
_CMDLINE: "python3.6 -u -m neo4jcloud_adminapi.database_worker"
_COMM: "python3.6"
_EXE: "/usr/local/miniconda/bin/python3.6"
_GID: "115"
_HOSTNAME: "ip-172-31-27-171"
_MACHINE_ID: "bb6f625fbb9c4c21b9e4c1d7aeb2d532"
_PID: "2548"
_SYSTEMD_CGROUP: "/system.slice/neo4j-backup-worker.service"
_SYSTEMD_INVOCATION_ID: "2d5fc99a86c0474d83727f650256cb5e"
_SYSTEMD_SLICE: "system.slice"
_SYSTEMD_UNIT: "neo4j-backup-worker.service"
_TRANSPORT: "stdout"
_UID: "111"
logName: "neo4j-backup-worker"
neo4jcloud.environment: "tc-xaezei0a"
neo4jcloud.role: "backup_worker"
neo4jcloud.shard: "0"
}
jsonPayload: {
MESSAGE: "tore.db.names 15.96 kB\n2018-03-08 11:14:12.653+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.relationshiptypestore.db\n2018-03-08 11:14:12.658+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.relationshiptypestore.db 8.00 kB\n2018-03-08 11:14:12.659+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.labeltokenstore.db.names\n2018-03-08 11:14:12.664+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.labeltokenstore.db.names 15.96 kB\n2018-03-08 11:14:12.665+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.labeltokenstore.db\n2018-03-08 11:14:12.669+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.labeltokenstore.db 8.00 kB\n2018-03-08 11:14:12.672+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.schemastore.db\n2018-03-08 11:14:12.679+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.schemastore.db 8.00 kB\n2018-03-08 11:14:12.679+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.relationshipgroupstore.db\n2018-03-08 11:14:12.683+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.relationshipgroupstore.db 15.97 kB\n2018-03-08 11:14:12.683+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.counts.db.a\n2018-03-08 11:14:12.685+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.counts.db.a 8.00 kB\n2018-03-08 11:14:12.685+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.labelscanstore.db\n2018-03-08 11:14:12.692+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.labelscanstore.db 48.00 kB\n2018-03-08 11:14:12.692+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore\n2018-03-08 11:14:12.697+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore 23.99 kB\n2018-03-08 11:14:12.697+0000 INFO [o.n.c.s.StoreCopyClient] Done, copied 17 files\n2018-03-08 11:14:12.780+0000 INFO [o.n.b.BackupService] Start recovering store\n2018-03-08 11:14:16.121+0000 INFO [o.n.b.BackupService] Finish recovering store\nDoing consistency check...\n2018-03-08 11:14:16.400+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected 
RecordFormat:StandardV3_2[v0.A.8] record format from store /tmp/d5f0c631.2018-03-08T11"
PRIORITY: "6"
SYSLOG_FACILITY: "3"
SYSLOG_IDENTIFIER: "neo4jcloud-async-worker"
_BOOT_ID: "e8a9f63a27b7400fa34905db6cff5910"
_CAP_EFFECTIVE: "0"
_CMDLINE: "python3.6 -u -m neo4jcloud_adminapi.database_worker"
_COMM: "python3.6"
_EXE: "/usr/local/miniconda/bin/python3.6"
_GID: "115"
_HOSTNAME: "ip-172-31-27-171"
_MACHINE_ID: "bb6f625fbb9c4c21b9e4c1d7aeb2d532"
_PID: "2548"
_SYSTEMD_CGROUP: "/system.slice/neo4j-backup-worker.service"
_SYSTEMD_INVOCATION_ID: "2d5fc99a86c0474d83727f650256cb5e"
_SYSTEMD_SLICE: "system.slice"
_SYSTEMD_UNIT: "neo4j-backup-worker.service"
_TRANSPORT: "stdout"
_UID: "111"
logName: "neo4j-backup-worker"
neo4jcloud.environment: "tc-xaezei0a"
neo4jcloud.role: "backup_worker"
neo4jcloud.shard: "0"
}
jsonPayload: {
MESSAGE: ":14:08.959918+00:00\n2018-03-08 11:14:16.400+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_2[v0.A.8]\n2018-03-08 11:14:16.449+0000 INFO [o.n.m.MetricsExtension] Initiating metrics...\n.................... 10%\n.................... 20%\n.................... 30%\n.................... 40%\n.................... 50%\n.................... 60%\n.................... 70%\n.................... 80%\n.................... 90%\n...................Checking node and relationship counts\n.................... 10%\n.................... 20%\n.................... 30%\n.................... 40%\n.................... 50%\n.................... 60%\n.................... 70%\n.................... 80%\n.................... 90%\n.................... 100%\nBackup complete.\n", "exc_info": null, "severity": "INFO", "message_id": "47686498436282", "check_consistency": "True", "latency": "0.991261", "backup_id": "2018-03-08T11:14:08.959918+00:00", "backup_bucket": "backup.tc-xaezei0a.neo4j.io", "action": "backup", "dbid": "d5f0c631", "timestamp": "2018-03-08T11:14:09.155303Z"}"
PRIORITY: "6"
SYSLOG_FACILITY: "3"
SYSLOG_IDENTIFIER: "neo4jcloud-async-worker"
_BOOT_ID: "e8a9f63a27b7400fa34905db6cff5910"
_CAP_EFFECTIVE: "0"
_CMDLINE: "python3.6 -u -m neo4jcloud_adminapi.database_worker"
_COMM: "python3.6"
_EXE: "/usr/local/miniconda/bin/python3.6"
_GID: "115"
_HOSTNAME: "ip-172-31-27-171"
_MACHINE_ID: "bb6f625fbb9c4c21b9e4c1d7aeb2d532"
_PID: "2548"
_SYSTEMD_CGROUP: "/system.slice/neo4j-backup-worker.service"
_SYSTEMD_INVOCATION_ID: "2d5fc99a86c0474d83727f650256cb5e"
_SYSTEMD_SLICE: "system.slice"
_SYSTEMD_UNIT: "neo4j-backup-worker.service"
_TRANSPORT: "stdout"
_UID: "111"
logName: "neo4j-backup-worker"
neo4jcloud.environment: "tc-xaezei0a"
neo4jcloud.role: "backup_worker"
neo4jcloud.shard: "0"
}
Note that the MESSAGE attribute of each entry (except maybe the last) is exactly 2048 bytes long (I copied it to a file and ran wc -m on it).
The output in this case comes from a single logging.info call in Python (where we have a JSONFormater attached to the logging framework), and you can verify that it really is a single logging call because the split messages together make up a single JSON object. The Python process runs inside a systemd service.
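That "single JSON object" claim can be checked mechanically: every non-final MESSAGE chunk is exactly 2048 characters, and concatenating the chunks in order yields one parseable JSON document. A small self-contained sketch with synthetic data (the field names here are illustrative, not taken from the real payloads above):

```python
import json

def reassemble(chunks):
    """Join split MESSAGE chunks back into the original JSON record.

    Every chunk except the last should be exactly 2048 characters,
    matching the splitting behaviour described above.
    """
    for c in chunks[:-1]:
        assert len(c) == 2048, "non-final chunk is not 2048 chars"
    return json.loads("".join(chunks))

# Build a synthetic ~5KB log line and split it the way journald did:
original = json.dumps({"message": "x" * 5000, "severity": "INFO"})
chunks = [original[i:i + 2048] for i in range(0, len(original), 2048)]
record = reassemble(chunks)
```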
Thanks for looking into it @errm . Let me know if I can provide additional details
Hi @spacecowboy, this seems quite strange: I don't think the truncation is happening in fluentd, and with the default configuration systemd shouldn't start truncating a message until it reaches 64K.
Do you see the same truncation when looking at the same entries with another tool, e.g. journalctl?
Hmm. I re-ran it to reproduce, and strangely enough I DO see the chunking in the systemd journal for the service. So this is actually not a problem with this plugin.
Thanks for helping out though @errm , sorry for wasting your time.
No problem... thanks for taking the time to report this anyway!
Our system prints JSON-formatted logs, and in some cases the message inside the JSON object can be relatively long. In that case, fluentd-systemd appears to split the long entry into 2048-byte pieces.
The same was the case for syslog, but they added an option for it: https://docs.fluentd.org/v1.0/articles/in_syslog#message_length_limit
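For reference, the in_syslog option linked above is set on the source; a sketch based on the linked docs (the port and tag values here are illustrative, and no equivalent option existed for the systemd input at the time):

<source>
  @type syslog
  port 5140
  tag syslog
  message_length_limit 4096
</source>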