Closed PettitWesley closed 2 years ago
What I have done is correctly set the partial metadata fields on the docker message structure. In EC2/vanilla Docker, this happens in the docker code. Those flags are expected to set. The docker message structure is then passed to the underlying log driver implementation which decides what to do with it. So for awslogs driver code for example, it does not use the partial fields at all, it ignores them and just sends the actual log line: https://github.com/moby/moby/blob/61404de7df1a6c75c2cbdc2c3ce226c95bebcaad/daemon/logger/awslogs/cloudwatchlogs.go#L601
The fix in this log driver wrapper is to set the partial fields correctly, but its still up to the underlying log driver implementation (which we import directly from the docker code base) to use or not use those fields.
These are my testing materials.
app.py:
import datetime
from collections import OrderedDict
import time
import os
import json
import signal
import random
import sys
def print_log(s):
print(s)
print(s, file = sys.stderr)
MESSAGE_INTERVAL = 1
MESSAGE_COUNT = int(os.environ.get("MESSAGE_COUNT", 10))
PAYLOAD_SIZE = int(os.environ.get("PAYLOAD_SIZE", 2000))
def main():
i = 0
while i < MESSAGE_COUNT:
print_log(str(i) * PAYLOAD_SIZE)
time.sleep(MESSAGE_INTERVAL)
i += 1
if __name__ == '__main__': main()
Dockerfile:
FROM python:3.8-slim-buster
WORKDIR /app
COPY large.py large.py
ENV PYTHONUNBUFFERED=1
CMD [ "python3", "large.py"]
Run command:
sudo ctr run \
--env "PAYLOAD_SIZE=5000000" \
--env "MESSAGE_COUNT=10" \
--runtime io.containerd.runc.v2 \
--log-uri "binary:///home/ec2-user/amazon-ecs-shim-loggers-for-containerd/bin/shim-loggers-for-containerd?--log-driver=fluentd&--container-id=test-partial-logger&--container-name=test-partial-logger&--fluentd-tag=test-partial-logger" \
docker.io/library/test-partial-logger:latest \
test-partial-logger
Fluent Bit Config:
[INPUT]
Name forward
Listen 0.0.0.0
Port 24224
[FILTER]
Name multiline
Match *
multiline.key_content log
Mode partial_message
[OUTPUT]
Name stdout
Match *
[OUTPUT]
Name s3
Match *
bucket my-bucket
region us-east-1
total_file_size 1M
upload_timeout 1m
use_put_object On
I tested with logs of size 500 bytes, 50K, and 5 MB.
Is there a github issue that outlines the issues to be resolved as part of a PV 1.5 release?
@rcollette I think we will treat this fix as a bug fix/patch on the 1.4 series.
Hi @PettitWesley, is the fix available for Fargate PV 1.4 now?
Hey @PettitWesley doesn't this issue prevent https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/filter-multiline-partial-message-mode example from actually working properly in fargate? The logs received by the fluent-bit container over the unix socket do not contain the expected partial message fields - so the multiline filter can't work as expected.
@rflukes You are correct. This fix needs to be released in PV 1.4 so that the partial message fields will be sent to Fluent Bit correctly.
@phamlehieu We are working on it actively, it should be launched soon. I can't say anything more exact, sorry.
@PettitWesley Any update?
Supporting Long Log Lines in Fargate PV 1.4
Background
Related Issues
How Container Logs are processed with FireLens
When the fluentd log driver is used (ex: Amazon ECS FireLens), the following is the data-flow:
The key take-away here is that logs do not go straight to Fluent Bit, they pass through the container runtime and log driver first.
Multiline Log Use Case: Long Log Lines
The other major multiline logging use case is long log lines. Monitoring is critical to modern containerized applications, and we have seen many customers that produce very large and verbose log events. Structured logging, where logs are generated by the app with a set schema/format (usually JSON) is also very common. Many customers, will log huge amounts of information in a single JSON log event- 1 MB log events are not unheard of.
These large logs are emitted by the application in a single line of code/single print statement. However, the container runtime must process them, and most container runtimes, including Docker and Containerd, will split log lines when they are greater than 16KB. In the case of log files, each 16KB chunk of a split log is written to a new log line. In the case of the Fluentd Docker Log driver, each 16KB chunk of data is a separate event.
For the Fluentd Docker Log Driver, a key will be set to note that the message is partial. Below are real split docker logs. A large log line was sent to both stdout and stderr, which are separate pipes, so each is split and creates a series of messages. Notice that stderr and stdout have different values for partial_id.
This is because stderr and stdout
The following partial metadata fields should be present in split container logs:
partial_message
: boolean to signal if the message is partial or not.partial_id
: all parts of a split message get the same unique IDpartial_ordinal
: counts up from 1, ordering the parts of the split message.partial_last
: is this the last split message or not.AWS for Fluent Bit issue: https://github.com/aws/aws-for-fluent-bit/issues/25
Problem Statement
In Fargate PV 1.4, which uses this containerd shim logger package, long log messages are currenly split, but the partial metadata keys are not added to the outputted logs.
These partial metadata keys are standard and customers expect them to be added. PV 1.3 correctly adds these partial metadata keys.
Implementation
Currently, the shim logger package does split long log lines and does track whether messages are partial or not: https://github.com/aws/amazon-ecs-shim-loggers-for-containerd/blob/master/logger/common.go#L234
However, the shim logger code does not properly set the partial metadata attributes in the message structure: https://github.com/moby/moby/blob/master/daemon/logger/copier.go#L114
Implementation will set the partial metadata fields and generate partial IDs.
Testing Plan
Unit Test Cases
TODO
Manual Test Cases
Created a test logger container which outputs 10 long log lines of a configurable length, 5 to stdout and 5 to stderr.
The logs can be sent to a local Fluent Bit and outputted to a file after Fluent Bit re-joins the split messages.
Created a validation script to read the outputted file and validate that all log lines were properly re-joined.
TODO: add scripts and results