Describe the bug
The multiline parser (MLP) regex used in FluentBit works for Celery, but when non-Celery pods restart, FluentBit treats their individual log entries as giant, never-ending log entries. These entries grow until FluentBit runs out of memory and dumps the log.
A workaround has been implemented in which only Celery uses the multiline parser, but further testing is needed to confirm that multiline entries for the other pods work as expected. Ideally, the MLP should be global for all pods so that we don't have to manage individual applications in FluentBit.
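The failure mode described above can be illustrated with a small, hypothetical Python sketch. It is not FluentBit's implementation and it ignores FluentBit's flush timeout and buffer limits. It assumes the MLP starts a new record only when a line matches a Celery-style prefix (the `CELERY_START` pattern below is an assumption based on Celery's default log format, not the regex from our config). Lines from other pods never match, so they keep being appended to one record.

```python
import re
from typing import Iterable, Iterator, List

# Assumed start-of-record pattern, modelled on Celery's default log prefix,
# e.g. "[2024-01-01 12:00:00,000: INFO/MainProcess] ...".
# This is NOT the regex from the actual FluentBit configuration.
CELERY_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}: \w+/")


def group_records(lines: Iterable[str], start: re.Pattern) -> Iterator[List[str]]:
    """Group lines into records; a new record begins at a line matching `start`."""
    buffer: List[str] = []
    for line in lines:
        # A matching line closes the record being buffered and starts a new one.
        if start.match(line) and buffer:
            yield buffer
            buffer = []
        # Non-matching lines are treated as continuations of the current record.
        buffer.append(line)
    if buffer:
        yield buffer


celery_lines = [
    "[2024-01-01 12:00:00,000: ERROR/MainProcess] Task failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 1, in <module>',
    "[2024-01-01 12:00:01,000: INFO/MainProcess] Task succeeded",
]
# Hypothetical non-Celery output: no line ever matches CELERY_START.
other_lines = ['{"message": "request %d"}' % i for i in range(1000)]

print(len(list(group_records(celery_lines, CELERY_START))))  # 2 records
print(len(list(group_records(other_lines, CELERY_START))))   # 1 record of 1000 lines
```

In this sketch the Celery traceback is correctly folded into one record, while the 1000 unrelated lines collapse into a single ever-growing record, which matches the memory growth observed when non-Celery pods use the Celery MLP.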
[Bug Severity]
SEV-2 Major
To Reproduce
Steps to reproduce the behavior:
In a scratch account, set all logs to use the Celery MLP.
Restart the non-Celery pods.
Logs from the new pods are not shipped to CloudWatch.
Expected behavior
All pods should be able to use a standardized multiline parser so that we don't have to configure each k8s deployment individually in FluentBit.
Impact
If applicable
Impact on Notify users:
N/A
Impact on Recipients:
N/A
Impact on Notify team:
The current workaround solves the immediate issue, but it leaves us maintaining discrete log paths per application in FluentBit.
Acceptance Criteria
[ ] The logging configuration is reworked to use a single FluentBit INPUT instead of several, in order to reduce memory consumption.
[ ] The multiline log parser works as expected for each pod type (API, Admin, DD-API, Celery).
Next Steps
Open a question on Stack Overflow and/or an issue on the FluentBit GitHub repository about Celery logs not being parsed properly with the default formatter.
QA
[ ] Make sure that multiline log entries work for all API, Admin, DD-API, and Celery pods.