cds-snc / notification-planning-core

Project planning for GC Notify Core Team

Bug - Fix fluentbit multi line parsing globally #159

Closed ben851 closed 10 months ago

ben851 commented 1 year ago

Describe the bug

The multiline parser regex used in fluentbit works for celery, but when non-celery pods restart, fluentbit gets confused and treats their individual log entries as one giant, never-ending log entry. These entries eventually cause fluentbit to run out of memory and then dump the log.
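The failure mode described above can be sketched with a toy model. This is not fluentbit's implementation, and `CELERY_START` is a hypothetical pattern written for illustration, not the regex from our configuration. The sketch only shows why a start-of-record pattern that never matches a pod's log format leads to a single record that keeps growing.

```python
import re

# Hypothetical start-of-record pattern for Celery-style lines such as
# "[2024-01-01 12:00:00,000: INFO/MainProcess] ...". Assumed for
# illustration only; it is not the regex from the real fluentbit config.
CELERY_START = re.compile(
    r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}: \w+/\S+\]"
)


def group_multiline(lines, start_pattern):
    """Toy multiline grouper: a line matching start_pattern opens a new
    record; every other line is appended to the record being built."""
    records = []
    current = []
    for line in lines:
        # A new start line closes the record collected so far.
        if start_pattern.search(line) and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


# Celery-style input: the traceback lines fold into the first record.
celery_lines = [
    "[2024-01-01 12:00:00,000: ERROR/MainProcess] Task failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 1, in <module>',
    "[2024-01-01 12:00:01,000: INFO/MainProcess] Task ok",
]

# Made-up JSON lines standing in for a restarted non-celery pod.
restarted_pod_lines = [
    f'{{"level": "info", "msg": "request {i}"}}' for i in range(1000)
]

celery_records = group_multiline(celery_lines, CELERY_START)
pod_records = group_multiline(restarted_pod_lines, CELERY_START)
print(len(celery_records), len(pod_records))
```

In this model the celery input splits into two records, while the 1000 lines from the other pod collapse into one record because no line ever starts a new one. With a live log stream that record would never be closed, which is consistent with the memory growth seen here.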

A workaround has been implemented in which only celery uses the multiline parser, but further testing is needed to confirm that multiline entries from the other pods are handled as expected. Ideally, the multiline parser (MLP) should be global for all pods so we don't have to manage individual applications in fluentbit.
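One way to approach that testing offline is to check a candidate start pattern against sample log lines from each pod before deploying it. The sketch below is only an assumption about how such a check could look. The pattern, the source names, and the sample lines are all hypothetical.

```python
import re

# Hypothetical candidate start-of-record pattern (Celery-style prefix).
CANDIDATE_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}: ")

# Hypothetical sample lines per log source; real samples would be
# copied from each pod's output.
SAMPLES = {
    "celery": [
        "[2024-01-01 12:00:00,000: INFO/MainProcess] Task received",
        "[2024-01-01 12:00:01,000: INFO/MainProcess] Task succeeded",
    ],
    "api": [
        '{"level": "info", "msg": "GET /status 200"}',
        '{"level": "info", "msg": "GET /status 200"}',
    ],
}


def unmatched_sources(samples, start_pattern):
    """Return the names of sources in which no line matches start_pattern.

    Such sources would never start a new record under a start-pattern
    based multiline parser, so they are candidates for runaway entries.
    """
    return sorted(
        name
        for name, lines in samples.items()
        if not any(start_pattern.search(line) for line in lines)
    )


flagged = unmatched_sources(SAMPLES, CANDIDATE_START)
print(flagged)
```

Any source this check flags would need either a broader start pattern or its own parser before the MLP could be applied globally.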

Bug Severity

SEV-2 Major

To Reproduce

Steps to reproduce the behavior:

  1. In a scratch account, set all logs to use the celery MLP
  2. Restart non-celery pods
  3. New pod logs will not be shipped to CloudWatch

Expected behavior

All pods should be able to use a standardized multiline parser so that we don't have to configure each k8s deployment individually in FluentBit.

Impact

Impact on Notify users: N/A

Impact on Recipients: N/A

Impact on Notify team: The current workaround solves the immediate issue, but discrete log paths still have to be maintained in fluentd.

Acceptance Criteria

Next Steps

  1. Post questions on Stack Overflow and/or open an issue on the FluentBit GitHub issue tracker about Celery logs not being parsed properly with the default formatter.

QA

ben851 commented 10 months ago

Closing in favour of #221