cds-snc / notification-planning-core

Project planning for GC Notify Core Team

FluentBit emitter stops sending to cloudwatch #169

Open ben851 opened 1 year ago

ben851 commented 1 year ago

Describe the bug

When the multiline parser configuration is wrong, the buffer holding its chunks fills up the available memory, which signals the emitter to stop sending to cloudwatch. As a result, we end up missing log data in cloudwatch.

SEV2-Major

To Reproduce

  1. Re-integrate celery multiline parser with all pods
  2. Break the multiline parser
  3. Restart FluentBit daemonset
  4. Restart notify pods
  5. Tweak the FluentBit input definitions to include FluentBit debug logs
  6. FluentBit will stop sending logs to cloudwatch for the affected pods

Expected behavior

Logs should be sent to cloudwatch consistently even if there is a config error in the multiline parser. If the memory chunk gets filled, the offending logs should be dropped so that the remaining logs can make it through.
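
To make the expected behavior concrete, here is a minimal Python sketch of a bounded buffer that drops a record when it would exceed the memory limit, instead of pausing the whole pipeline. This is not FluentBit code or its actual implementation; the `DroppingBuffer` class, its fields, and the byte limit are hypothetical names chosen for illustration only.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DroppingBuffer:
    """Hypothetical bounded buffer: drops records that do not fit, never pauses."""

    limit_bytes: int
    used_bytes: int = 0
    dropped: int = 0
    records: deque = field(default_factory=deque)

    def append(self, record: str) -> bool:
        """Store the record if it fits; otherwise count it as dropped."""
        size = len(record.encode("utf-8"))
        if self.used_bytes + size > self.limit_bytes:
            # The offending record is discarded so later records can still pass.
            self.dropped += 1
            return False
        self.records.append(record)
        self.used_bytes += size
        return True

    def flush(self) -> list:
        """Return all buffered records in order and free the buffer."""
        out = list(self.records)
        self.records.clear()
        self.used_bytes = 0
        return out
```

With a limit of 16 bytes, for example, a 10-byte line is accepted, a 100-byte line is dropped and counted, and a following 2-byte line is still accepted and flushed. The drop counter matters here: it lets the loss be reported rather than silently stalling delivery for every pod.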

Impact

Missing logs cause audit and compliance issues and hamper troubleshooting.

Impact on Notify team: We can fall out of policy compliance and will have increased difficulty troubleshooting issues.

Additional context

Action item from FluentBit invalid timestamps incident.

Next steps

Reopen the AWS ticket on this issue and provide support with the configuration along with debug log samples.

jimleroyer commented 10 months ago

221 is related

jimleroyer commented 10 months ago

Let's revisit this when we find new issues with the multiline parser configuration.

P0NDER0SA commented 5 months ago

Scope of this one is done. Moving to done.