GoogleCloudPlatform / ops-agent

fluent-bit service fails due to segmentation fault #1266

Open diegoauad opened 1 year ago

diegoauad commented 1 year ago

Describe the bug

google-cloud-ops-agent-fluent-bit.service fails to start. A 'segmentation fault' error is shown in the journalctl logs.

To Reproduce

Steps to reproduce the behavior:

  1. Start a GCE VM with a Debian 10 image.
  2. Install MongoDB Community Edition 5.0.11.
  3. Install Ops Agent 2.32.0.
  4. Apply the example configuration provided in the guide about Operations Suite for MongoDB.
  5. In our case, it worked fine for some hours. Then the fluent-bit service failed and could not be started again.
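For anyone trying to confirm the same failure, here is a minimal sketch of how the journal for the failing unit could be scanned for crash lines. The unit name comes from this report; the match patterns and the helper names (`find_segfault_lines`, `read_journal`) are assumptions made for illustration, not part of the Ops Agent.

```python
import re
import subprocess

# Unit name taken from the bug description above.
UNIT = "google-cloud-ops-agent-fluent-bit.service"

# Assumed patterns for how a segfault may show up in the journal;
# adjust them if your journal words the crash differently.
SEGFAULT_RE = re.compile(r"segmentation fault|SIGSEGV|status=11/SEGV", re.IGNORECASE)


def find_segfault_lines(journal_text):
    """Return the journal lines that look like a segmentation fault."""
    return [line for line in journal_text.splitlines() if SEGFAULT_RE.search(line)]


def read_journal(unit=UNIT, lines=200):
    """Read the most recent journal entries for the unit via journalctl."""
    result = subprocess.run(
        ["journalctl", "-u", unit, "-n", str(lines), "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    for line in find_segfault_lines(read_journal()):
        print(line)
```

Reading the journal may require root or membership in the `systemd-journal` group, depending on how the VM is set up.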

Expected behavior

All Ops Agent services work correctly, or at least produce an informative error message about what is misconfigured.

Environment (please complete the following information):

Additional context

We are running MongoDB as a standalone replica, with authentication enabled. Monitoring works fine; we are experiencing issues with logging only.

braydonk commented 1 year ago

Hi @diegoauad, thank you for opening an issue. I have been investigating similar issues that other users have reported with MongoDB logging. In particular, we have seen problems with complicated, deeply nested logs coming from MongoDB.

Would you please open a support case so that we can assist in greater detail? In the support case, please include the output from our diagnostic tool and, if possible, some representative MongoDB logs. You can also mention this GitHub issue in the case.

If you do open a support case, please respond here with the case number so we can investigate.
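As a rough aid for picking representative logs, the sketch below ranks log lines by how deeply their JSON is nested. It assumes the MongoDB log is in the structured JSON format (one document per line, as in MongoDB 4.4 and later) and that nesting depth is a useful proxy for "complicated"; the helper names and the log path in the usage note are hypothetical.

```python
import json


def nesting_depth(value):
    """Depth of nested objects/arrays: scalars are 0, {} and [] are 1."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0


def deepest_log_lines(lines, top=5):
    """Return up to `top` (depth, line_number) pairs, deepest first.

    Lines that are not valid JSON are skipped.
    """
    scored = []
    for number, line in enumerate(lines, start=1):
        try:
            scored.append((nesting_depth(json.loads(line)), number))
        except json.JSONDecodeError:
            continue
    # Deepest first; ties keep file order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top]
```

For example, `deepest_log_lines(open("/var/log/mongodb/mongod.log"))` would list the line numbers worth inspecting first, assuming the log lives at that path on your VM. Review any log lines for sensitive data before attaching them to a case.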

RafikFarhad commented 2 months ago

I don't think this is solved yet. We have about 200 VMs running MongoDB workloads. Some of the newer VMs run the latest Ops Agent, 2.48.0, and show a similar issue, while the VMs on earlier versions do not.