fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

[osquery 5.13.1] osqueryd crashing on Fedora 40 host with distributed queries #20594

Closed dherder closed 2 months ago

dherder commented 4 months ago

Fleet version: 4.53.1


💥  Actual behavior

When running fleetd on a Fedora 40 host, osqueryd crashes repeatedly.

🧑‍💻  Steps to reproduce

Crash logs: https://drive.google.com/file/d/1oazsLdWMmoMLRMYPbINV409n199Fj_sj/view?usp=drive_link

dherder commented 4 months ago

For whoever looks at this issue, we have repro'd in our cloud eval environment. Please reach out to me for the creds to access.

sharon-fdm commented 4 months ago

Reproduced by @dherder; removing the reproduce label.

dherder commented 4 months ago

@sharon-fdm this is a high priority issue for prospect-redwine. Should we add the "P2" label?

lucasmrod commented 4 months ago

Seems related to https://github.com/kolide/launcher/issues/1773

And see public discussion here.

sharon-fdm commented 4 months ago

@dherder, makes sense to me to add P2. We can swap it with other items in the sprint (@noahtalerman maybe swap with #19561). @lukeheath, please approve.

lukeheath commented 4 months ago

@dherder @sharon-fdm Agreed, upgrading to P2.

sharon-fdm commented 4 months ago

Thanks @lukeheath

lucasmrod commented 3 months ago

UPDATE:

How to reproduce

We've reproduced the issue on Fedora 38 and 40 (most likely it affects Fedora 39 too). You can reproduce it on such systems by running the following query: select 1 from rpm_packages rp, os_version ov where rp.name = "foo-fedora-playbooks" AND ov.name = "Fedora Linux"; (it crashes osquery roughly 3 out of 5 times).
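Because the crash is intermittent, a small loop makes the failure rate easier to see. The following is only a sketch, assuming `osqueryi` is on the PATH of the affected host and that a crash shows up as the process being killed by a signal (which Python reports as a negative return code). The helper names `run_osqueryi` and `count_crashes` are hypothetical and not part of Fleet or osquery.

```python
import subprocess
from typing import Callable

# Query taken verbatim from the reproduction steps above.
QUERY = (
    "select 1 from rpm_packages rp, os_version ov "
    'where rp.name = "foo-fedora-playbooks" AND ov.name = "Fedora Linux";'
)


def run_osqueryi(query: str) -> int:
    """Run one query in a fresh osqueryi process and return its exit code.

    Assumption: osqueryi is installed and on PATH.
    """
    proc = subprocess.run(
        ["osqueryi", "--json", query],
        capture_output=True,
        text=True,
    )
    return proc.returncode


def count_crashes(attempts: int, runner: Callable[[str], int] = run_osqueryi) -> int:
    """Count how many of `attempts` runs ended with a signal.

    subprocess reports death by signal (for example SIGSEGV) as a
    negative return code, so only negative codes are counted as crashes.
    """
    crashes = 0
    for _ in range(attempts):
        if runner(QUERY) < 0:
            crashes += 1
    return crashes
```

On an affected host, calling `count_crashes(20)` and comparing the result before and after an osquery upgrade gives a rough signal of whether the crash is still present. The `runner` parameter exists only so the counting logic can be exercised without osquery installed.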

What's happening

I've opened an osquery issue that describes the findings so far: https://github.com/osquery/osquery/issues/8384.

Next steps

Stefano from the osquery team will try to upgrade librpm in osquery from 4.18.0 to 4.18.2. 4.18.2 has the following fix related to the segfault: https://patchwork.yoctoproject.org/project/oe-core/patch/20230703065909.45555-1-anuj.mittal@intel.com. We believe that upgrading may fix the crash.

JoStableford commented 3 months ago

Related to a Slack conversation

lucasmrod commented 3 months ago

Removing Fleet's milestone, as this is an osquery core bug (being fixed in 5.13.1 or 5.14.0, TBD).

lucasmrod commented 3 months ago

I can confirm that the update of librpm from 4.18.0 to 4.18.2 fixed the issue (no more crashes when querying rpm_packages). (Verified by downloading artifacts from today's master builds which contain the bug fix.)

xpkoala commented 3 months ago

Confirmed no crashes on Fedora 38.

fleet-release commented 2 months ago

Fedora host finds calm,
Crashing clouds part with each query,
Fleet sails smooth, no harm.