falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0
216 stars 160 forks source link

Infrequent Segfaults for CAPSET_X #817

Closed adduali1310 closed 1 year ago

adduali1310 commented 1 year ago

Describe the bug

Noticed segfaults when running Falco. These segfaults seem to be intermittent and not very frequent. Tried to enable log-level debug to figure out the issue, but that was not very helpful.

Tried to examine the coredump and on doing so, I can see the segfault occurs for the PPME_CAPSET_X event case i.e when trying to parse the event for capset_exit on the following line : https://github.com/falcosecurity/libs/blob/3.0.1+driver/userspace/libsinsp/parsers.cpp#L5697

Further tried to examine the issue via gdb and I can see that the threadinfo in the coredump is a nullptr and the threadinfo reference ( m_tinfo_ref) is also empty. The rest of the evt struct seems to have expected values.

Although the capset syscall is to set capabilities of a thread, my guess is some of the starting events have been missed as a result of which the threadinfo is a nullptr.

I can see similar nullptr checks through out the code file which I think handles this issue for specific cases. Ex - https://github.com/falcosecurity/libs/blob/3.0.1+driver/userspace/libsinsp/parsers.cpp#L2358 https://github.com/falcosecurity/libs/blob/3.0.1+driver/userspace/libsinsp/parsers.cpp#L2827

How to reproduce it

For now seems to be very intermittent which makes me believe that this happens only when some start information is missed.

Edit: Submitted a PR that should help handle the issue. Expected behaviour

No Segfaults should occur.

Environment

jasondellaluce commented 1 year ago

cc @loresuso

loresuso commented 1 year ago

Hi @adduali1310, thank you for noticing this. I agree with you that we are missing a check on evt->m_tinfo being not null before dereferencing it as we also do in other parsers, thanks for your PR! It would be interesting to understand when this actually happens.

Andreagit97 commented 1 year ago

@adduali1310 thank you very much for the investigation and for the fix!

adduali1310 commented 1 year ago

Thanks for the feedback folks.

As to your point @loresuso about when this happens, I do not have any hypothesis about it currently but will try to investigate further to get a better idea. Do you have any thoughts on when this might happen?

Apart from this, wanted to bring up a couple of other points as well.

I was going through the userspace/libsinsp/parsers.cpp file source code and found a couple more issues:

  1. The check for thread information needs to be included in a couple more parsers. It is included in most parsers that utilize the thread information apart from a few. I have those changes ready. I could either add those as separate commits to this PR(https://github.com/falcosecurity/libs/pull/818) or create a different PR if it is better to track that information separately since this specifically deals with the CAPSET_X parser.

  2. Another point is regarding consistency. While going through the source code I noticed the evt->m_tinfo being compared to a nullptr in some places , NULL in some and a logical NOT check in others. While functionally all work, it is fairly inconsistent and difficult to read.

Ex- https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/parsers.cpp#L223 (NULL check) https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/parsers.cpp#L849 (negation check) https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/parsers.cpp#L2430 (nullptr check)

I don't mind submitting a PR for making the file more consistent but wanted to get all of your thoughts on this.

loresuso commented 1 year ago

Hi @adduali1310, I agree with you! Personally, I'd open another PR to address the addition of the check to the other parsers and the consistency problem. I guess I would stick with the use of nullptr since parsers.cpp is indeed a C++ file, but let's wait also for the opinion of the other folks.

Anyway, I believe the segfault occurred because of some dropped events and the reset function that we call at the beginning of the processing of each event couldn't find an entry in the thread table for the event thread id, so the tinfo was kept to null