falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0
232 stars 165 forks source link

[Tracking] Params inconsistencies in our drivers #515

Open Andreagit97 opened 2 years ago

Andreagit97 commented 2 years ago

PLEASE NOTE

This issue is mainly for tracking purposes, some points cannot be addressed until we solve the scap-file compatibility issue -> https://github.com/falcosecurity/libs/pull/1381#issuecomment-1746613905

Generic context

The aim of this issue is to track all the inconsistencies when we send event params from our drivers (modern_bpf, bpf, kernel module) to userspace. Some widespread issues that need a dedicated conversion

Syscall-specific issues

LEGENDA

open_by_handle_at :arrow_right:

dup3 :arrow_right:

open

openat

openat2

eventfd :arrow_left:

eventfd2 :arrow_left:

inotify_init :arrow_left:

signalfd :arrow_left:

signalfd4 :arrow_left:

timerfd_create :arrow_left:

userfault_fd :arrow_right:

ptrace :arrow_right:

mkdirat :arrow_right:

pipe2

renameat2 :arrow_right:

execve :arrow_right:

execveat :arrow_right:

fork :arrow_right:

clone :arrow_right:

clone3 :arrow_right:

vfork :arrow_right:

socket :arrow_left:

connect :arrow_right:

socketpair :arrow_left:

Same issues of socket syscall

socketpair :arrow_right:

accept :arrow_right:

accept4 :arrow_left:

listen :arrow_left:

bpf :arrow_left:

flock :arrow_left:

quotactl :arrow_left:

quotactl :arrow_right:

unshare :arrow_left:

mount :arrow_left:

umount2 :arrow_left:

linkat :arrow_right:

unlinkat :arrow_right:

setns :arrow_left:

setrlimit :arrow_left:

prlimit64 :arrow_left:

sendto :arrow_left:

sendmsg :arrow_left:

ppoll:arrow_left:

ppoll:arrow_left:

ppoll:arrow_left:

recvmmsg:

[NOT ADDRESSABLE] Empty instrumentation

sendmmsg:

[NOT ADDRESSABLE] Empty instrumentation

FedeDP commented 2 years ago

Hi Andrea, like always, good catch! :D

I agree with you; i'd phrase it in a different way actually: "if we today are able to push to userspace 1mln fd-only events, we could instead push up to 2mln fd-only events" This hits even harder :)

To retain backward compatibility, i think we could:

This fixes the first 2 points; the others are different cases instead.

Andreagit97 commented 2 years ago

Hi Andrea, like always, good catch! :D

I agree with you; i'd phrase it in a different way actually: "if we today are able to push to userspace 1mln fd-only events, we could instead push up to 2mln fd-only events" This hits even harder :)

:exploding_head: :exploding_head:

To retain backward compatibility, i think we could:

* add a new `PT_FD32` and `PT_PID32`

* new drivers will send PT_PID32 and PT_FD32

* we are still compatible with old 64-bit types, while reading from scap files

* of course, bump maj schema version :/

This fixes the first 2 points; the last is a different case instead.

Completely agree with this solution!

incertum commented 2 years ago

Love it, big +1, great catch!

FedeDP commented 2 years ago

First 2 points will be addressed by #526

Andreagit97 commented 2 years ago

I have slightly changed the issue format with Generic event issues and Specific event issues in this way it should be more maintainable, thank you to @hbrueckner @Molter73 @FedeDP for all the help in finding new issues

jasondellaluce commented 2 years ago

Tracking down all of this is of incredible value. Thank you a lot!

hbrueckner commented 2 years ago

I have slightly changed the issue format with Generic event issues and Specific event issues in this way it should be more maintainable, thank you to @hbrueckner @Molter73 @FedeDP for all the help in finding new issues

You are welcome! Many thanks @Andreagit97 for the excellent summary and tracker of all the review follow-ups!

gnosek commented 1 year ago

"if we today are able to push to userspace 1mln fd-only events, we could instead push up to 2mln fd-only events" This hits even harder :)

Optimism is awesome but let me cool it down a little bit :P Every event has this header, which is somewhat larger than the zero bytes it would need for that claim to be true ;)

struct ppm_evt_hdr {
#ifdef PPM_ENABLE_SENTINEL
    uint32_t sentinel_begin;
#endif
    uint64_t ts; /* timestamp, in nanoseconds from epoch */
    uint64_t tid; /* the tid of the thread that generated this event */
    uint32_t len; /* the event len, including the header */
    uint16_t type; /* the event type */
    uint32_t nparams; /* the number of parameters of the event */
};

BTW, we could probably easily change nparams to 16 bits. If we do expect >64k parameters, we can hack something for the (hopefully rare) events that exceed this number.

We could also trim the tid to 32 bits, but then we use this struct all over userspace too (#sadpanda), and other environments may want large tids (e.g. gvisor), so we would have to decouple these two structs and copy data field by field between them.

Don't let me distract you from tracking down the inconsistencies though, that's an awesome job!

These two changes would cut down 6 bytes from every event, equivalent to one and a half fds with no schema changes, just a major api version bump.

FedeDP commented 1 year ago

You are right, of course! Nonetheless, we are wasting lots of bytes that, all together, would:

I was a bit over-reacting though, agree ahah

BTW, we could probably easily change nparams to 16 bits. If we do expect >64k parameters, we can hack something for the (hopefully rare) events that exceed this number. We could also trim the tid to 32 bits, but then we use this struct all over userspace too (#sadpanda), and other environments may want large tids (e.g. gvisor), so we would have to decouple these two structs and copy data field by field between them.

Yep!

Don't let me distract you from tracking down the inconsistencies though, that's an awesome job!

You gave thorough ideas! And data :D

I think @Andreagit97 is working on porting current modern-bpf tests to work with kmod and old bpf; this should help us spot many more inconsistencies!

poiana commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana commented 1 year ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

Andreagit97 commented 1 year ago

/remove-lifecycle rotten

incertum commented 1 year ago

@oheifetz @therealbobo @FedeDP @Andreagit97 we all touched at least some inconsistencies and we just confirmed that for minor type confusions s32/u32 we will not create new events.

Could we get organized and address all these cases, but assign to folks beforehand so that we do not duplicate work?

Fixing all these would make the project look much better! I am happy to be reviewer, but also happy to help more if we don't find enough volunteers :)

Andreagit97 commented 1 year ago

yeah agree with that we could write below this issue if someone is going to address some points of the list :)

Andreagit97 commented 1 year ago

BTW I would avoid all changes that require a new event pair until we finally resolve the scap-file issue, unless we find something to fix ASAP

Andreagit97 commented 1 year ago

I've added the [NOT ADDRESSABLE] marker to all issues that we cannot address now due to the scap-file management https://github.com/falcosecurity/libs/pull/1381#issuecomment-1746613905

LucaGuerra commented 1 year ago

Thank you Andrea, I think this makes it much clearer :pray:

ecbadeaux commented 11 months ago

@Andreagit97 Please mark dup3 as completed with strikethrough.

ecbadeaux commented 11 months ago

@incertum @Andreagit97 Can you guys also mark setns, flock, and unshare as complete?

poiana commented 8 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

FedeDP commented 8 months ago

/remove-lifecycle stale

poiana commented 5 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Andreagit97 commented 5 months ago

/remove-lifecycle stale

poiana commented 2 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

FedeDP commented 2 months ago

/remove-lifecycle stale