falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0
212 stars 159 forks source link

[BUG] libs do not update `proc.name` for renamed threads #1011

Open therealbobo opened 1 year ago

therealbobo commented 1 year ago

Describe the bug

The Falco libs use the /proc filesystem to retrieve information about processes when they start up. However, if a thread, during its lifecycle, changes its name using a prctl system call, the proc.name value is not updated. This means that Falco may not be able to accurately identify the process and could lead to errors or security issues.

In the following situation Falco doesn't take into account the renaming because it's only aware of the information of the process at its creation.

  Falco starts
        |
        |
        |----malicious process----------------   -\                             
        |            |                            |                             
        |            |                            |                         
        |            |                            |                             
        |            |   thread renaming          |                             
        |            |          |                 |                             
        |            |          |                 |- malicious process lifecycle
        |            |          |                 |                             
        |            |          |                 |                             
        |            |          |                 |                             
        |            |          |                 |                             
        |            |          |                 |                             
        |            v          v                 |                             
        |-------------------------------------   -/                             
        |                                        
        |
        v

On the contrary, in the following situation, given that the renaming has already taken place before Falco has started, Falco will display correctly the proc.name.

          ----malicious process----------------   -\   
                      |                            | 
                      |                            | 
                      |                            | 
                      |   thread renaming          | 
                      |          |                 | 
   Falco starts       |          |                 | 
         |            |          |                 |- malicious process lifecycle
         |            |          |                 | 
         |            |          |                 | 
         |            |          |                 | 
         |            |          |                 | 
         |            |          |                 | 
         |            v          v                 | 
         |-------------------------------------   -/  
         |                                          
         |                                                                         
         |                                          
         v                                          

How to reproduce it

If we consider the first situation shown above:

  1. Compile the following simple c program (e.g.: gcc worker.c -o worker):
    
    #include <stdio.h>
    #include <pthread.h>

void worker(void arg) { if (pthread_equal(pthread_self(), ((pthread_t)arg))) { FILE *fp; char buffer[1024]; int lines = 0; pthread_setname_np(pthread_self(), "reader");

    sleep(5);
    fp = fopen("/etc/passwd", "r");
    if (fp == NULL) {
        perror("Error opening /etc/passwd");
        pthread_exit(NULL);
    }

    while (fgets(buffer, sizeof(buffer), fp) != NULL)
        lines++;

    printf("Thread-0: /etc/passwd contains %d lines\n", lines);

    fclose(fp);
} else {
pthread_setname_np(pthread_self(), "otherThread");
    printf("%ld started\n", pthread_self());
    sleep(10);
    printf("%ld finished\n", pthread_self());
}

pthread_exit(NULL);

}

int main(void) { pthread_t threads[10]; int rc; pthread_setname_np(pthread_self(), "MainThread");

for (int i = 0; i < 10; i++) {
    rc = pthread_create(&threads[i], NULL, worker, (void *)&threads[0]);
    if (rc) {
        fprintf(stderr, "Error creating thread %d\n", i);
        return 1;
    }
}

for (int i = 0; i < 10; I++)
    pthread_join(threads[i], NULL);

return 0;

}

2. Start Falco with the following rule that will match the thread called _reader_:
```yaml
- rule: Read from /etc/passwd
  desc: A process attempted to read from the /etc/passwd file.
  condition: evt.type=read and fd.name="/etc/passwd"
  output: "Process attempted to read from /etc/passwd (user=%user.name command=%proc.cmdline procname=%proc.name)"
  priority: ERROR
  1. Run the worker program compiled in first step
  2. Falco will output "MainThread" as proc.name

If we consider the second situation shown above:

  1. Compile the following simple c program (e.g.: gcc worker.c -o worker):
    
    #include <stdio.h>
    #include <pthread.h>

void worker(void arg) { if (pthread_equal(pthread_self(), ((pthread_t)arg))) { FILE *fp; char buffer[1024]; int lines = 0; pthread_setname_np(pthread_self(), "reader");

    sleep(5);
    fp = fopen("/etc/passwd", "r");
    if (fp == NULL) {
        perror("Error opening /etc/passwd");
        pthread_exit(NULL);
    }

    while (fgets(buffer, sizeof(buffer), fp) != NULL)
        lines++;

    printf("Thread-0: /etc/passwd contains %d lines\n", lines);

    fclose(fp);
} else {
pthread_setname_np(pthread_self(), "otherThread");
    printf("%ld started\n", pthread_self());
    sleep(10);
    printf("%ld finished\n", pthread_self());
}

pthread_exit(NULL);

}

int main(void) { pthread_t threads[10]; int rc; pthread_setname_np(pthread_self(), "MainThread");

for (int i = 0; i < 10; i++) {
    rc = pthread_create(&threads[i], NULL, worker, (void *)&threads[0]);
    if (rc) {
        fprintf(stderr, "Error creating thread %d\n", i);
        return 1;
    }
}

for (int i = 0; i < 10; I++)
    pthread_join(threads[i], NULL);

return 0;

}

2. Run the `worker` program compiled in the previous step
3. Start Falco **after the renaming** with the following rule that will match the thread called _reader_:
```yaml
- rule: Read from /etc/passwd
  desc: A process attempted to read from the /etc/passwd file.
  condition: evt.type=read and fd.name="/etc/passwd"
  output: "Process attempted to read from /etc/passwd (user=%user.name command=%proc.cmdline procname=%proc.name)"
  priority: ERROR
  1. Falco will output "reader" as proc.name

At the end the concept is simple: if Falco starts before the renaming, it will only use the information available at the start of the process; otherwise if Falco starts after the renaming, it will have the complete information and use the effective thread name.

Expected behaviour

Given that proc.name is the name (excluding the path) of the executable generating the event. When a thread changes its name within a process, Falco should update the proc.name value to reflect the new thread name. This should happen regardless of whether Falco was started before or after the thread was renamed. This will ensure that Falco is correctly identifying the process and can take appropriate actions as needed.

Screenshots Here's how Falco behaves if it starts before the matching process (first situation).

$ sudo falco -A
Tue Mar 28 10:30:10 2023: Falco version: 0.34.1 (x86_64)
Tue Mar 28 10:30:10 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Tue Mar 28 10:30:10 2023: Loading rules from file /home/ubuntu/src/read_rule.yaml
Tue Mar 28 10:30:10 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Tue Mar 28 10:30:10 2023: gRPC server threadiness equals to 8
Tue Mar 28 10:30:10 2023: Starting health webserver with threadiness 8, listening on port 8765
Tue Mar 28 10:30:10 2023: Starting gRPC server at unix:///run/falco/falco.sock
Tue Mar 28 10:30:10 2023: Enabled event sources: syscall
Tue Mar 28 10:30:10 2023: Opening capture with Kernel module
10:30:40.852145252: Error Process attempted to read from /etc/passwd (user=ubuntu command=MainThread procname=MainThread)
10:30:40.852148051: Error Process attempted to read from /etc/passwd (user=ubuntu command=MainThread procname=MainThread)
10:30:40.852152596: Error Process attempted to read from /etc/passwd (user=ubuntu command=MainThread procname=MainThread)
10:30:40.852153490: Error Process attempted to read from /etc/passwd (user=ubuntu command=MainThread procname=MainThread)

Here's how Falco behaves if it starts after (second situation) the matching process and before the actual read (so when the process has already changed its name).

$ sudo falco -A
Tue Mar 28 10:34:48 2023: Falco version: 0.34.1 (x86_64)
Tue Mar 28 10:34:48 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Tue Mar 28 10:34:48 2023: Loading rules from file /home/ubuntu/src/read_rule.yaml
Tue Mar 28 10:34:48 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Tue Mar 28 10:34:48 2023: gRPC server threadiness equals to 8
Tue Mar 28 10:34:48 2023: Starting health webserver with threadiness 8, listening on port 8765
Tue Mar 28 10:34:48 2023: Enabled event sources: syscall
Tue Mar 28 10:34:48 2023: Starting gRPC server at unix:///run/falco/falco.sock
Tue Mar 28 10:34:48 2023: Opening capture with Kernel module
10:34:51.958366528: Error Process attempted to read from /etc/passwd (user=ubuntu command=reader procname=reader)
10:34:51.958370300: Error Process attempted to read from /etc/passwd (user=ubuntu command=reader procname=reader)
10:34:51.958373326: Error Process attempted to read from /etc/passwd (user=ubuntu command=reader procname=reader)
10:34:51.958373935: Error Process attempted to read from /etc/passwd (user=ubuntu command=reader procname=reader)

Here's the output of the ps command after the renaming:

$ ps -T -o pid,tid,comm p 2155073
    PID     TID COMMAND
2155073 2155073 MainThread
2155073 2155074 reader
2155073 2155075 otherThread
2155073 2155076 otherThread
2155073 2155077 otherThread
2155073 2155078 otherThread
2155073 2155079 otherThread
2155073 2155080 otherThread
2155073 2155081 otherThread
2155073 2155082 otherThread
2155073 2155083 otherThread

Environment

Additional context

I think that the problem is linked to the fact that a thread is added to the thread table via parse_clone_exit https://github.com/falcosecurity/libs/blob/9096f420ca325e04b464d66b3bfb324c3ef60e6f/userspace/libsinsp/parsers.cpp#L1653 In this way, when a prctl occurs, the thread comm is never updated. A possible solution could be to hook the prctl and update the thread table accordingly.

leogr commented 1 year ago

Given that proc.name is the name (excluding the path) of the executable generating the event. When a thread changes its name within a process, Falco should update the proc.name value to reflect the new thread name. This should happen regardless of whether Falco was started before or after the thread was renamed. This will ensure that Falco is correctly identifying the process and can take appropriate actions as needed.

@therealbobo I'm not convinced about that assumption. I believe the expectation for proc.name is always to report the initial process name (i.e., the executable name without the path)

@falcosecurity/libs-maintainers WDYT? :thinking: cc @loresuso @darryk10

therealbobo commented 1 year ago

In my opinion the problem is that proc.name is associated with /proc/<pid>/comm which, in the process lifecycle, could change. This fact could mislead in the use of proc.name. Maybe this issue can be solved just updating the docs making this crystal clear or adding, for example, proc.startname. 😄

leogr commented 1 year ago

My point is that if it can change during the execution, rules authors can't rely on (otherwise, it can lead to bypasses). I'd instead find a way to stick with the documented behavior and find a way to fix proc. name if possible. But I'd like to have more opinions on that.

Moreover, we also have another field, proc.exe, that can be a custom name (so perhaps it should reflect the last value of comm ?). Not sure about that, just guessing.

loresuso commented 1 year ago

I strongly believe that updating the process name is not what we need here. This is because many rules rely on proc.name, and for this reason, just using prctl could lead to bypass all of them. I am trying to think about better solution!

darryk10 commented 1 year ago

I think the proc.name should remain consistent and shouldn't contains the thread info. It would be helpful having another field containing the thread information without updating the proc.name field which is strongly used in detection and falco rules. This is also causing a lot of noise cause often the field proc.name is used in whitelist.

gnosek commented 1 year ago

If the only way to change the process comm is via prctl, then IMO we should track the changes.

(the "if" comes from the fact that you can easily overwrite argv[0] from userspace, but IIRC it's entirely separate from comm which lives in the kernel so it should be fine).

AIUI, the confusion comes from all the different things that could be considered the name of a process (and I believe we expose at least most of them one way or another):

Since the comments appearing while I'm typing this indicate we don't want proc.name to contain updated comm, how about we introduce proc.comm for this? But in any case, if there isn't a spec for what fields contain what, let's make one and stick to it (it seems important for proc.name in particular since it implies "the" name while there isn't one).

As much as Falco has become the main user of the libs, security isn't the only use case, we can also use the comm (updated by a legit process) to troubleshoot specific processes/threads by name, e.g. in OSS Sysdig

loresuso commented 1 year ago

Possible problem/question:

Now consider this:

wdyt?

therealbobo commented 1 year ago

I think that proc.comm could be a great idea! 👍

leogr commented 1 year ago

This should not be ok if we want an immutable proc.name.

Anyway, I believe the root cause depends on the usage of task->comm, that's generally not trustable since there's at least one case where one can set it from the userspace.

Thus, why not use the executable name instead? :thinking:

gnosek commented 1 year ago

It all boils down to the question: what exactly is proc.name? In the kernel, IMO the obvious answer would be comm, but it looks like we're gravitating towards it being the same as proc.exe.

leogr commented 1 year ago

Since the comments appearing while I'm typing this indicate we don't want proc.name to contain updated comm, how about we introduce proc.comm for this? But in any case, if there isn't a spec for what fields contain what, let's make one and stick to it (it seems important for proc.name in particular since it implies "the" name while there isn't one).

@gnosek

My TL;DR is: I believe rules authors want proc.name to behave like proc.exe (ie. just the executable name, immutable, @darryk10, and @loresuso do you agree?), and I totally agree to introduce proc.comm (that mirrors task->comm) would be nice to reduce confusion among users.

But I believe we have to reach consensus before introducing such changes.

incertum commented 1 year ago

Hey 👋, happy to join the discussion.

proc.name certainly can not and should never be called exe name or executable name (besides it's mostly cropped at 16 characters depending on your kernel settings which can be annoying). Perhaps all this can even be considered a bit of a broader flaw that affects most of the Falco upstream rules 🙃 . Please checkout https://github.com/falcosecurity/libs/pull/938 where we added the ability to traverse exe and exepath parent process lineage, because for instance for java there are many limitations as typically the app has a custom proc.name and therefore you otherwise wouldn't know if a shell was spawned from java. Lastly, we all are painfully aware of issues for symlinked executables ...

Hence we may not need proc.comm and can just keep the contract that proc.name is task->comm and similar to how we update other thread fields during its lifetime this one could be updated as well one day.

Rather we should prioritize getting the true exepath including resolved symlinks (@loresuso) and then introduce proc.exename which simply would be the last portion of the true full path, because proc.exe is also not really this sometimes ... Subsequently upstream rules should be updated, but I understand what a pain this will be, so proc.comm is a valid option, but please be aware that when changing proc.name to not reflect task->comm we are going against common Linux definitions, ps util output etc, so I would not recommend this.

In summary, getting technical clarity and feature completeness would be amazing as it's such an important aspect for threat detection:

leogr commented 1 year ago

but please be aware that when changing proc.name to not reflect task->comm we are going against common Linux definitions, ps util output etc, so I would not recommend this.

@incertum

Good point.

I thought those definitions used command name (which in my mind is task->comm) and not process name (which in my mind is a totally arbitrary definition). Generally speaking, is there any precise definition of that in the Linux doc?

After your comment, I've changed my mind, and I'm not sure anymore that introducing proc.comm is a good idea. At this point, the short-term solution would be

In the long term, we have to fix the symlink problem, but it's another issue.

incertum commented 1 year ago

@leogr fully agree with touching up the docs a bit more and set expectations even better. Re true executable path and symlinks yes we will work on it and hopefully soon have a trustworthy proc.exename ... I'll tag up with @loresuso to make it a reality and @Andreagit97 also already expressed willingness to help.

Lastly, worthwhile mentioning is that proc.name is just more convenient to use in rules since we can do in (list) but for proc.exepath we would need to type out endswith each time. The good news is that we already have work planned for Falco 0.36 to make the UX better, see https://github.com/falcosecurity/falco/issues/2403.

Let's dig through the ps utils source code? I think this could be good ground truth reference for what folks are expecting or used to.

leogr commented 1 year ago

Over the past few weeks, I have talked to everyone about this problem, and I am now totally convinced that the solution is:

  • to stick with proc.name equals to task->comm (and fix cases when it's not correctly updated, like in the original description of this issue)
  • clearly state in the documentation that proc.name is not always the executable name (we have to set the expectation of rules authors correctly, that's the main point here)

So, IMO the next step would be to review the field documentation.

therealbobo commented 1 year ago

If you all agree, I could try to help with te documentation! 😄

leogr commented 1 year ago

If you all agree, I could try to help with te documentation! smile

Yes please :pray:

incertum commented 10 months ago

This is all addressed, not just documentation wise but @therealbobo also added https://github.com/falcosecurity/libs/pull/1015

Andreagit97 commented 8 months ago

Uhm just reading again this issue I've noticed this is not completely fixed, in #1015 we added the kernel code to manage prctl code but we never added the parsing logic in sinsp to update proc.name, and the same for the clone caller parser. We never update the comm of the thread when a new clone happens while indeed the task->comm could be changed between the first clone and the next one... I would reopen the issue here to keep track of this, wdyt?

incertum commented 8 months ago

oh yes we need to fix this, thanks for checking again @Andreagit97!

poiana commented 5 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

leogr commented 5 months ago

/remove-lifecycle stale

incertum commented 4 months ago

@therealbobo and @Andreagit97 just checking in where we are at for this issue? What still needs to be done? Could we add a new summary of open items?

Andreagit97 commented 4 months ago

Uhm this comment https://github.com/falcosecurity/libs/issues/1011#issuecomment-1766089275 should be still valid, AFAIK nothing has been changed :thinking:

poiana commented 1 month ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Andreagit97 commented 1 month ago

/remove-lifecycle stale