VirusTotal / yara

The pattern matching swiss knife
https://virustotal.github.io/yara/
BSD 3-Clause "New" or "Revised" License
7.93k stars 1.42k forks source link

Crash with deeply nested directory hierarchies #2088

Open georgschoelly opened 2 weeks ago

georgschoelly commented 2 weeks ago

Describe the bug Yara ignores deeply nested directories or even crashes due to stack overflow.

To Reproduce

  1. use Ubuntu 22.04.
  2. Download and install Yara 4.5.1: https://yara.readthedocs.io/en/latest/gettingstarted.html
  3. Create a dummy rule:
cat << EOF > matchall.yara
rule MatchAll {
condition:
        true
}
EOF
  1. Set the fd limit to something high. This ensures we have a crash. If it is lower, there is no crash because the fd limit is exhausted. The result then is that the file is not scanned.
ulimit -n 10000
  1. Create a deeply nested file hierarchy b/b/b/b/b/.../file. I chose 600 as the depth because this ensures the path will take more than the 1024 characters defined in MAX_PATH here: https://github.com/VirusTotal/yara/blob/v4.5.1/libyara/include/yara/limits.h#L42
DEEP_PATH=$(printf 'b/%.0s' {1..600})
mkdir -p $DEEP_PATH; touch $DEEP_PATH/file
  1. Scan this directory:
$ yara -r matchall.yara b
Segmentation fault (core dumped)

Expected behavior

No core dump. Maybe an error indicating that the directory hierarchy is too deep. Something like this:

$ yara -r matchall.yara b
yara -r matchall.yara b/
MatchAll b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/.../file

Please complete the following information:

Additional context

The issue is missing error checking in cli/yara.c:

char full_path[MAX_PATH];
snprintf(full_path, sizeof(full_path), "%s/%s", dir, de->d_name);

if dir is already the size of full_path, the new path is the same as the old path. This leads to an infinite recursion which ends either in a SIGSEGV (i.e. stack overflow) or once the file descriptor limit has been reached.

In the case of the file descriptor, this means that any sufficiently deeply nested files are ignored.

How to fix

I see these options:

  1. Dynamically allocate full_path. This would make the stack frame much smaller and a stack overflow less likely. This would not solve the fd-limit problem however.
  2. Simply print an error:
int printed;
printed = snprintf(full_path, sizeof(full_path), "%s/%s", dir, de->d_name);
if (printed >= sizeof(full_path)) {
    fprintf(
        stderr,
        "skipping %s because the path is longer than %zu bytes\n",
        full_path,
        sizeof(full_path));
    }
    de = readdir(dp);
    continue;
}
  1. Restructure the code to avoid recursion.
georgschoelly commented 2 weeks ago

After some more research, I think the best solution is quite simple. The underlining issue is that the code uses MAX_PATH which is a number used only by Windows. On Linux systems we would use PATH_MAX instead. This is not the whole story of PATH_MAX, but it's good enough. 1

I therefore propose the following change:

https://github.com/VirusTotal/yara/blob/v4.5.1/cli/yara.c#L676C7-L676C32

- char full_path[MAX_PATH];
+ char full_path[PATH_MAX];

https://github.com/VirusTotal/yara/blob/v4.5.1/libyara/include/yara/limits.h#L41-L43

// Maximum length of file paths. This is the only limit that doesn't have the
// YR_ prefix. The intention is using the default MAX_PATH if defined.
  #ifndef MAX_PATH
- #define MAX_PATH 1024
+ #define MAX_PATH 4096
  #endif

+ #ifndef PATH_MAX
+ #define PATH_MAX 4096
+ #endif
plusvic commented 2 weeks ago

Please try with the code in this branch: https://github.com/VirusTotal/yara/tree/max_path

Let me know if it works for you.

georgschoelly commented 2 weeks ago

Thanks for your answer. It works better now, I can scan deeper paths. The stack overflow can still be triggered by using even deeper paths:

DEEP_PATH=$(printf 'b/%.0s' {1..2000})
mkdir -p $DEEP_PATH; touch $DEEP_PATH/file

I can work around this by increasing the stack limit:

ulimit -Ss 16384      // default was 8192

I guess the fundamental problem is that recursion is being used for scanning the file system.

plusvic commented 2 weeks ago

Yes, the right solution would be refactoring the directory walking logic so that it is iterative instead of recursive. A mitigation measure would be imposing a limit to the recursion depth.

georgschoelly commented 2 weeks ago

Even with an iterative approach there are problems:

All of this is non-trivial to fix.

However, by dynamically allocating full_path we can get rid of the stack overflow since the stack frame is much smaller. Therefore I would do the following change and then consider the issue solved:

--- yara-max_path/cli/yara.c    2024-06-19 09:22:21.000000000 +0000
+++ yara-max_path2/cli/yara.c   2024-06-19 14:23:42.187843585 +0000
@@ -667,12 +667,14 @@
   {
     struct dirent* de = readdir(dp);

+    char *full_path = calloc(YR_MAX_PATH, sizeof(char));
+    const size_t full_path_size = YR_MAX_PATH * sizeof(char);
+
     while (de && result != ERROR_SCAN_TIMEOUT)
     {
-      char full_path[YR_MAX_PATH];
       struct stat st;

-      snprintf(full_path, sizeof(full_path), "%s/%s", dir, de->d_name);
+      snprintf(full_path, full_path_size, "%s/%s", dir, de->d_name);

       int err = lstat(full_path, &st);

@@ -731,6 +733,7 @@
       de = readdir(dp);
     }

+    free(full_path);
     closedir(dp);
   }