Open georgschoelly opened 2 weeks ago
After some more research, I think the best solution is quite simple. The underlining issue is that the code uses MAX_PATH
which is a number used only by Windows. On Linux systems we would use PATH_MAX
instead. This is not the whole story of PATH_MAX
, but it's good enough. 1
I therefore propose the following change:
https://github.com/VirusTotal/yara/blob/v4.5.1/cli/yara.c#L676C7-L676C32
- char full_path[MAX_PATH];
+ char full_path[PATH_MAX];
https://github.com/VirusTotal/yara/blob/v4.5.1/libyara/include/yara/limits.h#L41-L43
// Maximum length of file paths. This is the only limit that doesn't have the
// YR_ prefix. The intention is using the default MAX_PATH if defined.
#ifndef MAX_PATH
- #define MAX_PATH 1024
+ #define MAX_PATH 4096
#endif
+ #ifndef PATH_MAX
+ #define PATH_MAX 4096
+ #endif
Please try with the code in this branch: https://github.com/VirusTotal/yara/tree/max_path
Let me know if it works for you.
Thanks for your answer. It works better now, I can scan deeper paths. The stack overflow can still be triggered by using even deeper paths:
DEEP_PATH=$(printf 'b/%.0s' {1..2000})
mkdir -p $DEEP_PATH; touch $DEEP_PATH/file
I can work around this by increasing the stack limit:
ulimit -Ss 16384 // default was 8192
I guess the fundamental problem is that recursion is being used for scanning the file system.
Yes, the right solution would be refactoring the directory walking logic so that it is iterative instead of recursive. A mitigation measure would be imposing a limit to the recursion depth.
Even with an iterative approach there are problems:
All of this is non-trivial to fix.
However, by dynamically allocating full_path
we can get rid of the stack overflow since the stack frame is much smaller. Therefore I would do the following change and then consider the issue solved:
--- yara-max_path/cli/yara.c 2024-06-19 09:22:21.000000000 +0000
+++ yara-max_path2/cli/yara.c 2024-06-19 14:23:42.187843585 +0000
@@ -667,12 +667,14 @@
{
struct dirent* de = readdir(dp);
+ char *full_path = calloc(YR_MAX_PATH, sizeof(char));
+ const size_t full_path_size = YR_MAX_PATH * sizeof(char);
+
while (de && result != ERROR_SCAN_TIMEOUT)
{
- char full_path[YR_MAX_PATH];
struct stat st;
- snprintf(full_path, sizeof(full_path), "%s/%s", dir, de->d_name);
+ snprintf(full_path, full_path_size, "%s/%s", dir, de->d_name);
int err = lstat(full_path, &st);
@@ -731,6 +733,7 @@
de = readdir(dp);
}
+ free(full_path);
closedir(dp);
}
Describe the bug Yara ignores deeply nested directories or even crashes due to stack overflow.
To Reproduce
b/b/b/b/b/.../file
. I chose 600 as the depth because this ensures the path will take more than the 1024 characters defined inMAX_PATH
here: https://github.com/VirusTotal/yara/blob/v4.5.1/libyara/include/yara/limits.h#L42Expected behavior
No core dump. Maybe an error indicating that the directory hierarchy is too deep. Something like this:
Please complete the following information:
Additional context
The issue is missing error checking in cli/yara.c:
if
dir
is already the size offull_path
, the new path is the same as the old path. This leads to an infinite recursion which ends either in a SIGSEGV (i.e. stack overflow) or once the file descriptor limit has been reached.In the case of the file descriptor, this means that any sufficiently deeply nested files are ignored.
How to fix
I see these options:
full_path
. This would make the stack frame much smaller and a stack overflow less likely. This would not solve the fd-limit problem however.