Hugal31 / yara-rust

Rust bindings for VirusTotal/Yara
Apache License 2.0

Memory leak #59

Closed by ikrivosheev 7 months ago

ikrivosheev commented 2 years ago

I tested my application and saw huge memory growth. Files to reproduce the error: 16025.tar.gz.zip (these are simple text files).

Simple code to reproduce:

use std::os::unix::fs::MetadataExt;
use std::sync::Arc;
use threadpool::ThreadPool;
use yara::Compiler;

const WORKERS: usize = 4;
const RULES: &str = "..."; // path to rules
const FILE: &str = "...";  // path to files to scan

fn main() {
    let pool = ThreadPool::new(WORKERS);
    let compiler = Compiler::new().unwrap();
    let compiler = compiler.add_rules_file(RULES).unwrap();
    let rules = Arc::new(compiler.compile_rules().unwrap());
    loop {
        let mut counter = 0;
        println!("Start iter");

        for file in globwalk::GlobWalkerBuilder::new(FILE, "**")
            .build()
            .unwrap()
        {
            if let Ok(file) = file {
                counter += 1;
                if file.metadata().unwrap().size() == 0 {
                    continue;
                }
                let rules = rules.clone();
                pool.execute(move || {
                    let mut scanner = rules.scanner().unwrap();
                    scanner.set_timeout(60);
                    let _ = scanner.scan_file(file.path());
                });
            }
        };
        // Wait for every queued scan to finish before the next iteration.
        pool.join();
        println!("Finish iter, files={}", counter);
        std::thread::sleep(std::time::Duration::from_secs(7));

    }
}

What am I doing wrong?

Hugal31 commented 2 years ago

Hi,

Can you provide a sample of your rules? When I test your code, I get a very slow (but still worrying) increase in memory.

ikrivosheev commented 2 years ago

@Hugal31, hi. Can you test with the changes in https://github.com/Hugal31/yara-rust/pull/57? I read about mem::transmute and I think this is the first problem...

Rules: all.zip. I got the rules from: https://github.com/Yara-Rules/rules

Hugal31 commented 2 years ago

Just to be clear, are you running your test with #57?

ikrivosheev commented 2 years ago

@Hugal31, did you reproduce the problem?

ikrivosheev commented 2 years ago

@Hugal31, I ran the test with https://github.com/Hugal31/yara-rust/pull/57. It does not help...

Hugal31 commented 2 years ago

Sorry, I could not reproduce it.

Have you double-checked the Yara version you are using? Which flags did you enable? Note that I had to disable one rule that depends on cuckoo.

ikrivosheev commented 2 years ago

My application uses: 1) Python 3.8, 2) Rust bindings via pyo3 (stable ABI3), 3) rust-yara with features: vendored, bundled-4_1_2

And I see memory growth... I tried to write a simple example to reproduce the problem. Valgrind and heapcheck show nothing.

Hugal31 commented 2 years ago

I don't understand what Python and pyo3 have to do with this. Your sample code does not contain or run Python, right? And you see memory growth with your sample code?

ikrivosheev commented 2 years ago

Some more results: I ran the process under strace: strace -k -f -e trace=%memory -o /tmp/log bin

Then pmap -p <pid>:

00007fdda4000000  65324K rw---   [ anon ]
00007fdda7fcb000    212K -----   [ anon ]
00007fddac000000  65324K rw---   [ anon ]
00007fddaffcb000    212K -----   [ anon ]
00007fddb4000000  65324K rw---   [ anon ]
00007fddb7fcb000    212K -----   [ anon ]
00007fddb8000000  65324K rw---   [ anon ]
00007fddbbfcb000    212K -----   [ anon ]
00007fddbc000000  65324K rw---   [ anon ]
00007fddbffcb000    212K -----   [ anon ]
00007fddc0000000  65324K rw---   [ anon ]
00007fddc3fcb000    212K -----   [ anon ]
00007fddc4000000  65032K rw---   [ anon ]
00007fddc7f82000    504K -----   [ anon ]
00007fddcc000000  65324K rw---   [ anon ]
00007fddcffcb000    212K -----   [ anon ]
00007fddd4000000  65028K rw---   [ anon ]
00007fddd7f81000    508K -----   [ anon ]
00007fddd8000000  65024K rw---   [ anon ]
00007fdddbf80000    512K -----   [ anon ]
00007fdddc000000  65020K rw---   [ anon ]
....
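A quick arithmetic check on the listing above, as a sketch only: each read-write region and the no-access region that follows it add up to exactly 65536K, that is 64 MiB. That size is consistent with glibc malloc's per-thread arena heaps on 64-bit Linux, though nothing in this thread confirms that this is the cause. The constant and function names below are illustrative, and the sizes are copied from four of the pairs in the pmap output.

```rust
// Sizes in KiB copied from the pmap listing: (rw--- region, ----- region).
const PAIRS_KIB: [(u64, u64); 4] = [
    (65324, 212),
    (65032, 504),
    (65028, 508),
    (65024, 512),
];

/// Total size of one rw/no-access pair, in KiB.
fn pair_total_kib(rw: u64, no_access: u64) -> u64 {
    rw + no_access
}

fn main() {
    for &(rw, no_access) in PAIRS_KIB.iter() {
        let total = pair_total_kib(rw, no_access);
        // Every pair in the listing sums to 65536K, i.e. 64 MiB.
        println!("{}K + {}K = {}K ({} MiB)", rw, no_access, total, total / 1024);
    }
}
```

If that reading is right, the regions are address space reserved by the allocator per thread, which would also explain why leak checkers report nothing.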

Then I looked up one of those addresses in the strace log and found:

11265 mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fddcc000000
 > /lib/x86_64-linux-gnu/libc-2.31.so(mmap64+0x26) [0x11ba46]
 > /lib/x86_64-linux-gnu/libc-2.31.so(pthread_attr_setschedparam+0x5f7) [0x98527]
 > /lib/x86_64-linux-gnu/libc-2.31.so(pthread_attr_setschedparam+0x262d) [0x9a55d]
 > /lib/x86_64-linux-gnu/libc-2.31.so(pthread_attr_setschedparam+0x39e3) [0x9b913]
 > /lib/x86_64-linux-gnu/libc-2.31.so(__libc_malloc+0x1b9) [0x9d419]
 > (yr_notebook_alloc+0x49) [0xae709]
 > (_yr_scan_match_callback+0x2a4) [0xac364]
 > (yr_scan_verify_match+0x368) [0xad008]
 > (_yr_scanner_scan_mem_block.isra.0+0x1e2) [0xa9002]
 > (yr_scanner_scan_mem_blocks+0x3ac) [0xa96dc]
 > (yr_scanner_scan_mem+0x7d) [0xa9add]
 ...
 111265 mprotect(0x7fddcc000000, 716800, PROT_READ|PROT_WRITE) = 0
 > /lib/x86_64-linux-gnu/libc-2.31.so(mprotect+0xb) [0x11bb0b]
 > /lib/x86_64-linux-gnu/libc-2.31.so(pthread_attr_setschedparam+0x64a) [0x9857a]
 > /lib/x86_64-linux-gnu/libc-2.31.so(pthread_attr_setschedparam+0x262d) [0x9a55d]
 > /lib/x86_64-linux-gnu/libc-2.31.so(pthread_attr_setschedparam+0x39e3) [0x9b913]
 > /lib/x86_64-linux-gnu/libc-2.31.so(__libc_malloc+0x1b9) [0x9d419]
 > (yr_notebook_alloc+0x49) [0xae709]
 > (_yr_scan_match_callback+0x2a4) [0xac364]
 > (yr_scan_verify_match+0x368) [0xad008]
 > (_yr_scanner_scan_mem_block.isra.0+0x1e2) [0xa9002]
 > (yr_scanner_scan_mem_blocks+0x3ac) [0xa96dc]
 > (yr_scanner_scan_mem+0x7d) [0xa9add]

Why is the memory not freed? This is very strange.
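One possible reading of the trace, offered as a hedged sketch and not as a confirmed diagnosis: the mmap call reserves 128 MiB with PROT_NONE and MAP_NORESERVE, and the following mprotect makes only 700 KiB of it usable, at an address that is 64 MiB aligned. That pattern matches how glibc malloc sets up a new arena heap for a thread, so the growth may be reserved address space, not leaked allocations. The values below are copied from the strace excerpt; the constant and function names are illustrative.

```rust
const KIB: u64 = 1024;
const MIB: u64 = 1024 * 1024;

// Values copied from the strace excerpt above.
const MMAP_LEN: u64 = 134_217_728; // length passed to mmap
const MPROTECT_LEN: u64 = 716_800; // length passed to mprotect
const MMAP_ADDR: u64 = 0x7fdd_cc00_0000; // address returned by mmap

/// True when `addr` is a multiple of `align` (`align` must be non-zero).
fn is_aligned(addr: u64, align: u64) -> bool {
    addr % align == 0
}

fn main() {
    // 128 MiB reserved without access permissions.
    println!("reserved: {} MiB", MMAP_LEN / MIB);
    // 700 KiB made readable and writable afterwards.
    println!("made usable: {} KiB", MPROTECT_LEN / KIB);
    // The returned address sits on a 64 MiB boundary.
    println!("64 MiB aligned: {}", is_aligned(MMAP_ADDR, 64 * MIB));
}
```

If this is the cause, capping the number of arenas (for example with glibc's MALLOC_ARENA_MAX environment variable) would be one thing to try; whether it helps here is untested.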

ikrivosheev commented 2 years ago

Another example, which works fine. I removed threadpool and spawn a thread per file:

use std::os::unix::fs::MetadataExt;
use std::sync::Arc;
use yara::Compiler;
use std::thread;

const RULES: &str = "/home/ikrivosheev/projects/test/src/ms_binary.yar"; // path to rules
const FILE: &str = "/home/ikrivosheev/data/16025/";  // path to files to scan

fn main() {
    let compiler = Compiler::new().unwrap();
    let compiler = compiler.add_rules_file(RULES).unwrap();
    let rules = Arc::new(compiler.compile_rules().unwrap());
    loop {
        let mut counter = 0;
        println!("Start iter");

        for file in globwalk::GlobWalkerBuilder::new(FILE, "**")
            .build()
            .unwrap()
        {
            if let Ok(file) = file {
                counter += 1;
                if file.metadata().unwrap().size() == 0 {
                    continue;
                }
                let rules = rules.clone();
                thread::spawn(move || {
                    let mut scanner = rules.scanner().unwrap();
                    scanner.set_timeout(60);
                    let _ = scanner.scan_file(file.path());
                });
            }
        };
        println!("Finish iter, files={}", counter);
        std::thread::sleep(std::time::Duration::from_secs(100));
    }
}

Hugal31 commented 7 months ago

Is this still happening?

ikrivosheev commented 7 months ago

@Hugal31 I think I can close the issue. If something changes, I will reopen it.