VirusTotal / yara-x

A rewrite of YARA in Rust.
https://virustotal.github.io/yara-x/
BSD 3-Clause "New" or "Revised" License

Using yara-x in a Rust library and handling lifetime specifiers of Scanner #184

Open xrl1 opened 2 weeks ago

xrl1 commented 2 weeks ago

Hello. This is related to #139, but maybe not the same use case. I'm trying to create a library that uses the yara-x crate. The library should initialize the rules internally and create a struct that holds an instance of the yara-x Scanner.

Reduced code in scanner.rs:

use yara_x::{Rules, Scanner as YaraScanner};
use anyhow::Result;

struct MyScanner<'a> {
    scanner: YaraScanner<'a>,
}

impl<'a> MyScanner<'a> {
    pub fn new(rules: &'a Rules) -> Self {
        let scanner = YaraScanner::new(&rules);
        MyScanner { scanner }
    }

    pub fn scan(&self, data: String) -> Result<String> {
        // Some implementation
    }
}

Reduced code in lib.rs:

pub struct MyLib<'a> {
    scanner: MyScanner<'a>,
}

impl<'a> MyLib<'a> {
    pub fn new() -> Result<Self> {
        let rules: Rules = load_rules()?;
        let scanner = MyScanner::new(&rules);
        Ok(MyLib { scanner })
    }

    pub fn scan(&self, data: String) -> Result<String> {
        self.scanner.scan(data)
    }
}

load_rules compiles and loads the rules from a resource file.

I tried countless variations of this code, but I always run into the lifetime specifier on Scanner and get the error "rules does not live long enough".

I couldn't find a way to wrap YaraScanner in an object that outlives it and holds a Rules object safely.

I cannot create the rules in main.rs because I intend to export this as a library, and I don't want the user to load the Yara rules herself.

The only solution Claude Sonnet and I found was to Box::leak the rules or load them statically, so that they live until the program exits. I want to avoid that, because in the future I'd like MyLib::new to accept rules as string arguments, so the rules should be confined to the lifetime of a MyLib instance.

Please let me know how you think it can be solved, because currently, I think only changing Scanner to take ownership of the rules can solve this.
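
One safe way around the lifetime parameter is to store only the rules and build a short-lived scanner inside each scan call. The sketch below illustrates the idea with stand-in types: `Rules`, `Scanner` and the substring matching are hypothetical placeholders that only mimic the shape of the yara-x types (a scanner borrowing rules), not the real API. Whether a per-call scanner is cheap enough in real yara-x is an assumption to check.

```rust
// Stand-in for yara_x::Rules (hypothetical: just a list of substrings).
struct Rules {
    patterns: Vec<String>,
}

// Stand-in for yara_x::Scanner<'r>: it borrows the rules, like the real one.
struct Scanner<'r> {
    rules: &'r Rules,
}

impl<'r> Scanner<'r> {
    fn new(rules: &'r Rules) -> Self {
        Scanner { rules }
    }

    // Returns every pattern that occurs in `data`.
    fn scan(&mut self, data: &str) -> Vec<String> {
        self.rules
            .patterns
            .iter()
            .filter(|p| data.contains(p.as_str()))
            .cloned()
            .collect()
    }
}

// The library type owns the rules and needs no lifetime parameter.
pub struct MyLib {
    rules: Rules,
}

impl MyLib {
    pub fn new(patterns: &[&str]) -> Self {
        let rules = Rules {
            patterns: patterns.iter().map(|p| p.to_string()).collect(),
        };
        MyLib { rules }
    }

    // The scanner borrows `self.rules` only for the duration of this call,
    // so nothing self-referential is ever stored.
    pub fn scan(&self, data: &str) -> Vec<String> {
        let mut scanner = Scanner::new(&self.rules);
        scanner.scan(data)
    }
}
```

With this layout, `MyLib::new(&["evil"]).scan("an evil payload")` needs no lifetime annotations on the caller's side. The trade-off is that any scanner setup cost is paid on every call instead of once.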

plusvic commented 2 weeks ago

I believe you can achieve what you want with a bit of unsafe code:

use std::marker::PhantomPinned;
use std::pin::Pin;

use anyhow::Result;

/// Wraps a yara_x::Rules, preventing it from moving around in memory.
struct PinnedRules {
    rules: yara_x::Rules,
    _pin: PhantomPinned,
}

struct MyScanner<'a> {
    // Declared first so that it is dropped before `_rules`.
    scanner: yara_x::Scanner<'a>,
    // This allows MyScanner to own the yara_x::Rules and pass a reference to the
    // scanner. The use of `Pin` guarantees that the rules won't be moved.
    _rules: Pin<Box<PinnedRules>>,
}

impl<'a> MyScanner<'a> {
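    pub fn new(rules: yara_x::Rules) -> Self {
        let pinned_rules = Box::pin(PinnedRules { rules, _pin: PhantomPinned });
        // Go through a raw pointer to detach the borrow from the local variable.
        // This relies on the boxed rules staying in place, and alive, for as
        // long as the scanner exists.
        let rules_ptr = std::ptr::from_ref(&pinned_rules.rules);
        let rules_ref = unsafe { rules_ptr.as_ref().unwrap() };
        let scanner = yara_x::Scanner::new(rules_ref);

        Self { scanner, _rules: pinned_rules }
    }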

    pub fn scan(&mut self, data: String) -> Result<String> {
        todo!()
    }
}

I haven't tested it thoroughly, so it may contain bugs.

xrl1 commented 2 weeks ago

Thank you, I tested this change in my code, all the tests passed, and nothing panics.

Even though it works, I think this solution is suboptimal. I need to test it more thoroughly, and I'll dig into the std::pin docs to make sure this unsafe code won't crash in the future, won't leak memory, and has no race in the destructor of MyScanner that could cause invalid memory access.

May I still suggest handling this in the yara-x library sometime in the future, so that library users aren't forced to write unsafe code or to use advanced Rust concepts?
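
On the destructor question, one relevant language rule is that Rust drops struct fields in declaration order. In the MyScanner above, `scanner` is declared before `_rules`, so the scanner is dropped while the rules are still alive. The sketch below demonstrates that ordering with stand-in types; `Noisy`, `Holder` and `drop_order` are hypothetical names used only for illustration and have nothing to do with yara-x.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Records its name in a shared log when it is dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Same field order as MyScanner: the borrower first, the owner last.
struct Holder {
    scanner: Noisy,
    _rules: Noisy,
}

// Builds a Holder, drops it, and returns the order in which the fields
// were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let holder = Holder {
        scanner: Noisy { name: "scanner", log: Rc::clone(&log) },
        _rules: Noisy { name: "rules", log: Rc::clone(&log) },
    };
    drop(holder);
    let order = log.borrow().clone();
    order
}
```

Swapping the two fields in `Holder` reverses the recorded order, which is why the field order in the unsafe wrapper matters. This only covers drop order; it does not by itself show that the rest of the unsafe code is sound.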

qjerome commented 2 weeks ago

@xrl1 the only thing you have to do is put Rules within your scanner struct, so that the Rust compiler knows it lives until the struct is dropped. It means your MyScanner needs to own your Rules.

pub struct MyScanner<'s> {
    rules: yara_x::Rules,
    scanner: Option<yara_x::Scanner<'s>>,
}

plusvic commented 2 weeks ago

@qjerome that doesn't work, because yara_x::Scanner needs a reference to the rules in MyScanner.rules. You can create a scanner that receives that reference, but you can't then move it into MyScanner.scanner, since that would make the struct borrow from itself.

qjerome commented 2 weeks ago

My bad, I thought it would work! That's what you get when you write code without testing it...

qjerome commented 2 weeks ago

Let me add a non-null contribution this time:

use std::ops::{Deref, DerefMut};

impl<'s> Deref for MyScanner<'s> {
    type Target = yara_x::Scanner<'s>;
    fn deref(&self) -> &Self::Target {
        &self.scanner
    }
}

impl<'s> DerefMut for MyScanner<'s> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.scanner
    }
}

This should allow you to use your MyScanner as a yara_x::Scanner.
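
As a rough illustration of what the Deref/DerefMut pair buys you, here is a self-contained sketch in which method calls on the wrapper are forwarded to the inner value. The `Scanner` here is a hypothetical stand-in that only counts bytes, not the real yara_x::Scanner, and the wrapper has no lifetime parameter because the stand-in borrows nothing.

```rust
use std::ops::{Deref, DerefMut};

// Stand-in for yara_x::Scanner (hypothetical: it only counts scanned bytes).
pub struct Scanner {
    bytes_seen: usize,
}

impl Scanner {
    // Adds the length of `data` to the running total and returns the total.
    pub fn scan(&mut self, data: &[u8]) -> usize {
        self.bytes_seen += data.len();
        self.bytes_seen
    }

    pub fn bytes_seen(&self) -> usize {
        self.bytes_seen
    }
}

pub struct MyScanner {
    scanner: Scanner,
}

impl MyScanner {
    pub fn new() -> Self {
        MyScanner { scanner: Scanner { bytes_seen: 0 } }
    }
}

impl Deref for MyScanner {
    type Target = Scanner;
    fn deref(&self) -> &Self::Target {
        &self.scanner
    }
}

impl DerefMut for MyScanner {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.scanner
    }
}
```

With these impls, `my_scanner.scan(b"abc")` resolves to `Scanner::scan` through DerefMut, and `my_scanner.bytes_seen()` resolves through Deref. Note that an inherent `scan` method defined on MyScanner itself would take priority over the forwarded one.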