Implement `Send` for `Scanner`

wojciech-graj commented 1 week ago

Adding the Send trait to the Scanner was briefly discussed in #186, but I found the answer unconvincing. The explanation was that a Scanner cannot safely be used by many threads simultaneously, however that's what the Sync trait is responsible for, while the Send trait simply means that the Scanner could be moved between threads.

The main use-case for this is in asynchronous code. It is currently impossible to spawn an async task and create a scanner within it, even though the scanner will only be used by one thread at any given time, because the task can run on any thread. The following code will fail to compile.

tokio::spawn(async {
    let rules: Rules = ...;
    let scanner = Scanner::new(&rules);
});

Below are the roadblocks to implementing Send:

Rc is used instead of its thread-safe counterpart Arc. (easily changeable)
ScanContext::console_log needs to be changed from Option<Box<dyn FnMut(String) + 'r>> to Option<Box<dyn FnMut(String) + Send + 'r>> (easily changeable)
ScanContext::scanned_data and ScanContext::wasm_store are pointers. scanned_data appears to be a pointer as a workaround to the borrow-checker's complaints, and there doesn't appear to be a trivial way to change it to an Option<&[u8]>.

plusvic commented 1 week ago

I think that Send can be implemented for the Scanner type without issues, allowing it's use from async functions. It's safe to pass the ownership of the scanner from one thread to another one. However, it's not safe to use the same scanner from different threads concurrently, therefore Sync can not be implemented. It's not only a matter of changing Rc to Arc and solve the other issues, it's that the scanner has data structure that tracks the progress of the current scan, and therefore two scan operations can't run concurrently.

plusvic commented 1 week ago

I was trying to replicate the issue, but this works for me:

use std::io;

#[tokio::main]
async fn main() -> io::Result<()>  {
    let handle = tokio::spawn(async {
        let rules = yara_x::compile("rule test { condition: true }").unwrap();
        let mut scanner = yara_x::Scanner::new(&rules);
        let scan_results = scanner.scan(b"").unwrap();
        println!("Matching {} rules", scan_results.matching_rules().len())
    });
    handle.await?;
    println!("Finished");
    Ok(())
}

Can you provide some example that fails to compile?

plusvic commented 1 week ago

Answering myself... this reproduces the issue:

use std::io;
use tokio::task::yield_now;

#[tokio::main]
async fn main() -> io::Result<()>  {
    let handle = tokio::spawn(async {
        let rules = yara_x::compile("rule test { condition: true }").unwrap();
        let mut scanner = yara_x::Scanner::new(&rules);
        yield_now().await;
        let scan_results = scanner.scan(b"").unwrap();
        yield_now().await;
        println!("Matching {} rules", scan_results.matching_rules().len())
    });
    handle.await?;
    println!("Finished");
    Ok(())
}

plusvic commented 1 week ago

@wojciech-graj can you try this branch https://github.com/VirusTotal/yara-x/tree/make-scanner-send?

wojciech-graj commented 1 week ago

I took a look at the branch, and I think the approach of manually implementing Send is probably fine iff at least points 1 and 2 are addressed. Rc doesn't guarantee that the value within it is in the same state on different threads, so if the Scanner is used on thread 1, then gets moved to thread 2, any internal state stored in an Rc will be in an undefined state.

plusvic commented 1 week ago

What do you mean with this?

Rc doesn't guarantee that the value within it is in the same state on different threads, so if the Scanner is used on thread 1, then gets moved to thread 2, any internal state stored in an Rc will be in an undefined state.

Can you elaborate more?

wojciech-graj commented 1 week ago

Rc stands for "Reference Counted", so a value that lives somewhere, and will be freed once it ceases to be referenced. Internally an Rc stores a counter, which is incremented whenever a new reference is created, and which is decremented whenever a reference is dropped. This counter is not atomic and is not guaranteed to be the same across all threads (there is no memory barrier), so you could have the following situation

Thread 1: create Rc with counter = 1
Move Scanner to thread 2
Thread 2: increment Rc's counter
Move Scanner to thread 1
Thread 1: decrement Rc's counter from 1 to 0 (Rc's counter is 1 because thread 2's counter hasn't been propagated from cache to main memory, or thread 1 didn't get a cache miss when fetching value of counter)
Thread 1: free value stored in Rc, despite there being an existing reference to it that was created by thread 2

plusvic commented 1 week ago

Now I see what you mean, I didn't take into account cache issues. I think we must find some alternative, like creating some other type that is both Send + Sync by protecting the Scanner with a mutex or something like that.

I don't see the change from Rc to Arc as a good solution because we would be paying the performance penalty of using Arc in a case where multiple threads can't be used anyways. In other words, we would be using thread-safe references in a context where the scanner can't be shared by multiple threads, that's an overkill.

Also, as far as I know the async code will fail to compile only if you have some await after creating the Scanner and before you use the scanning results. For instance, this won't compile:

tokio::spawn(async {
  let rules = yara_x::compile("rule test { condition: true }").unwrap();
  let mut scanner = yara_x::Scanner::new(&rules);
  yield_now().await;
  let scan_results = scanner.scan(b"").unwrap();
  println!("Matching {} rules", scan_results.matching_rules().len())
});

But this other code will compile:

tokio::spawn(async {
  let rules = yara_x::compile("rule test { condition: true }").unwrap();
  {
     let mut scanner = yara_x::Scanner::new(&rules);
     let scan_results = scanner.scan(b"").unwrap();
     println!("Matching {} rules", scan_results.matching_rules().len())
  }
  yield_now().await;
});

So, you can use the scanner inside an async block, as long as the async code doesn't yield from the moment you build the scanner until the moment you process the results. This is not ideal when you want to re-use the scanner for scanning multiple files, but it in certain cases it would work without needing Send at all.

In resume, I think it's a good idea to provide some type that is both Sync and Send that can be easily used in async code, but the solution shouldn't be using Arc everywhere.

plusvic commented 1 week ago

It turns out that my idea of creating some other type that wraps an Scanner inside a Mutex won't work. By using a mutex you can convert Send + !Sync into Send + Sync. But in this case we want to turn !Send + !Sync into Send + Sync. That's not possible without explicitly implementing Send and Sync for the wrapper type.

wojciech-graj commented 1 week ago

I created a branch on which I have implemented changes 1 and 2, you can find it here: https://github.com/wojciech-graj/yara-x/tree/send-scanner

I scanned a few GB of files filled with random data, with the https://github.com/Yara-Rules/rules ruleset, and below are my results

Rc:
cargo r --release scan ../rules/**/*.yar ../sample -r  776.96s user 4.74s system 647% cpu 2:00.75 total
cargo r --release scan ../rules/**/*.yar ../sample -r  768.74s user 5.05s system 629% cpu 2:02.83 total

Arc:
cargo r --release scan ../rules/**/*.yar ../sample -r  763.22s user 4.74s system 642% cpu 1:59.55 total
cargo r --release scan ../rules/**/*.yar ../sample -r  772.34s user 5.15s system 633% cpu 2:02.82 total

There appears to be no performance difference when switching to Arc from Rc. Please feel free to run more benchmarks, but at least based on these numbers, I see no reason to not switch to Arc.

plusvic commented 1 week ago

I agree that the performance penalty will be probably negligible, but I still believe that changing every Rc to Arc inside a type that is intended to be used in single-thread environment is not a good design decision. In one hand I'm tempted to make this change because it facilitates the use of the yara-x crate in async code, but in the other hand I think that the async requirements are forcing me into a change that doesn't feel natural at all, this is more like a workaround than a change that feels logical to me.

I don't have too much experience with async Rust, but it looks like there are solutions for using !Send types in async code:

https://procmarco.com/blog/2021-05-04-a-story-about-async-rust-and-using-send-types/ https://ryhl.io/blog/actors-with-tokio/

I haven't delved into this, and I'm aware that it puts the burden on the users, but for the time being I think that's the right approach.

There's an additional reason for not allowing the scanner to be moved freely from one thread to another. Some yara-x modules use thread local storage, for instance, the magic module maintains a copy of its signature database per-thread, because libmagic is not a thread-safe library. In an environment like tokio, where the runtime can decide to move the scanner from one thread to another freely, this imply that you can have a lot of copies of this database even if you have a single scanner, just because you don't control how the scanner is assigned to a thread So, there are subtle implications that may surprise the users of the crate. This an additional reason for keeping the scanner as !Send, it explicitly tells the crate users that moving the scanner from one thread to another is not a good idea, and that other solutions that involve assigning a scanner to a specific thread are preferable.

wojciech-graj commented 1 week ago

Your last paragraph is probably the best argument for keeping is single-threaded and !Send. I still think that it would be a good idea to eventually make the scanner Send to make it more usable in the async world, but I agree that for now it's probably best to keep it as-is.

VirusTotal / yara-x

Implement `Send` for `Scanner` #228