VirusTotal / yara-python

The Python interface for YARA
http://virustotal.github.io/yara/
Apache License 2.0

Release GIL while compiling Yara rules #183

Closed olii closed 3 years ago

olii commented 3 years ago

This PR releases the GIL while YARA rules are being compiled. We noticed that compiling rules in parallel threads gave no speedup over sequential compilation.

The bottleneck is that the GIL stays held for the whole of a CPU-intensive function, so only one thread can compile at a time. I added code that releases the GIL before that work and reacquires it afterwards. This removes the bottleneck in multithreaded environments.

The second commit fixes GIL locking in the callback, which is called when a YARA rule is invalid. Because compilation now runs without the GIL, the callback has to hold the GIL again before it touches Python objects.
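The effect described above can be reproduced without yara-python. The sketch below is only an illustration, not code from this PR: `measure_speedup`, `holds_gil`, and `releases_gil` are hypothetical names, and `time.sleep()` stands in for a C function that releases the GIL around its work (in a C extension this is typically done with the `Py_BEGIN_ALLOW_THREADS` / `Py_END_ALLOW_THREADS` macros).

```python
import concurrent.futures
import time


def measure_speedup(task, n_tasks=8, max_workers=4):
    """Return sequential_time / threaded_time for n_tasks calls of task()."""
    start = time.perf_counter()
    for _ in range(n_tasks):
        task()
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(task) for _ in range(n_tasks)]
        for fut in futures:
            fut.result()
    threaded = time.perf_counter() - start
    return sequential / threaded


def holds_gil():
    # Pure-Python loop: on a standard (GIL) build the thread keeps the GIL
    # while it computes, so other threads cannot run Python code meanwhile.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def releases_gil():
    # time.sleep() releases the GIL while it waits. It stands in for a
    # C function that drops the GIL around its CPU-intensive section.
    time.sleep(0.05)


if __name__ == "__main__":
    print("GIL held:    ", measure_speedup(holds_gil))
    print("GIL released:", measure_speedup(releases_gil))
```

On a standard CPython build the first ratio is expected to stay close to 1, while the second should approach the number of workers. Exact values depend on the machine and the interpreter build.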
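The same rule (C code running without the GIL must take it back before it calls into Python) can be observed from pure Python with `ctypes`. This is an analogy, not the yara-python implementation: `ctypes` releases the GIL around calls made through `CDLL` and reacquires it before invoking a Python callback. The sketch assumes a POSIX system where the C library can be loaded as shown; `compare_ints` and `sort_with_c_qsort` are hypothetical names.

```python
import ctypes
import ctypes.util

# Assumption: POSIX system. If find_library() returns None, CDLL(None)
# falls back to the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature: int (*compar)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_int),
    ctypes.POINTER(ctypes.c_int),
)

libc.qsort.restype = None
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]


def compare_ints(a, b):
    # This is Python code called from C. ctypes reacquires the GIL before
    # running it, which is what a hand-written C callback must do itself.
    x, y = a[0], b[0]
    return (x > y) - (x < y)


def sort_with_c_qsort(values):
    array = (ctypes.c_int * len(values))(*values)
    callback = CMPFUNC(compare_ints)  # keep a reference alive during the call
    # The GIL is released while qsort() runs and taken back for each callback.
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), callback)
    return list(array)
```

In a hand-written extension, the callback would need to do this reacquisition explicitly, for example with `PyGILState_Ensure()` / `PyGILState_Release()`.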

Here is the code I used for benchmarks.

"""
    Yara threaded compilation.
"""

import concurrent.futures
import time
import yara

counter = 0

def any_yara_rule():
    global counter

    counter += 1

    return f"""
        rule any_rule_{counter}
        {{
            strings:
                $text_string = "some string {counter}"
            condition:
                $text_string
        }}
    """

def generate_ruleset(n=10_000):
    # Build one large ruleset made of n uniquely named rules.
    return '\n'.join(any_yara_rule() for _ in range(n))

try:
    # Compile invalid source once so that the error callback is invoked.
    # Causes a segfault if the callback is called without the GIL.
    yara.compile(source='{}', error_on_warning=True)
except yara.SyntaxError:
    pass

ruleset_as_text = generate_ruleset()
N_RULESETS = 100

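# Sequential baseline: compile the same ruleset N_RULESETS times in one thread.
start = time.perf_counter()
for _ in range(N_RULESETS):
    yara.compile(source=ruleset_as_text, error_on_warning=True)

seq = time.perf_counter() - start
print('Sequential', seq)

# Parallel: submit the same compilations to a thread pool.
start = time.perf_counter()
max_workers = 4
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
    futures = []
    for _ in range(N_RULESETS):
        # Same arguments as the sequential run so the timings are comparable.
        fut = executor.submit(yara.compile, source=ruleset_as_text, error_on_warning=True)
        futures.append(fut)

    # result() re-raises any exception raised in a worker thread.
    for fut in futures:
        fut.result()

par = time.perf_counter() - start
print('Parallel', par)
print(f'Speedup with {max_workers} workers', seq / par)

google-cla[bot] commented 3 years ago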

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


plusvic commented 3 years ago

Great contribution, thanks!

olii commented 3 years ago

@googlebot I signed it!