VirusTotal / yara-python

The Python interface for YARA
http://virustotal.github.io/yara/
Apache License 2.0

OverflowError when trying to call match on big data #80

Closed: BartholomewPanda closed this issue 6 years ago

BartholomewPanda commented 6 years ago

When I call the match method on large data, an OverflowError is raised. Here is a simple example:

import yara

# the capture file is about 2.3 GB
with open('../captures/bg1.pcap', 'rb') as f:
    data = f.read()

rule = yara.compile(source='rule foo: bar {strings: $a = "lmn" condition: $a}')
matchs = rule.match(data=data)

Here is the result:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    matchs = rule.match(data=data)
OverflowError: size does not fit in an int

After reading yara-python.c, I saw that the problem comes from the call to PyArg_ParseTupleAndKeywords (line 1372), and more specifically from the s# format unit. As specified in the Python documentation (https://docs.python.org/3/c-api/arg.html):

For all # variants of formats (s#, y#, etc.), the type of the length argument (int or Py_ssize_t) is controlled by defining the macro PY_SSIZE_T_CLEAN before including Python.h. If the macro was defined, length is a Py_ssize_t rather than an int. This behavior will change in a future Python version to only support Py_ssize_t and drop int support. It is best to always define PY_SSIZE_T_CLEAN.
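To make the limit concrete, here is a minimal sketch (not taken from yara-python) that checks whether a buffer length fits in a signed C int on the current platform. The helper name `fits_in_c_int` is hypothetical, and the capture size is only approximated from the 2.3G figure in the report.

```python
import ctypes

# Largest value a signed C int can hold on this platform
# (2**31 - 1 when int is 4 bytes, which is the common case).
INT_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_int) - 1) - 1

# Largest value a Py_ssize_t can hold; ctypes.c_ssize_t has the same width.
SSIZE_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_ssize_t) - 1) - 1


def fits_in_c_int(length):
    """Return True if a buffer length can be stored in a signed C int.

    Hypothetical helper for illustration only.
    """
    return 0 <= length <= INT_MAX


# Approximate size of the capture from the report (assumes 2.3 GiB).
capture_size = int(2.3 * 1024 ** 3)

print("INT_MAX:", INT_MAX)
print("Py_ssize_t max:", SSIZE_MAX)
print("capture fits in int:", fits_in_c_int(capture_size))
```

When int is 4 bytes, the capture length exceeds INT_MAX, which is consistent with the "size does not fit in an int" message in the traceback.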

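Since PY_SSIZE_T_CLEAN is not defined, the length of the data buffer is parsed as an int, which cannot hold the size of a 2.3G buffer. To fix this, I'll propose a simple pull request that just defines the macro PY_SSIZE_T_CLEAN before including Python.h.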