When I call the match method on large data, an OverflowError is raised. Here is a simple example:
import yara
# the capture size is 2.3G
with open('../captures/bg1.pcap', 'rb') as f:
    data = f.read()
rule = yara.compile(source='rule foo: bar {strings: $a = "lmn" condition: $a}')
matchs = rule.match(data=data)
Here is the result:
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    matchs = rule.match(data=data)
OverflowError: size does not fit in an int
After reading yara-python.c, I found that the problem comes from the call to PyArg_ParseTupleAndKeywords (line 1372), and more specifically from the s# format unit, which stores the buffer length in a C int by default. As the Python documentation (https://docs.python.org/3/c-api/arg.html) specifies:
For all # variants of formats (s#, y#, etc.), the type of the length argument (int or Py_ssize_t) is controlled by defining the macro PY_SSIZE_T_CLEAN before including Python.h. If the macro was defined, length is a Py_ssize_t rather than an int. This behavior will change in a future Python version to only support Py_ssize_t and drop int support. It is best to always define PY_SSIZE_T_CLEAN.
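As a rough illustration of why the length does not fit, the sketch below compares the size of the capture with the largest value a C int and a Py_ssize_t can hold on the current platform. The helper fits_in_c_int and the reading of "2.3G" as 2.3 * 10**9 bytes are assumptions made for illustration; none of this is yara-python code.

```python
import ctypes

# Largest value a signed C int can hold on this platform (usually 2**31 - 1).
INT_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_int) - 1) - 1
# Largest value a Py_ssize_t can hold; ctypes.c_ssize_t has the same width.
SSIZE_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_ssize_t) - 1) - 1


def fits_in_c_int(length):
    """Return True if a buffer length can be stored in a C int (hypothetical helper)."""
    return 0 <= length <= INT_MAX


# Assumption: "2.3G" is read as roughly 2.3 * 10**9 bytes.
capture_size = 2_300_000_000

print("INT_MAX      :", INT_MAX)
print("SSIZE_MAX    :", SSIZE_MAX)
print("fits in int  :", fits_in_c_int(capture_size))
print("fits in ssize:", capture_size <= SSIZE_MAX)
```

On platforms where int is 32 bits wide, any buffer larger than about 2 GiB exceeds INT_MAX, which is consistent with the OverflowError above.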
To fix this problem, I'll propose a simple pull request that defines the PY_SSIZE_T_CLEAN macro before Python.h is included.