Hello,
First of all, thank you for making this library — it has been very useful!
I seem to have found a segmentation fault bug. `Database.compile()` crashes when the number of patterns provided reaches somewhere between 522,000 and 523,000 (on my machine, at least). The threshold is consistent across runs and does not seem to depend on the complexity of the patterns. Note that this is not an issue when using the C library directly, nor is it a RAM / out-of-memory problem.
Here is a minimal reproducible example:
```python
import hyperscan
import numpy as np

db = hyperscan.Database()
n = 521000
# Generate `n` patterns, each matching 4 bytes written as hex escapes
# (e.g. \x23\xff\x3d\xab), encoded to bytes as UTF-8 — identical to ASCII
# here, since every character's code point is below 128.
expressions = [
    "\\x{:02x}\\x{:02x}\\x{:02x}\\x{:02x}"
    .format(*np.random.randint(0, 256, 4, np.uint8))
    .encode("utf-8")
    for i in range(n)
]
db.compile(expressions=expressions)
```
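For what it's worth, the same patterns can be generated with the standard library alone (no numpy), which may make the repro easier to run elsewhere; `make_pattern` below is just an illustrative helper, not part of the library:

```python
import os

def make_pattern() -> bytes:
    # Four random bytes rendered as \xNN hex escapes, then ASCII-encoded:
    # the result is a 16-byte literal like b"\\x23\\xff\\x3d\\xab".
    return "".join("\\x{:02x}".format(b) for b in os.urandom(4)).encode("ascii")

patterns = [make_pattern() for _ in range(523000)]
print(patterns[0], len(patterns[0]))  # each pattern is 16 ASCII bytes
```

Passing these to `db.compile()` triggers the same crash on my machine.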
I've looked at the source code and gave debugging it a go; however, I'm very unfamiliar with the CPython extension API and did not get very far.