Closed: BartVerc closed this issue 1 year ago
I am experiencing the same issue, but unfortunately I am also not a C wizard.
this should be fixed in the latest release, but please confirm when you get a chance before I close this ticket.
apologies for the delay
Hi @darvid, I was just testing this with the latest version available from PyPI, which seems to be v0.3.2. So I wanted to let you know that v0.3.3 is not published on PyPI yet :)
Hi @darvid, thanks for the memory leak fix. It resolves our first case, where we only compile the database in the loop X times.
However, when we are also dumping the database in the loop we see that after some iterations the memory starts growing.
```python
import hyperscan as hs

db = hs.Database(mode=hs.HS_MODE_BLOCK)
for i in range(100000):
    db.compile(expressions=[b'test'], ids=[1], flags=[hs.HS_FLAG_ALLOWEMPTY])
    b = hs.dumpb(db)
```
The memory also starts growing if you repeatedly load bytes into a database, reassigning the same variable each time.
```python
import hyperscan as hs

db = hs.Database(mode=hs.HS_MODE_BLOCK)
db.compile(expressions=[b'test'], ids=[1], flags=[hs.HS_FLAG_ALLOWEMPTY])
b = hs.dumpb(db)
for i in range(100000):
    db = hs.loadb(b)
```
Since we rebind the same variable on every iteration, I would expect Python to free the old database. I can send you the plots if you're interested. Thanks again for looking into it!
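That expectation is reasonable: CPython is reference-counted, so rebinding a name normally finalizes the old object immediately. A toy sketch illustrating this (the `FakeDatabase` class is hypothetical, not part of hyperscan):

```python
# Toy illustration (not hyperscan): rebinding a name drops the old
# object's refcount to zero, so CPython finalizes it on the spot.
freed = []

class FakeDatabase:
    def __init__(self, n):
        self.n = n

    def __del__(self):
        # In hyperscan, this is where one would expect the C-level
        # resources (database, scratch) to be released.
        freed.append(self.n)

db = FakeDatabase(0)
for i in range(1, 4):
    db = FakeDatabase(i)  # the previous instance is finalized right here

print(freed)  # → [0, 1, 2]
```

So if memory still grows while the Python objects are being finalized, the leak is on the C side of the binding.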
ahhhh looks like the leak is with scratch not being freed now. here's a quick workaround while I push and release a fix:
```python
import hyperscan as hs

db = hs.Database(mode=hs.HS_MODE_BLOCK)
db.compile(expressions=[b'test'], ids=[1], flags=[hs.HS_FLAG_ALLOWEMPTY])
b = hs.dumpb(db)
for i in range(100000):
    del db.scratch  # dealloc scratch any time you decref db
    db = hs.loadb(b)
```
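To check a workaround like this numerically rather than by eyeballing plots, a small stdlib-only harness can sample net allocation growth across iterations. This is a generic sketch, not tied to hyperscan; note that `tracemalloc` only sees allocations made through Python's allocator, so leaks inside the C library itself need an external RSS measurement instead:

```python
import tracemalloc

def measure_growth(step, iterations=1000):
    """Run step() repeatedly and return net bytes allocated (Python side).

    A result that keeps growing with `iterations` suggests a leak in step().
    """
    tracemalloc.start()
    step()  # warm-up, so one-time allocations are not counted
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        step()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

leak = []
print(measure_growth(lambda: leak.append(b'x' * 100)) > 0)  # leaky step keeps growing
print(measure_growth(lambda: bytes(100)) < 100 * 1000)      # non-leaky step stays flat
```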
You are a wizard @darvid :smile:
Thank you very much!
@darvid, we missed a segmentation fault with the code you suggested; it was masked by the memory profiler. Investigating further...
yeah, looks like the source of the issue is the conditional scratch creation (`create_scratch`) in `loadb`. The actual workaround is calling `hs.loadb(b, create_scratch=False)`. I think I'll just remove that parameter, because calling `compile` will implicitly create a scratch if it's set to `None` anyways.
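The reasoning above (the parameter is redundant because `compile` lazily creates a scratch when it is `None`) can be sketched with a toy class; none of these names reflect hyperscan's actual internals:

```python
class Scratch:
    """Stand-in for hyperscan's per-database scratch space."""

class Database:
    def __init__(self, create_scratch=True):
        # The create_scratch flag becomes redundant...
        self.scratch = Scratch() if create_scratch else None

    def compile(self, expressions):
        # ...because compile lazily creates a scratch anyway.
        if self.scratch is None:
            self.scratch = Scratch()
        self.patterns = expressions

db = Database(create_scratch=False)
assert db.scratch is None        # no scratch allocated at load time
db.compile([b'test'])
assert db.scratch is not None    # compile created it on demand
```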
let's keep this issue open until i push a release
It's leaking less memory with `hs.loadb(b, create_scratch=False)`, around half of it, but it's still leaking.
I will keep digging into it.
hey, this should be fixed in the commit above, but i'm running into some annoying CI issues that have been taking a while to fix and are preventing a release to pypi (i'd rather fix the release pipeline than force publish by hand)
until then you can build from master; there's also a Dockerfile that should be up-to-date and help with compiling hyperscan. hopefully i'll have the build issues fixed asap
This solves the issues, thanks a lot!
also, really sorry for the inconvenience if you cloned the repo locally: i did a bunch of history rewriting and commit amending to reduce noise in the commit log while debugging the build process, so you may need a fresh clone next time.
Thanks for the great library. We discovered a memory leak when a Database object is compiled, dumped, and loaded multiple times. This gets increasingly worse as the database grows larger. Even creating a new Database object every time still results in a leak.
Compiling one pattern in the same Database object:
Creating a new Database object for every compile slows the leak down.
But when the Database object is dumped (and loaded as well), memory usage grows faster.
I tried to dig into the C code to find the problem, but I do not have enough expertise to solve the issue. Can someone help or point me in the right direction? Thanks!