Closed — declerambaul closed this 1 year ago
Thank you!
@declerambaul thanks a lot from the ML Team @ Wikimedia. We were investigating a memory leak in our model servers using mwparserfromhell, and once we found the culprit it was really nice to see that a fix was already in place and ready to go. It would have taken us ages to provide a fix like yours!
Thanks also to @earwig for the quick review/release cycle!
Fix for the memory leak that causes issues on certain wikitext (e.g. lots of nested tags, often from vandalism), see https://github.com/earwig/mwparserfromhell/issues/286.
I didn't investigate the algorithm itself; while the leak was due to a missing dealloc, parsing these problematic wikitexts still takes a long time. I am attaching some details below; they might be helpful if somebody wants to look at optimizing the search tree algorithm itself.
Created a script that parses a single "bad" wikitext (the one from the linked issue).
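The script itself is not included in this PR, but a minimal sketch of a reproduction along these lines might look like the following. The nesting pattern and depth here are assumptions for illustration (the issue mentions wikitext with many nested tags); the actual input from issue #286 may differ, and the `mwparserfromhell.parse` call is left commented out so the sketch stands alone:

```python
# Hypothetical reproduction sketch: build a pathological wikitext with
# deeply nested markup and parse it repeatedly, so that any per-parse
# leak accumulates into visible memory growth.
DEPTH = 2000  # illustrative depth, not taken from the original report

def make_nested_wikitext(depth):
    """Build deeply nested italic markup: '' '' ... text ... '' ''."""
    return "''" * depth + "text" + "''" * depth

bad_text = make_nested_wikitext(DEPTH)
print(len(bad_text))

# With mwparserfromhell installed, each iteration below leaked memory
# before this fix (and is still slow afterwards):
# import mwparserfromhell
# for _ in range(10):
#     mwparserfromhell.parse(bad_text)
```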
The valgrind logs point to the problematic method:

```shell
PYTHONMALLOC=malloc valgrind --leak-check=yes --track-origins=yes --log-file=valgrind-log.txt python helling.py
```

(Setting `PYTHONMALLOC=malloc` makes CPython use the system allocator directly, so valgrind can track its allocations.)
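As a lighter-weight complement to valgrind (not the method used in this PR), the stdlib `tracemalloc` module can confirm from the Python side whether repeated calls keep accumulating memory. Note its limits: it only traces allocations made through Python's memory APIs, so a leak in raw C `malloc` inside an extension would not show up here, which is why valgrind was the right tool above. The `leaky` function below is a hypothetical stand-in for the parse call:

```python
import tracemalloc

_retained = []  # simulates allocations that are never released

def leaky(n):
    # Each call retains n more bytes, mimicking a per-call leak.
    _retained.append(bytearray(n))

def grows(fn, *args, rounds=5):
    """Return True if traced memory keeps growing across repeated calls."""
    tracemalloc.start()
    fn(*args)  # warm-up call, so one-time setup costs are excluded
    base, _ = tracemalloc.get_traced_memory()
    for _ in range(rounds):
        fn(*args)
    current, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # Require clearly more than noise: growth beyond base plus a margin.
    return current > base + rounds * 1000

print(grows(leaky, 100_000))  # → True
```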