Closed: Paikan closed this issue 4 years ago
Hi. Honestly, I could not figure out what was going on in the underlying regex library. I implemented a quick fix, hoping it won't break anything else. (I think it would be better to rewrite that part of the code and use some other method, but I don't have enough free time to do it right now.)
A new version with the mentioned fix is now available on PyPI. Thanks for the bug report. Let me know if you find any other ones.
Thanks a lot for your quick response and hard work.
I can confirm that the problem is fixed with the new release and hope it won't break anything else.
I am about to test this version with full Wikipedia dumps for English and French. I will let you know if anything goes wrong.
I'm no linguistics expert, but doesn't regex stop working at some point when parsing any markup language? Don't you need to use a context-free grammar (CFG)?
@5j9 Some quick research shows that Wikitext is neither regular nor context-free. The Wikitext spec says it belongs to the context-sensitive grammars (CSG). I'd say (though don't bank on my opinion) that using regex has its share of trouble, and that it is impossible to fully describe the language with it.
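To illustrate the nesting part of that point, here is a minimal sketch. It is not how this library works internally; `FLAT_TEMPLATE` and `outermost_templates` are hypothetical names made up for the example, and real Wikitext has many more constructs than `{{...}}`. It only shows that a plain (non-recursive) regex pattern loses track of nesting depth, while an explicit depth counter does not.

```python
import re

# Hypothetical pattern: matches a "{{...}}" span that contains no braces inside.
FLAT_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")


def outermost_templates(text):
    """Return the top-level {{...}} spans, tracking depth with a counter."""
    spans = []
    depth = 0
    start = None
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i  # remember where the outermost template opens
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append(text[start:i + 2])
            i += 2
        else:
            i += 1
    return spans


sample = "a {{outer|{{inner}}}} b"
flat_matches = FLAT_TEMPLATE.findall(sample)   # only sees the innermost template
nested_matches = outermost_templates(sample)   # recovers the full outer template
print(flat_matches)
print(nested_matches)
```

The regex can still be useful for finding innermost pieces and working outwards, but the depth bookkeeping has to live somewhere outside the pattern.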
Hi @5j9, and first, thank you for this great project.
I have an issue with table parsing that hangs forever. It is easy to reproduce the issue with the following gist: https://gist.github.com/Paikan/5b8e3e7939553d7848f9c6dc34daebb2
Thanks