jsedoc closed this issue 3 years ago.
pinging @Ssloto
You're totally right about this one, @jsedoc. I'm pretty sure that this needs to get axed. I'd be down to do #3 or #1. When would you need this by? I can probably get to it around the weekend.
I did a simple version of #3 that includes a fair amount of major cleanup. There's a lot of stuff that really was... messed up in the version of the Python wrapper that was up before.
I got some random files off OPUS and ran them through the old & new versions w/ Python 3.8. As far as I can tell, the new version's output is sensible, and it no longer hangs indefinitely on the vocab ratios.
- file sizes: repr = 57,441; avail = 73,505; seed = 79
- old version: 1m24.733s
- new version: 0m2.120s
that's a hecking speed improvement.
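For scale, the two timings work out to roughly a 40x speedup. Here's a tiny sketch of that arithmetic, assuming the numbers are wall-clock figures in the `XmY.YYYs` format printed by bash's `time` (the comment doesn't say which clock was used); `to_seconds` is just a hypothetical helper for this example:

```python
import re


def to_seconds(duration: str) -> float:
    """Convert a bash `time`-style duration such as '1m24.733s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", duration.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


old = to_seconds("1m24.733s")  # old wrapper
new = to_seconds("0m2.120s")   # new wrapper
print(f"old={old:.3f}s new={new:.3f}s speedup={old / new:.1f}x")
```

84.733 s / 2.120 s comes out just under 40, which rounds to 40.0x.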
There are probably some other things I can do to de-horrify the wrapper, but I'd prefer to just port the Perl to Python in an efficient manner. Should be doable.
Let me know if this resolves the issue for you, or if you find any other bugs!
The calculation of the vocab ratios (https://github.com/amittai/cynical/blob/master/python_cynical_wrapper.py#L206) is very slow in the Python wrapper script, and it does not appear to be necessary.
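The linked code isn't reproduced here, so what the wrapper means by a "vocab ratio" is an assumption on my part. As a minimal sketch, assuming it is each word's relative frequency in the representative corpus (repr) divided by its relative frequency in the available corpus (avail), with whitespace tokenization, all the ratios can be computed in one counting pass per corpus plus a single walk over the avail vocabulary. `count_tokens` and `vocab_ratios` are hypothetical names, not functions from the cynical repo:

```python
from collections import Counter
from typing import Dict, Iterable


def count_tokens(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated tokens across all lines in a single pass."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def vocab_ratios(repr_lines: Iterable[str], avail_lines: Iterable[str]) -> Dict[str, float]:
    """Assumed definition: P_repr(word) / P_avail(word) for every word seen in avail.

    Words that never occur in avail are skipped, since their ratio is undefined.
    """
    repr_counts = count_tokens(repr_lines)
    avail_counts = count_tokens(avail_lines)
    repr_total = sum(repr_counts.values())
    avail_total = sum(avail_counts.values())

    ratios: Dict[str, float] = {}
    for word, avail_count in avail_counts.items():
        # Dictionary lookups keep this O(vocabulary size) after counting.
        repr_prob = repr_counts.get(word, 0) / repr_total if repr_total else 0.0
        avail_prob = avail_count / avail_total
        ratios[word] = repr_prob / avail_prob
    return ratios


print(vocab_ratios(["the cat sat", "the dog"], ["the cat", "a bird"]))
```

Whatever the wrapper's exact formula turns out to be, the general idea is the same: count each corpus once up front instead of rescanning it per word.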
Potential fixes: