bminixhofer / nlprule

A fast, low-resource Natural Language Processing and Text Correction library written in Rust.
Apache License 2.0

Support for older glibc #64

Closed dvwright closed 3 years ago

dvwright commented 3 years ago

Hi, first off, thank you for this library; it's the only non-Java LanguageTool alternative I've found.

Unfortunately, I am receiving an error when trying to use it: ImportError: /lib/x86_64-linux-gnu/libm.so.6: version 'GLIBC_2.27' not found (required by python/lib/python3.8/site-packages/nlprule.cpython-38-x86_64-linux-gnu.so)

I'm on a hosting environment where I don't have access to upgrade system libraries, so I can't just upgrade glibc. The current version is glibc 2.19.

Is glibc 2.27 a hard requirement or is there a way to specify an older version of glibc?
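
For anyone hitting the same ImportError, the mismatch can be checked from Python before importing the extension. This is a minimal sketch, not part of nlprule: the helper names `required_glibc`, `installed_glibc`, and `glibc_satisfies` are hypothetical, and it assumes the loader message contains a `GLIBC_X.Y` symbol version like the one quoted above.

```python
import platform
import re


def required_glibc(error_message):
    """Extract the (major, minor) glibc version named in a loader error, or None."""
    match = re.search(r"GLIBC_(\d+)\.(\d+)", error_message)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_glibc():
    """Return the running interpreter's glibc version as (major, minor), or None."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None  # not a glibc system, or the version could not be detected
    major, minor = version.split(".")[:2]
    return (int(major), int(minor))


def glibc_satisfies(installed, required):
    """True if the installed (major, minor) version is at least the required one."""
    return installed >= required


if __name__ == "__main__":
    message = "version 'GLIBC_2.27' not found"
    print("required: ", required_glibc(message))
    print("installed:", installed_glibc())
```

With the versions from this report, `glibc_satisfies((2, 19), (2, 27))` is false, which is consistent with the import failing on this host.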

I have a feeling this is Rust specific issue but I am new to Rust and not familiar with it's environment.

Thanks

bminixhofer commented 3 years ago

Hi, thanks.

Everything from glibc 2.11 can work: https://github.com/PyO3/maturin#manylinux-and-auditwheel. It seems I'll have to build the wheels in a Docker container in GitHub Actions for full manylinux compliance. That shouldn't be too hard; I'll take a look at it.

bminixhofer commented 3 years ago

Hi, can you please try this wheel:

nlprule-0.6.1_pre-cp38-cp38-manylinux2014_x86_64.zip

(rename .zip to .whl, otherwise GitHub doesn't let me upload it)

This should work for glibc 2.17 and up. You can just try running the code in the README, if it works, I'll do a release.
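
As background on the "glibc 2.17 and up" figure: the manylinux platform tag in the wheel filename encodes the minimum glibc. A rough sketch of that mapping follows, based on the manylinux PEPs (513, 571, 599, 600). `min_glibc_for_wheel` is a hypothetical helper written for illustration, not a pip or maturin API.

```python
import re

# Minimum glibc implied by the legacy manylinux tags (PEP 513, 571, 599).
LEGACY_TAGS = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}


def min_glibc_for_wheel(filename):
    """Return the (major, minor) glibc floor implied by a wheel filename, or None."""
    # PEP 600 tags spell the version out, e.g. manylinux_2_27_x86_64.
    perennial = re.search(r"manylinux_(\d+)_(\d+)_", filename)
    if perennial:
        return (int(perennial.group(1)), int(perennial.group(2)))
    # Legacy tags are fixed aliases, e.g. manylinux2014_x86_64.
    legacy = re.search(r"(manylinux1|manylinux2010|manylinux2014)_", filename)
    if legacy:
        return LEGACY_TAGS[legacy.group(1)]
    return None  # not a manylinux wheel


if __name__ == "__main__":
    name = "nlprule-0.6.1_pre-cp38-cp38-manylinux2014_x86_64.whl"
    print(name, "->", min_glibc_for_wheel(name))
```

For the wheel attached here the tag is manylinux2014, so a host with glibc 2.19 is above the floor.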

dvwright commented 3 years ago

Hi @bminixhofer that wheel installed! Thanks

Unfortunately, I am receiving an exception on load: tokenizer = Tokenizer.load("en") raises OSError: failed to fill whole buffer.

This doesn't seem related to glibc, so if you like I can close this case as fixed and open a new one for the buffer issue?

bminixhofer commented 3 years ago

Great. That's actually not an issue. There are no binaries uploaded for the prerelease so .load can't work. But the error message wasn't propagated properly. I fixed that now, if you install this wheel:

nlprule-0.6.1_pre-cp38-cp38-manylinux2014_x86_64.zip

You should see something like ValueError: HTTP status client error (404 Not Found) for url (https://github.com/bminixhofer/nlprule/releases/download/0.6.1-pre/en_tokenizer.bin.gz) when running Tokenizer.load("en").

Instead, you'd have to manually download the binaries from https://github.com/bminixhofer/nlprule/releases/tag/0.6.0, unzip them, and load with tokenizer = Tokenizer("/path/to/en_tokenizer.bin); the same applies to the Rules. Loading the wheel works, so I'm pretty sure this should work in your setup too. But it would still be great if you could confirm it works.
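
The manual route could look roughly like the sketch below. It assumes the downloaded asset is gzip-compressed (the URL in the error above ends in .bin.gz) and that it has already been saved locally. `gunzip` and `load_local_tokenizer` are hypothetical helper names; only the Tokenizer(path) call comes from this thread.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the decompressed copy."""
    src = Path(src)
    # By default, drop the trailing ".gz": en_tokenizer.bin.gz -> en_tokenizer.bin
    dest = Path(dest) if dest else src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def load_local_tokenizer(tokenizer_gz):
    """Decompress a downloaded tokenizer binary and load it from the local path."""
    # Imported lazily so gunzip() can be used without nlprule installed.
    from nlprule import Tokenizer

    # Use the constructor with a path; Tokenizer.load() is for language codes.
    return Tokenizer(str(gunzip(tokenizer_gz)))
```

Usage would be along the lines of `tokenizer = load_local_tokenizer("/path/to/en_tokenizer.bin.gz")`, with the rules binary handled analogously.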

dvwright commented 3 years ago

Hi @bminixhofer I installed the new wheel and am getting the error you stated.

I am also getting the error if I specify a local path to the binaries I downloaded.

ValueError: HTTP status client error (404 Not Found) for url (https://github.com/bminixhofer/nlprule/releases/download/0.6.1-pre/en_rules.bin_tokenizer.bin.gz)
    tokenizer = Tokenizer.load("en_tokenizer.bin")

dvwright commented 3 years ago

It looks like if I specify a local path, it gets appended to the release download URL rather than being treated as a local resource (file://):

ValueError: HTTP status client error (404 Not Found) for url (https://github.com/bminixhofer/nlprule/releases/download/0.6.1-pre//tmp/en_tokenizer.bin_tokenizer.bin.gz)
    tokenizer = Tokenizer.load("/tmp/en_tokenizer.bin")

bminixhofer commented 3 years ago

Hi, you have to use Tokenizer(...), not Tokenizer.load(...), to load it from a local file; .load just takes a language code. Sorry, there should be a better error message here as well.
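
To illustrate why a path passed to .load produces those odd URLs, here is a small sketch that reproduces them. The URL pattern is inferred only from the error messages quoted in this thread, not from nlprule's source. `tokenizer_url` and `is_language_code` are hypothetical names, and the two-letter check is an assumption about what a language code looks like.

```python
RELEASES = "https://github.com/bminixhofer/nlprule/releases/download"


def tokenizer_url(version, code):
    """Build the download URL the way the error messages suggest it is built."""
    # The argument is substituted verbatim, so a path ends up inside the URL.
    return f"{RELEASES}/{version}/{code}_tokenizer.bin.gz"


def is_language_code(arg):
    """Heuristic (assumption): treat two ASCII letters such as "en" as a language code."""
    return len(arg) == 2 and arg.isascii() and arg.isalpha()


if __name__ == "__main__":
    for arg in ("en", "/tmp/en_tokenizer.bin"):
        kind = "language code" if is_language_code(arg) else "looks like a path"
        print(f"{arg!r} ({kind}) -> {tokenizer_url('0.6.1-pre', arg)}")
```

A guard like `is_language_code` is one way a caller could decide between Tokenizer.load(code) and Tokenizer(path) before hitting the 404.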

dvwright commented 3 years ago

Confirmed. This is working for me, thanks!