chrisspen / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
http://bark.phon.ioc.ee/punctuator
MIT License

No module named 'punctuator.punc'; 'punctuator' is not a package #3

Open wcooper90 opened 4 years ago

wcooper90 commented 4 years ago

I'm currently trying to create a webapp, Punctuator being an important package for it. I'm using AWS, which is "a distribution that evolved from Red Hat Enterprise Linux (RHEL) and CentOS," but I'm not sure about specifics. I'm on Python 3.7.9, and these are the errors that come out -

  File "/var/app/venv/staging-LQM1lest/bin/punctuator.py", line 5, in <module>
    from punctuator.punc import command_line_runner
ModuleNotFoundError: No module named 'punctuator.punc'; 'punctuator' is not a package

I installed punctuator 0.9.6 into the virtual environment venv via a requirements.txt file from GitHub, with the following command:

sudo pip3 install -r https://raw.githubusercontent.com/wcooper90/summarization/master/backend/requirements.txt

I also have Punctuator installed on Amazon Linux 2 with just pip3 install punctuator.

I'm wondering if there are some dependency issues, or if it may have to do with the OS?

Thanks for any help.

chrisspen commented 4 years ago

How are you calling it?

wcooper90 commented 4 years ago

Hi Chris,

We've tried: text = pytesseract.image_to_string(img).encode('latin-1', 'ignore')

As well as executing from the command line and then reading it from a file:

os.system("tesseract -l eng /var/app/current/inputs/" + str(i) + ".png text")

Thanks for getting back so quickly.

chrisspen commented 4 years ago

I meant how are you calling punctuator. That code only appears to call tesseract.

wcooper90 commented 4 years ago

Sorry!

Here is the function we are using punctuator in:

def punctuate_transcript(text):
    # try different sample models in punctuator -- period accuracy is most important (especially for summary)!
    p = Punctuator('Demo-Europarl-EN.pcl')
    return p.punctuate(text)

We import Punctuator at the top of the file with:

from punctuator import Punctuator

and I've made sure to download the model, Demo-Europarl-EN.pcl, to the right place, both locally and on AWS.

chrisspen commented 4 years ago

I meant a complete script to reproduce the issue. Try this:

cd /tmp
mkdir test
cd test
virtualenv -p python3.7 env
. env/bin/activate
pip install punctuator
python
>>> from punctuator import Punctuator

Does that throw an import error?

wcooper90 commented 4 years ago

With or without the virtualenv, it does not throw an import error. Do you think we can use the os package to run Punctuator in Python from the command line within our application?

chrisspen commented 4 years ago

I'm not sure I understand your question. If you mean calling punctuator via os.system(), I suppose that could work, but that's a complicated workaround to what should be a simple problem to fix.

If your application is running inside the virtualenv where punctuator is installed, it'll just work and you shouldn't need to call punctuator via os.system. It looks like it's throwing an import error because you simply haven't installed punctuator.

If you're somehow calling punctuator from Python running os.system("tesseract..."), then you need to make sure that Python instance is inside the virtualenv where punctuator is installed. Then the process called from os.system should inherit the path.
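
A quick way to check the virtualenv condition described above is to compare the interpreter's prefixes. This is only a minimal sketch: the function name in_virtualenv is made up for illustration, and the real_prefix check assumes the convention used by older releases of the virtualenv tool.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # venv (and recent virtualenv) make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Assumption: older virtualenv releases set sys.real_prefix instead.
    return base_prefix != sys.prefix or hasattr(sys, "real_prefix")


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Running this from inside the application (rather than from a separate shell) shows which interpreter the application is really using.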

Bacus96 commented 3 years ago

I've been having the same issue when importing it in Python with from punctuator import Punctuator. I installed it as you suggested above and then ran that command in Python, but it results in the error mentioned earlier in this thread. Help would be appreciated, since it would be great to try this out.

0xhamachi commented 3 years ago

I'm having the same problem

chrisspen commented 3 years ago

If someone could provide a script that reproduces the problem, then I could probably fix it. However, I can find no problems on my end. I even have a Travis build that installs the package and runs some unittests.

Closing this as not-reproducible, but feel free to re-open if you can document steps to reproduce.

evios commented 3 years ago

Hi! Happy NY and Merry Christmas :) To replicate, run anything other than python itself in CMD (uvicorn, celery, etc.). If you run python from CMD, everything is fine.

Dockerfile:

FROM python:slim
RUN pip3 install punctuator fastapi uvicorn
COPY main.py ./app/main.py
CMD uvicorn --host 0.0.0.0 app.main:app

app/main.py:

from punctuator import Punctuator
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def versions():
    return "something"

Run:

docker build . -t punctuator
docker run -ti punctuator # error will occur on load

Error occurred:

  File "./app/main.py", line 1, in <module>
    from punctuator import Punctuator
  File "/usr/local/bin/punctuator.py", line 5, in <module>
    from punctuator.punc import command_line_runner
ModuleNotFoundError: No module named 'punctuator.punc'; 'punctuator' is not a package

chrisspen commented 3 years ago

@evios Thanks. I can reproduce this. I can also reproduce this if I use a normal venv in Ubuntu. However, it seems to be a bug in uvicorn, not this package. That's why I couldn't reproduce this earlier, as I was only testing with a normal Python shell.

If I add import sys; print(sys.path) to my __init__.py and then run your uvicorn code, I see:

['.', '/home/chris/git/punctuator2/test/.env37/bin', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/home/chris/git/punctuator2/test/.env37/lib/python3.7/site-packages']

However, if I run a normal Python shell and then do the same import, I see:

['', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/home/chris/git/punctuator2/test/.env37/lib/python3.7/site-packages']

So for some odd reason, it looks like uvicorn is adding the standard bin directory as a place to look for packages, and this is breaking because I have a bin script with the same name as the package. So it tries to import the bin script, which obviously isn't a package, causing the ModuleNotFoundError.
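
The shadowing described above can be reproduced without punctuator or uvicorn. The sketch below is an illustration under stated assumptions, not the package's actual layout: demo_shadowed_pkg, its punc submodule and the temporary bin and site-packages directories are all hypothetical stand-ins. It puts a script and a package with the same name on sys.path, with the script's directory first.

```python
import importlib
import os
import sys
import tempfile

NAME = "demo_shadowed_pkg"  # hypothetical name standing in for "punctuator"


def import_with_shadowing_script():
    """Import NAME while a same-named script sits earlier on sys.path.

    Returns the ModuleNotFoundError message, or None if the import worked.
    """
    with tempfile.TemporaryDirectory() as root:
        bin_dir = os.path.join(root, "bin")
        site_dir = os.path.join(root, "site-packages")
        pkg_dir = os.path.join(site_dir, NAME)
        os.makedirs(bin_dir)
        os.makedirs(pkg_dir)

        # The real package: an __init__.py plus a "punc" submodule.
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
            handle.write("")
        with open(os.path.join(pkg_dir, "punc.py"), "w") as handle:
            handle.write("VALUE = 1\n")

        # A stand-in for the console script that shares the package's name.
        with open(os.path.join(bin_dir, NAME + ".py"), "w") as handle:
            handle.write("from {}.punc import VALUE\n".format(NAME))

        saved_path = list(sys.path)
        # bin_dir comes first, so the script wins over the package.
        sys.path[:0] = [bin_dir, site_dir]
        importlib.invalidate_caches()
        try:
            importlib.import_module(NAME)
        except ModuleNotFoundError as exc:
            return str(exc)
        finally:
            sys.path[:] = saved_path
            sys.modules.pop(NAME, None)
        return None


if __name__ == "__main__":
    print(import_with_shadowing_script())
    # No module named 'demo_shadowed_pkg.punc'; 'demo_shadowed_pkg' is not a package
```

The message has the same shape as the one in this issue: the name resolves to a plain .py file, which has no submodules, so importing the punc submodule fails.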

I don't think this behavior in uvicorn is correct. It should not be looking for packages in the virtualenv's bin directory. Therefore, I don't think there's anything I can do on my end, short of changing my names to conform to uvicorn's non-standard behavior, which isn't good practice.

Correct me if I'm wrong.

chrisspen commented 3 years ago

Also, as a workaround, if you remove the bin directory from sys.path before you import punctuator, that should fix it.
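
A minimal sketch of that workaround that avoids hardcoding the path, assuming the bin script lives in the same directory as the Python interpreter (which is typical for a virtualenv, but still an assumption). The helper name drop_bin_dir is made up for illustration.

```python
import os
import sys


def drop_bin_dir(path_entries, executable):
    """Return path_entries without the directory that contains executable."""
    # abspath (not realpath) keeps a virtualenv's bin directory instead of
    # following the interpreter symlink back to the system Python.
    bin_dir = os.path.abspath(os.path.dirname(executable))
    return [
        entry for entry in path_entries
        # Keep the empty entry (current directory) untouched.
        if not entry or os.path.abspath(entry) != bin_dir
    ]


if __name__ == "__main__":
    sys.path[:] = drop_bin_dir(sys.path, sys.executable)
    # After this, the import should find the installed package:
    # from punctuator import Punctuator
```

This has to run before the first import of punctuator anywhere in the process, for example at the very top of the application's entry module.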

zxl777 commented 3 years ago

Thanks @chrisspen, it worked.

import sys
sys.path.remove('/root/miniconda3/envs/xxx/bin')
import punctuator

evios commented 3 years ago

Hi :) I can also confirm that removing the bin dir fixed it. I noticed this strange behaviour not only with uvicorn, but with celery as well:

uvicorn --host 0.0.0.0 app.main:app
celery worker -A app.worker

As you can see, although these are Python programs, you can't run them directly as binaries. Hence, one more workaround (if you run it in Docker) is to start uvicorn or celery with:

CMD python -m uvicorn --host 0.0.0.0 app.main:app

instead of:

CMD uvicorn --host 0.0.0.0 app.main:app

In that scenario everything works. Thank you @chrisspen for packaging it in pip! Have a great day!
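
The difference between the two launch styles can be seen in the first entry of sys.path. The sketch below is an illustration only: show_path0 and the two temporary directories are hypothetical stand-ins for a bin script and an application directory. It runs the same file once as a script and once with python -m, and reports sys.path[0] for each.

```python
import os
import subprocess
import sys
import tempfile

SOURCE = "import sys\nprint(sys.path[0])\n"


def compare_path0():
    """Run a tiny file as a script and via -m; report sys.path[0] for both."""
    with tempfile.TemporaryDirectory() as bin_dir, \
            tempfile.TemporaryDirectory() as work_dir:
        script = os.path.join(bin_dir, "show_path0.py")  # hypothetical script
        with open(script, "w") as handle:
            handle.write(SOURCE)

        def run(args, extra_env=None):
            env = dict(os.environ)
            # Safe-path mode (Python 3.11+) would suppress the behaviour shown.
            env.pop("PYTHONSAFEPATH", None)
            env.update(extra_env or {})
            result = subprocess.run(
                [sys.executable] + args, cwd=work_dir, env=env,
                stdout=subprocess.PIPE, universal_newlines=True, check=True,
            )
            return os.path.realpath(result.stdout.strip())

        return {
            "bin_dir": os.path.realpath(bin_dir),
            "work_dir": os.path.realpath(work_dir),
            # Script style: the script's own directory goes first.
            "script_path0": run([script]),
            # Module style: the current working directory goes first.
            "module_path0": run(["-m", "show_path0"], {"PYTHONPATH": bin_dir}),
        }


if __name__ == "__main__":
    for key, value in compare_path0().items():
        print(key, "=", value)
```

When a file is run as a script, Python puts that file's directory at the front of sys.path; with -m it puts the current working directory there instead, which would explain why the bin directory stops shadowing the package.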

ghost commented 3 years ago

Hi, I'm very interested in using Punctuator but my configuration skills are not up to fixing the import work-around mentioned in the previous posts.

I have these system paths:

['/mnt/c/PythonProgrammes/venv', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages']

(running Python 3.7 on Ubuntu 18.04 LTS)

I have tried removing '/mnt/c/PythonProgrammes/venv' with:

sys.path.remove('/mnt/c/PythonProgrammes/venv')

But my installed_packages_list does not include punctuator.

Any help appreciated.
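
One way to narrow this down is to ask Python where it would load a name from, without importing it. This is a rough diagnostic sketch; the helper name describe_module and the wording of its return strings are made up for illustration.

```python
import importlib.util


def describe_module(name):
    """Say whether a top-level name resolves to a package, a plain module, or nothing."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not installed"
    if spec.submodule_search_locations is None:
        # A single file: for punctuator this would point at a shadowing script.
        return "plain module at {}".format(spec.origin)
    return "package at {}".format(list(spec.submodule_search_locations))


if __name__ == "__main__":
    print(describe_module("punctuator"))
```

If this reports "not installed", the package is missing from the interpreter that is running the code, and editing sys.path will not help. If it reports a plain module inside a bin directory, the shadowing discussed earlier in this thread applies.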