codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License
14.19k stars 2.12k forks

Import problems on Beanstalk #429

Open several27 opened 7 years ago

several27 commented 7 years ago

Hi all, has anyone tried using this great library on AWS Elastic Beanstalk with Python 3.4? I'm getting a strange error, even though images.py exists in the newspaper directory.

Traceback (most recent call last):
  File "/opt/python/current/app/application.py", line 12, in <module>
    from server.helper import get_connection, requires_auth, validate_schema, error_handler
  File "/opt/python/current/app/server/helper.py", line 13, in <module>
    from database.user import User
  File "/opt/python/current/app/database/user.py", line 11, in <module>
    from .post import Post
  File "/opt/python/current/app/database/post.py", line 9, in <module>
    from newspaper import Article
  File "/opt/python/run/venv/lib/python3.4/site-packages/newspaper/__init__.py", line 10, in <module>
    from .api import (build, build_article, fulltext, hot, languages,
  File "/opt/python/run/venv/lib/python3.4/site-packages/newspaper/api.py", line 14, in <module>
    from .article import Article
  File "/opt/python/run/venv/lib/python3.4/site-packages/newspaper/article.py", line 14, in <module>
    from . import images
ImportError: cannot import name 'images'

I have installed all the dependencies (Pillow, lxml, libjpeg, etc.) correctly and, surprisingly, the import works properly when I SSH into the instance.

Any help would be appreciated, thanks!

jebudas commented 7 years ago

I'm receiving the same error. Keep me posted if you find a solution and I'll do the same...

codelucas commented 7 years ago

Can you try opening a shell on your AWS Beanstalk instance, importing the images module from newspaper yourself, and seeing what happens?

codelucas commented 7 years ago

from newspaper import images

jebudas commented 7 years ago

Thanks for digging into this, Lucas...

$ python3
>>> from newspaper import images

This works from the python3 prompt, but the application still fails with the same error, even if I change the import in article.py to this form.
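
Since the import succeeds in an interactive shell but fails inside the application, it may help to compare the two environments directly. The following is only a minimal sketch: the function name describe_environment is hypothetical and not part of newspaper. It reports the current user, home directory, interpreter, and the file a package would be loaded from, without importing the package itself.

```python
import getpass
import importlib.util
import os
import sys


def describe_environment(package="newspaper"):
    """Collect facts that can differ between an SSH shell and the app process.

    Hypothetical helper for debugging; it is not part of newspaper.
    """
    try:
        user = getpass.getuser()
    except Exception:
        # The process may run under an account with no passwd entry.
        user = "unknown"

    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)

    return {
        "user": user,
        "home": os.path.expanduser("~"),
        "executable": sys.executable,
        "package_origin": spec.origin if spec is not None else None,
        "sys_path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

Running this once over SSH and once from inside the Beanstalk application (for example by logging the returned dictionary at startup) should show whether the user, home directory, or site-packages path differs between the two.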

jebudas commented 7 years ago

Got it working! I had to add these to certain .ebextensions files:

packages:
  yum:
    libjpeg-turbo-devel: []
    libxslt-devel: []
    libxml2-devel: []

04_setup_newspaper:
  command: mkdir -p /home/wsgi/.newspaper_scraper/memoized && chmod 755 /home/wsgi/.newspaper_scraper/memoized

codelucas commented 7 years ago

Very cool, nicely done getting it to work @jebudas! I'm not familiar with .ebextensions or the Beanstalk installation process. Weird that you needed to run chmod yourself too.

Hmm, do you think you could help add an installation section for AWS Elastic Beanstalk, so other Beanstalk users can avoid the same problems you ran into? 👍 🙇

jebudas commented 7 years ago

Well, I'm not out of the woods yet!! Just ran into some new permission errors... I'll come back around when we're at 100%. Peace...

jay1803 commented 7 years ago

I have the same issue. I think it may be because EB uses a different user to run the application. The default EB user is ec2-user, so when you start a shell on the instance, the code runs as ec2-user and has permission to create .newspaper_scraper and the folders inside it.

The shell also loads site-packages from a different path, which is

/opt/python/run/venv/local/lib/python3.4/site-packages/

When you actually run the application, site-packages is loaded from

/opt/python/run/venv/lib/python3.4/site-packages/

That matches what I saw in the logs. I think the user that actually runs the application is wsgi, not ec2-user, so I had to mkdir three subfolders in .newspaper_scraper, not only memoized.

Maybe .newspaper_scraper should be moved to the application root folder instead of the system user's home folder.
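
To test the permissions theory, something like the sketch below could be run as the same user that runs the application. It is an illustration only: ensure_writable_dirs is a hypothetical helper, not part of newspaper, and apart from memoized the subfolder names in the demo are placeholders, since the other two folders are not named in this thread.

```python
import os
import tempfile


def ensure_writable_dirs(base_dir, subfolders):
    """Create base_dir/<name> for each name and report whether it is writable.

    Hypothetical helper for debugging; it is not part of newspaper.
    Returns a dict mapping each created path to True/False.
    """
    results = {}
    for name in subfolders:
        path = os.path.join(base_dir, name)
        # exist_ok=True makes this safe to run on every deploy.
        os.makedirs(path, exist_ok=True)
        # A directory needs both write and execute bits to create files in it.
        results[path] = os.access(path, os.W_OK | os.X_OK)
    return results


if __name__ == "__main__":
    # Demo in a throwaway directory; on a real instance base_dir would be
    # wherever .newspaper_scraper is expected to live.
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, ".newspaper_scraper")
        # "placeholder_a" and "placeholder_b" are stand-ins, not real names.
        names = ["memoized", "placeholder_a", "placeholder_b"]
        for path, ok in ensure_writable_dirs(base, names).items():
            print(path, "writable" if ok else "NOT writable")
```

If any path is reported as not writable when the check runs as the application user, that would support the idea that the failure comes from directory ownership rather than from the import itself.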