codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

https://goo.gl/VX41yK

MIT License

14.06k stars, 2.11k forks

Running on Fedora #225

Closed: simonedu closed this issue 8 years ago

simonedu commented 8 years ago

We have a Python 3 program that uses your package. It runs well on Ubuntu, but when we run it on Fedora it returns nothing. I followed the installation guide to the letter and the toolkit installed completely.

What do you suggest we do to solve this problem?

Thank you!

yprez commented 8 years ago

Can you provide more details?

Like:

What Python version are you running?
How did you install newspaper? (please provide the output)
What is the code you're running and what's the output?
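One way to gather the first two answers in a single paste is a small report script. This is only a sketch: the function name `environment_report` and the default package list are assumptions chosen for illustration (they are not part of newspaper), and `importlib.metadata` needs Python 3.8 or newer, which is later than the interpreter discussed in this thread.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("newspaper3k", "lxml", "Pillow", "nltk")):
    """Return text lines describing the interpreter, OS, and package versions.

    The default package list is an illustrative guess, not an official
    dependency list.
    """
    lines = [
        "python: " + sys.version.split()[0],
        "platform: " + platform.platform(),
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed for this interpreter.
            version = "not installed"
        lines.append("{}: {}".format(name, version))
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```

Running it with the same interpreter that runs the failing program matters, since `pip3` and `sudo pip3` can install into different locations.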

simonedu commented 8 years ago

What Python version are you running?

Python 3.4.3

How did you install newspaper? (please provide the output)

sudo yum install libxml2-dev libxslt-dev
sudo yum install libjpeg-dev zlib1g-dev libpng12-dev
sudo yum install libjpeg-dev
sudo yum install libjpeg
sudo yum install libjpeg
sudo yum install zlib1g
sudo yum install libpng12
sudo yum install libxml2
sudo yum install libxslt
sudo yum install zlib1g
sudo yum install zlib*
pip3 install newspaper3k
sudo pip3 install newspaper3k
curl https://raw.githubusercontent.com/codelucas/newspaper/master/download_corpora.py | python3

What is the code you're running and what's the output?

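from newspaper import Article

# Extracts the news text from the HTML of a given news URL.
def text_extractor(url):
    try:
        article = Article(url)
        article.download()
        article.parse()
        text = article.text
    except Exception:
        # Any failure is swallowed here and reported as empty text.
        text = ''
    return text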

simonedu commented 8 years ago

Output: No text is extracted.

yprez commented 8 years ago

Can you remove the try/except to see if any exceptions are raised?
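A middle ground between swallowing every error and removing the handler entirely is to log the traceback and re-raise. The sketch below is an assumption-laden variant of the function above, not the poster's code: the `article_cls` parameter is a hypothetical hook added so the function can be exercised without network access, and the warning on empty text is an extra check that relies only on the `html` and `text` attributes of `Article`.

```python
import logging

logger = logging.getLogger(__name__)


def text_extractor(url, article_cls=None):
    """Extract article text, logging failures instead of hiding them."""
    if article_cls is None:
        # Imported lazily so the function can be tested with a stand-in class.
        from newspaper import Article as article_cls

    article = article_cls(url)
    try:
        article.download()
        article.parse()
    except Exception:
        # Record the full traceback, then let the caller see the error.
        logger.exception("extraction failed for %s", url)
        raise

    if not article.text:
        # No exception, but nothing extracted: report how much HTML arrived.
        logger.warning(
            "no text extracted for %s (html length: %d)",
            url,
            len(article.html or ""),
        )
    return article.text
```

An HTML length of zero would point at the download step, while non-empty HTML with empty text would point at parsing.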

simonedu commented 8 years ago

No exceptions. Same result: no text

yprez commented 8 years ago

That's weird... any specific urls it's failing on?

simonedu commented 8 years ago

The same url has no problem on the other operating system (Ubuntu), hence it cannot be the url.

(Sent by email in reply to Yuri Prezument's message of Thursday, March 10, 2016, 4:19 PM.)

yprez commented 8 years ago

Works for me on Fedora:

$ cat /etc/fedora-release 
Fedora release 23 (Twenty Three)

$ python
Python 3.4.3 (default, Jun 29 2015, 12:15:26) 
[GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux
>>> import newspaper
>>> article = newspaper.Article(url='https://www.opera.com/blogs/desktop/2016/03/native-ad-blocking-feature-opera-for-computers/')
>>> article.download()
>>> article.parse()
>>> article.text
'If there were no bloated ads, some top websites would load up to 90% faster.\n\nToday, w...'

This can be anything... installation issue, parsing issue on a specific url, maybe you used an earlier version of newspaper or a dependency when you installed on Ubuntu, etc...
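To check the "earlier version of newspaper or a dependency" possibility, the output of `pip3 freeze` from the two machines can be compared. The helper names `parse_freeze` and `diff_freeze` below are made up for this sketch, and it only understands plain `name==version` lines; anything else in the freeze output is skipped.

```python
import sys


def parse_freeze(text):
    """Map lowercase package names to versions from `pip freeze` output."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not pinned with '=='.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[name.lower()] = version
    return versions


def diff_freeze(first, second):
    """Return {name: (version_in_first, version_in_second)} for mismatches."""
    left, right = parse_freeze(first), parse_freeze(second)
    return {
        name: (left.get(name), right.get(name))
        for name in sorted(set(left) | set(right))
        if left.get(name) != right.get(name)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 this_script.py ubuntu_freeze.txt fedora_freeze.txt
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        for name, (va, vb) in diff_freeze(a.read(), b.read()).items():
            print("{}: {} vs {}".format(name, va, vb))
```

A `None` on either side means the package is missing from that machine, which is as worth checking as a version mismatch.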

simonedu commented 8 years ago

Thank you very much.

yprez commented 8 years ago

@simonedu did you get it to work?

simonedu commented 8 years ago

Yes, your program worked perfectly. Thank you.

yprez commented 8 years ago

Great!