codelucas / newspaper

newspaper3k is a library for news, full-text, and article metadata extraction in Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License

Only article date gets extracted? #777

Open edoardobassett opened 4 years ago

edoardobassett commented 4 years ago

Hello, I tried running the sample code on the example URL, but I keep getting empty arrays for text and authors.

This is the code:

from newspaper import Article

url = "http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/"

a = Article(url, language='it')
a.download()
a.parse()
print(a.text)

johnbumgarner commented 3 years ago

Newspaper3k makes a best effort to extract these data elements from known tags, but sometimes you need to extract them using a different method within Newspaper3k.

I recently started putting together a detailed Newspaper3k usage document that I'm publicly sharing. This document is available here: https://github.com/johnbumgarner/newspaper3_usage_overview. It contains the extraction code for Fox News sources.

P.S. This document is a work in progress, so more information will be added.
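As a rough illustration of the "different method" idea above, here is a minimal sketch that falls back to the page's meta tags when `article.authors` comes back empty. This is not the extraction code from the linked document. The helper names `authors_from_meta` and `extract_with_fallback` are hypothetical, the meta keys checked (`author`, `article` → `author`) are assumptions about what a given site exposes, and `language="en"` is assumed because the example URL is an English-language page.

```python
from typing import Any, Dict, List


def authors_from_meta(meta: Dict[str, Any]) -> List[str]:
    """Collect author names from a dict shaped like Article.meta_data.

    Hypothetical helper: the keys checked here are assumptions about
    what a page exposes, so adjust them per site.
    """
    candidates = [meta.get("author")]
    article_block = meta.get("article")
    if isinstance(article_block, dict):
        candidates.append(article_block.get("author"))

    authors: List[str] = []
    for value in candidates:
        if isinstance(value, str):
            # A single tag may list several names separated by commas.
            for name in value.split(","):
                name = name.strip()
                if name and name not in authors:
                    authors.append(name)
    return authors


def extract_with_fallback(url: str, language: str = "en") -> Dict[str, Any]:
    """Parse with newspaper3k; use meta tags if no authors were found."""
    from newspaper import Article  # lazy import; requires newspaper3k

    article = Article(url, language=language)
    article.download()
    article.parse()
    authors = article.authors or authors_from_meta(article.meta_data)
    return {
        "text": article.text,
        "authors": authors,
        "date": article.publish_date,
    }
```

Calling `extract_with_fallback(url)` with the URL from the question would return a dict with `text`, `authors`, and `date`. Whether the fallback finds anything depends on which meta tags that site actually publishes.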