AndyTheFactory / newspaper4k

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
MIT License
504 stars 51 forks

not working for gnews.org #576

Closed AndyTheFactory closed 1 year ago

AndyTheFactory commented 1 year ago

Issue by Jooey233, Mon Apr 3 16:20:52 2023. Originally opened as https://github.com/codelucas/newspaper/issues/968


I used article.text on https://gnews.org/articles/1068907 and got no text. Building a source for gnews.org with newspaper.build does not work either.

import newspaper

gnews = newspaper.build('https://gnews.org/', language='zh')

article = gnews.articles[0]
article.download()
article.parse()
print(article.text)

This raises an 'index out of range' error.
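The 'index out of range' error is consistent with gnews.articles being an empty list, in which case articles[0] raises IndexError. A minimal sketch of a guard, assuming that is the cause here; first_article_or_none is a hypothetical helper for illustration, not part of the newspaper API:

```python
def first_article_or_none(source):
    """Return the first article of a built source, or None if none were found.

    `source` is any object with an `articles` list, such as the object
    returned by newspaper.build(). Hypothetical helper, not library API.
    """
    # Treat a missing or empty `articles` attribute as "nothing discovered".
    articles = getattr(source, "articles", None) or []
    return articles[0] if articles else None
```

With this, a None result signals that no articles were discovered, instead of the script failing on the index lookup.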

And this:

import newspaper

page = newspaper.Article('https://gnews.org/articles/1065912')
page.download()
page.parse()
print(page.title)
print(page.text)

Only part of the title is extracted, and no text is extracted at all.

AndyTheFactory commented 1 year ago

The site has Cloudflare protection. It returns status_code 200, so the response looks "ok" to the downloader...
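Since the status code alone does not reveal the block, one option is to inspect the response itself. The sketch below is a heuristic under stated assumptions: the marker strings and the cf-mitigated header check reflect what Cloudflare challenge pages commonly contain, not anything confirmed for gnews.org, and looks_like_cloudflare_challenge is a hypothetical helper, not part of newspaper4k.

```python
# Marker strings often seen in Cloudflare challenge pages.
# Assumed for illustration; the list is not exhaustive and may change.
CHALLENGE_MARKERS = (
    "just a moment...",
    "cf-chl",
    "challenge-platform",
    "checking your browser",
)


def looks_like_cloudflare_challenge(html, headers=None):
    """Guess whether a response is a Cloudflare challenge page, not an article."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {k.lower(): v for k, v in (headers or {}).items()}
    if normalized.get("cf-mitigated", "").lower() == "challenge":
        return True
    # Fall back to scanning the body for known challenge markers.
    body = (html or "").lower()
    return any(marker in body for marker in CHALLENGE_MARKERS)
```

For example, after page.download() the value of page.html could be passed to this check before calling page.parse(), so that a challenge page is reported as blocked rather than parsed as an empty article.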