-
First, let me say awesome work, awesome project! I've replaced newspaper3k with trafilatura and am not looking back.
Here's a possibly interesting use case where it fails:
https://www.protocol.com/pr…
-
@codelucas, any solutions to this?
>>> from newspaper import Article
>>> url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
>>> article = Article(url)
>>> a…
-
- [x] I am on the [latest](https://github.com/python-poetry/poetry/releases/latest) Poetry version.
- [x] I have searched the [issues](https://github.com/python-poetry/poetry/issues) of this re…
-
If `newspaper3k` cannot extract an article from a URL, there needs to be a way to build the article by hand from a node's innerHTML. Scenario:
1. The user opens the URL in a browser and waits until the content is fully downloaded
…
-
**Describe the bug**
Displaying web maps/feature layer views/feature layers using an API Key results in a token error. We created a bunch of web maps and feature layer views. Unluckily, we cannot dis…
-
This error is observed while importing extruct
```
import extruct
File "/usr/local/lib/python3.9/dist-packages/extruct/__init__.py", line 4, in <module>
from extruct.rdfa import RDFaExtractor
…
-
Pillow is raising a ValueError and Newspaper is printing it. It should instead be raised for the exception handler on images:153 so the user knows what's happening (i.e. which URL failed, etc.).
```…
-
Some links are excluded from the IA, e.g.,
```
2021-08-02 09:45:23 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response : HTTP status code is not handled or not allowed
2021-08-02 09:45:…
-
#### Target objective:
The `nlp_architect` command fails if the dev requirements aren't installed.
Detect whether the required libraries are installed, and print a message that states the error and how to install them.
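
A minimal sketch of how such a check could look, using only the standard library. The helper names `missing_packages` and `dev_requirements_message`, the package list, and the install hint are all hypothetical and not taken from the nlp_architect code base; the check only handles top-level package names.

```python
import importlib.util


def missing_packages(required):
    """Return the top-level package names in `required` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in required if importlib.util.find_spec(name) is None]


def dev_requirements_message(required, install_hint):
    """Build an error message for missing packages, or return None if all are present.

    `install_hint` is whatever install command the project wants to suggest;
    it is passed in by the caller rather than assumed here.
    """
    missing = missing_packages(required)
    if not missing:
        return None
    return (
        "Missing required packages: " + ", ".join(missing)
        + ". Install them with: " + install_hint
    )


if __name__ == "__main__":
    # Hypothetical requirement list and hint, for illustration only.
    message = dev_requirements_message(["json", "pytest"], "pip install -r dev-requirements.txt")
    if message:
        print(message)
```

The command's entry point could call this before dispatching and exit with the message instead of failing on an import error later.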
#### Steps to objectiv…
-
parse_html endpoint should test if the received HTML is actually an article, and send an error response if it is not.
Maybe something like "Viewed page is not an article."
parse_html endpoint shou…
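
One rough way the check could work, as a sketch only: count the `<p>` elements and the amount of text in them, and reject pages that fall below a threshold. The thresholds, the 422 status code, and the function names `looks_like_article` and `parse_html_response` are assumptions for illustration, not part of the existing endpoint.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def _flush_paragraph(self):
        text = "".join(self._buf).strip()
        if text:
            self.paragraphs.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # An unclosed <p> ends where the next one starts.
            self._flush_paragraph()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush_paragraph()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def looks_like_article(html, min_paragraphs=3, min_chars=400):
    """Heuristic: enough paragraphs with enough text (thresholds are guesses)."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser._flush_paragraph()  # pick up a trailing unclosed <p>
    total_chars = sum(len(p) for p in parser.paragraphs)
    return len(parser.paragraphs) >= min_paragraphs and total_chars >= min_chars


def parse_html_response(html):
    """Return a (status, payload) pair; 422 is an assumed status for non-articles."""
    if not looks_like_article(html):
        return 422, {"error": "Viewed page is not an article."}
    return 200, {"ok": True}
```

A heuristic like this will misclassify some pages (short articles, long comment threads), so the thresholds would need tuning against real pages.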