-
This project is currently in two incompatible pieces:
a database that contains subscriptions, and a UI that allows people to post OPML files to a server.
I started the latter project in 2016, b…
-
Hi,
I am using the Sucuri firewall and was having an issue with a content scraper. I looked at my logs and blocked some IPs that were making a lot of requests, and it did stop the scraper for a day and next da…
-
Remove any cookies text when removing headers and footers.
Many sites in Europe will display a cookie acceptance message.
Sometimes, this is the only text returned.
Sometimes it captures something…
-
Last night I ran demo04 and for some strange reason it STOPPED working. It was working 2 nights ago.
I wonder if one of the many changes I made in the past 48h broke it.
* this morning it was…
-
### Describe the bug
While working on #464 I had trouble filtering some regex in the url_filter of PublisherSpec.
All unit tests are working fine but after testing the crawler myself I recognized …
-
Hi!
Quite a newbie in the field, so maybe my questions are trivial.
Trafilatura seems top-notch for my application but maybe I have some misunderstanding.
I would like to extract all news from a …
-
**Working title**: Project *Noodles*
A draft of the layers for accountability tools for next-generation applications.
This is a semi-layered architecture draft for better understanding of the …
-
**Mandatory**
* [x] I read the documentation ([readme](https://github.com/fhamborg/news-please/blob/master/README.md) and [wiki](https://github.com/fhamborg/news-please/wiki)).
* [x] I searched othe…
-
To get added to the official awesome list, descriptions must be added to the following:
(_Please note that this is not a copy paste, only the ones without descriptions (about 95% of them)_)
**Pl…
-
I am running the `article_crawler.py` to test if it works with the new sources using newspaper.
When running the process for the Danish sources, it got stuck loading this article: https://www.dr…