newspaper-crawler Search Results

79 results for newspaper-crawler

scripting/feedBase #1

Friday work

This project is currently in two incompatible pieces. A database that contains subscriptions, and a UI that allows people to post OPML files to a server. I started the latter project in 2016, b…

scripting updated 6 years ago
29
mitchellkrogza/nginx-ultimate-bad-bot-blocker #335

Content scrapers not getting blocked.

Hi, I am using the Sucuri firewall and I was having an issue with a content scraper. I looked at my log and blocked some IPs which were making a lot of requests, and it did stop the scraper for a day and next da…

RealSuprim updated 4 years ago
27
mendableai/firecrawl #78

Remove 'cookies' text when removing headers/footers, etc

Remove any cookies text when removing headers and footers. Many sites in Europe will display a cookie acceptance message. Sometimes, this is the only text returned. Sometimes it captures something…

tractorjuice updated 4 months ago
5
palladius/gemini-news-crawler #2

Net::HTTPBadRequest error in demo04

Last night I ran demo04 and for some strange reason it STOPPED working. It was working 2 nights ago. I wonder if one of the many changes I made in the past 48h changed it. * this morning it was…

palladius updated 2 months ago
7
flairNLP/fundus #465

[Bug]: url_filter in PublisherSpec not filtering

### Describe the bug While working on #464 I had trouble filtering some regex in the url_filter of PublisherSpec. All unit tests are working fine but after testing the crawler myself I recognized …

Benjamin2107 updated 6 months ago
2
adbar/trafilatura #551

Scraping directly from wayback machine (newbie question)

Hi! Quite a newbie in the field, so maybe my questions are trivial. Trafilatura seems top-notch for my application but maybe I have some misunderstanding. I would like to extract all news from a …

scaramouche88 updated 7 months ago
6
Tribler/tribler #4719

phd placeholder: Accountability Protocol for Future Decentra…

**Working title**: Project *Noodles* A draft of the layers for the accountability tools for next-generation applications. This is a semi-layered architecture draft for better understanding of the …

grimadas updated 2 months ago
51
fhamborg/news-please #187

Adding Postgresql pipeline in config.cfg gives error "psycop…

**Mandatory** * [x] I read the documentation ([readme](https://github.com/fhamborg/news-please/blob/master/README.md) and [wiki](https://github.com/fhamborg/news-please/wiki)). * [x] I searched othe…

ghost updated 5 months ago
2
jivoi/awesome-osint #79

Add descriptions to the following

To get added to the official awesome list, descriptions must be added to the following: (_Please note that this is not a copy paste, only the ones without descriptions (about 95% of them)_) **Pl…

fosslinux updated 1 week ago
3
zeeguu/api #110

Crawler got stuck when running article_crawler.py

I am running the `article_crawler.py` to test if it works with the new sources using newspaper. When running the process for the Danish sources, it got stuck in loading this article: https://www.dr…

tfnribeiro updated 8 months ago
5
