-
This ticket is a placeholder for general API rate and access limiting logic to better control the load placed on the service and provide options in case of system instability.
Rate limiting was men…
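As a starting point, a token bucket is one common way to implement this kind of limiting; here is a minimal sketch (the class name and parameters are illustrative, not part of the service):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allows `rate` requests per second
    on average, with bursts up to `capacity`. Illustrative sketch."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A limiter like this could be keyed per client or per endpoint, and requests that fail `allow()` would get a 429 response.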
-
Hi all,
I was wondering if there were specific restrictions on web crawling certain sites?
For example if one tried to web crawl Medscape:
```python
from trafilatura.spider import focused_crawler
```
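Independent of any library, one baseline restriction to check is the site's robots.txt; a small sketch using the standard library (the rules are inlined here for illustration — in practice you would point `set_url()` at the site's real robots.txt and call `read()`):

```python
from urllib.robotparser import RobotFileParser

# Parse robots.txt rules; here they are inlined as a stand-in for
# fetching e.g. https://www.medscape.com/robots.txt over the network.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "https://example.com/private/page"))  # disallowed
print(rp.can_fetch("*", "https://example.com/article"))       # allowed
```

Sites may also impose restrictions beyond robots.txt (terms of service, IP-based blocking), which a parser like this cannot detect.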
-
## Bug report
**Problem**
I am sending LED data over WiFi from Unity. It will work great for 2-3 minutes, then hang up. Through some debugging, I found that every time effectmanager.h NextE…
-
Sitemaps are currently not supported. Implementing sitemap support might help the crawler with URL discovery on some sites.
There are some risks though. Some sitemaps are _huge_. Look at neocities'…
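If sitemap support is added, streaming the XML instead of loading it whole would mitigate the size risk; a rough sketch using the standard library (the function name and the URL cap are assumptions, not an existing API):

```python
import xml.etree.ElementTree as ET
from io import BytesIO

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def iter_sitemap_urls(stream, limit=50_000):
    """Stream <loc> entries from a sitemap without holding the whole
    document in memory; stop after `limit` URLs as a safety valve."""
    count = 0
    for _, elem in ET.iterparse(stream):
        if elem.tag == NS + "loc" and elem.text:
            yield elem.text.strip()
            count += 1
            if count >= limit:
                return
        elem.clear()  # free parsed elements as we go

sample = b"""<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>"""
print(list(iter_sitemap_urls(BytesIO(sample))))
```

A hard cap like this (plus a download size limit) keeps a pathological sitemap from exhausting the crawler's memory.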
-
My log files for MRBS system sites have exploded to gigabytes in size in one day. They are filled with errors like this:
[Mon Sep 11 09:12:26 2006] [error] [client 66.249.65.163] FastCGI:
> ser…
-
I've noticed that there are some sites that go to a page that says some iteration of "Access Denied" or "Verify you are a human." I think this is mostly caused by the VPN (i.e. the VPN IP address is b…
-
Query parameters and malformed URLs cause SEO headaches, even when canonical tags and crawling/indexing directives are used. From cache-misses to wasted crawl resources, to fragmentation and indexing …
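One common mitigation is normalizing URLs before they are crawled or cached, so that equivalent variants collapse to a single canonical form; a sketch with the standard library (the tracking-parameter list is an illustrative assumption):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Common tracking parameters to strip (illustrative, not exhaustive).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

def normalize_url(url: str) -> str:
    """Lowercase scheme/host, drop tracking parameters, sort the rest,
    and strip the fragment so equivalent URLs map to one form."""
    parts = urlsplit(url)
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS
    )
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(query),
        "",  # drop fragment
    ))

print(normalize_url("HTTPS://Example.com/page?b=2&utm_source=x&a=1#frag"))
```

Normalizing this way improves cache hit rates and reduces duplicate crawling, though it cannot fix genuinely malformed URLs.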
-
The sitemap could use caching to help large sites cope with search-engine crawling.
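A simple time-based cache around the sitemap generator would cover the common case; a minimal sketch (class and parameter names are hypothetical, not an existing API):

```python
import time

class SitemapCache:
    """Cache a generated sitemap for `ttl` seconds so repeated crawler
    requests do not regenerate it every time. Illustrative sketch."""

    def __init__(self, generate, ttl: float = 3600.0):
        self.generate = generate  # callable producing the sitemap XML
        self.ttl = ttl
        self._value = None
        self._expires = 0.0

    def get(self):
        now = time.monotonic()
        if self._value is None or now >= self._expires:
            self._value = self.generate()
            self._expires = now + self.ttl
        return self._value
```

For very large sites the generator could additionally write the result to disk or a shared cache so multiple workers reuse it.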
-
Because of the nature of the project, bridges _will_ break for various reasons: sites change, rate limits are put in place, IPs get blocked, paywalls appear, HTML changes slightly, and so on. The issue trac…
-
https://techcrunch.com/2023/06/30/twitter-now-requires-an-account-to-view-tweets/
The Nitter crawler will need to be recreated...