-
It would be good if there were a way to set cookies on requests, to allow crawling sites that require authentication.
Is there currently a way to do this, or is this feature planned?
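In the meantime, a minimal stopgap sketch, assuming the crawler lets you pass custom headers per request (the `fetch` call and cookie names here are illustrative, not an existing option):

```javascript
// Build a Cookie header from a name -> value map so it can be attached to
// every request the crawler makes. Values are URI-encoded so separators
// like ';' and '=' inside a value don't corrupt the header.
function buildCookieHeader(cookies) {
  return Object.entries(cookies)
    .map(([name, value]) => `${name}=${encodeURIComponent(value)}`)
    .join('; ');
}

// Hypothetical usage with fetch (cookie names are placeholders):
const header = buildCookieHeader({ session: 'abc123', lang: 'en' });
// fetch(url, { headers: { Cookie: header } })
console.log(header); // session=abc123; lang=en
```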
-
A sitemap.xml.gz file may contain at most 50,000 URLs and must not exceed 50 MB uncompressed. For sites with a lot of content, the current XML exceeds these limits and is rejected by Google.
```
Sitemap size limits: All formats limit a s…
```
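One possible fix, sketched under the assumption that the generator has the full URL list in memory: split the URLs into chunks of at most 50,000 and emit a sitemap index pointing at one file per chunk (file names below are illustrative).

```javascript
// Per the sitemaps.org protocol, each sitemap file holds <= 50,000 URLs;
// larger sites must publish a sitemap index referencing multiple files.
const SITEMAP_URL_LIMIT = 50000;

function chunkUrls(urls, limit = SITEMAP_URL_LIMIT) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += limit) {
    chunks.push(urls.slice(i, i + limit));
  }
  return chunks;
}

// Build the index document; the sitemap-N.xml.gz naming is a placeholder.
function sitemapIndex(baseUrl, chunkCount) {
  const entries = Array.from({ length: chunkCount }, (_, i) =>
    `  <sitemap><loc>${baseUrl}/sitemap-${i + 1}.xml.gz</loc></sitemap>`
  ).join('\n');
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries + '\n</sitemapindex>';
}
```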
-
I'm having an issue where Wayback Machine links break crawling on completely unrelated pages.
[This page](https://windowsitter.world/index.php?p=wanted) has links to two Wayback Machine links, [this …
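Until the underlying bug is fixed, one workaround sketch is to skip Wayback Machine links entirely before enqueueing them; the predicate below is a hypothetical filter, not an existing crawler option.

```javascript
// Return true for URLs hosted on web.archive.org so the crawler can skip
// them; malformed input is treated as not-a-Wayback-link rather than thrown.
function isWaybackUrl(url) {
  try {
    return new URL(url).hostname === 'web.archive.org';
  } catch {
    return false;
  }
}
```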
-
Can we add an option for setting the maximum allowed request latency? Some of the sites I am crawling respond very slowly, on the order of 100,000 ms. I would like to be able to set an upper limit for this.
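As a sketch of what such an option could look like, here is a per-request latency cap built on `AbortController`; the `maxLatencyMs` parameter name is hypothetical, not an existing flag.

```javascript
// Abort the request if it takes longer than maxLatencyMs. The timer is
// cleared in finally so a fast response doesn't leave a dangling timeout.
async function fetchWithTimeout(url, maxLatencyMs) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), maxLatencyMs);
  try {
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}
```

A slow site would then fail fast with an `AbortError` instead of stalling the whole crawl.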
-
Adding an ad-blocker seems to make crawling much easier. On two sites I've tested, without an adblocker the number of requests is an order of magnitude higher than with it, and on one of the two sites…
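The effect could be approximated inside the crawler with a simple hostname blocklist, sketched below; the two domains are illustrative examples, not a real filter list.

```javascript
// Skip requests whose hostname is on (or is a subdomain of) a blocklist,
// mimicking what an ad-blocker does for the crawler's request volume.
const BLOCKLIST = ['doubleclick.net', 'googlesyndication.com'];

function shouldBlock(url, blocklist = BLOCKLIST) {
  const host = new URL(url).hostname;
  return blocklist.some((d) => host === d || host.endsWith('.' + d));
}
```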
-
This is more for research, as we can serve a different landing page for mobile that is more tap-friendly.
![image](https://github.com/kodadot/nft-gallery/assets/5887929/9e79a2dd-000e-4e70-91f8-84dea9f893f…
-
Hi All!
I realize this should largely be about the actual 'crawling' of the sites - but given how much of a breeze that was with this tool, I now find myself with the issue that the text that has been cra…
-
I suspect performance when checking an entire site would be better with the ability to run the link checker on a set of pages provided by the sitemap when available, vs the recursive crawling process.…
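A rough sketch of the sitemap-seeded approach: pull the page list out of the sitemap's `<loc>` entries and feed those to the link checker directly. The naive regex is enough for well-formed sitemaps; a real implementation would likely use an XML parser.

```javascript
// Extract every <loc> URL from a sitemap document so the link checker can
// start from a known page set instead of crawling recursively.
function extractSitemapUrls(xml) {
  return [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
}
```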
-
Hi,
On some sites, the crawler fails by throwing the following error:
```
Error: Couldn't unzip robots.txt response body
0|www | at decodeAndReturnResponse (/var/www/node_modules/…
```
-
I was wondering about the purpose(s) of rave.
I can see its primary objective is to allow very easy development of modular (amd/cjs/es6) web apps where you don't have to worry about configuring a…