-
This is mostly for research: we could serve a different, more tap-friendly landing page on mobile.
![image](https://github.com/kodadot/nft-gallery/assets/5887929/9e79a2dd-000e-4e70-91f8-84dea9f893f…
-
Can we add an option to set the maximum allowed request latency? Some of the sites I am crawling respond very slowly (e.g. 100000 ms), and I would like to be able to set an upper limit.
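As a sketch of how such a cap could behave on the client side (assuming a Python-based checker; `check_link`, `ms_to_seconds`, and the default value are hypothetical names, not part of any existing tool):

```python
import socket
from urllib.request import urlopen
from urllib.error import URLError

def ms_to_seconds(ms: int) -> float:
    """Convert a millisecond latency cap to the seconds urllib expects."""
    return ms / 1000.0

def check_link(url: str, max_latency_ms: int = 5000):
    """Return the HTTP status code, or None if the server exceeds the cap."""
    try:
        with urlopen(url, timeout=ms_to_seconds(max_latency_ms)) as resp:
            return resp.status
    except (URLError, socket.timeout):
        # Treat a timeout the same as an unreachable link.
        return None
```

Any link that takes longer than `max_latency_ms` to respond is then reported as failed instead of stalling the whole run.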
-
I suspect performance when checking an entire site would be better if the link checker could run on the set of pages listed in the sitemap (when available), rather than using the recursive crawling process.…
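Collecting the page set from a sitemap is straightforward with the standard library; a minimal sketch (function name is illustrative), which also handles the gzipped variant:

```python
import gzip
import xml.etree.ElementTree as ET

# Sitemap elements live in this XML namespace per sitemaps.org.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_from_sitemap(xml_bytes: bytes) -> list[str]:
    """Extract all <loc> entries from sitemap XML, decompressing if gzipped."""
    if xml_bytes[:2] == b"\x1f\x8b":  # gzip magic number
        xml_bytes = gzip.decompress(xml_bytes)
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]
```

The checker could then fetch exactly this list instead of discovering pages by following links.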
-
I'm having an issue where Wayback Machine links break crawling on completely unrelated pages.
[This page](https://windowsitter.world/index.php?p=wanted) has links to two Wayback Machine links, [this …
-
Right now, all GtS instances serve a simple hardcoded `robots.txt` that disallows all crawling:
```
User-agent: *
Disallow: /
```
The code for this is here: https://github.com/superseriousbus…
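The effect of that hardcoded file can be checked with Python's stdlib robots parser: every path is disallowed for every user agent.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt every GtS instance currently serves.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

# Any crawler asking about any path is refused.
print(parser.can_fetch("Googlebot", "https://example.org/"))          # False
print(parser.can_fetch("anybot", "https://example.org/@user/posts"))  # False
```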
-
The sitemap.xml.gz file can contain at most 50,000 URLs and must be no larger than 50 MB uncompressed. For sites with a lot of content, the current XML is rejected by Google.
```
Sitemap size limits: All formats limit a s…
```
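One common fix, sketched here with hypothetical helper and file names, is to split the URL list into chunks of at most 50,000 and publish a sitemap index that points at the chunk files:

```python
MAX_URLS_PER_SITEMAP = 50_000  # per the sitemaps.org protocol

def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split the full URL list into sitemap-sized chunks."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def sitemap_index(base: str, n_chunks: int) -> str:
    """Build a sitemap index referencing one file per chunk (names illustrative)."""
    entries = "".join(
        f"<sitemap><loc>{base}/sitemap-{i}.xml.gz</loc></sitemap>"
        for i in range(n_chunks)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        f"{entries}</sitemapindex>"
    )
```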
-
Generate robots.txt for sites.
Each site will have its own robots.txt, which must be resolved dynamically by adding a route `/robots.txt` to https://github.com/BeaconCMS/beacon/blob/7790eb72769a026c…
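A framework-agnostic sketch of what the dynamic handler could render per site (shown in Python rather than Elixir; the `site` map shape and field names are hypothetical, not Beacon's actual schema):

```python
def robots_txt_for(site: dict) -> str:
    """Render a per-site robots.txt body; the `site` shape is illustrative."""
    lines = ["User-agent: *"]
    if site.get("allow_indexing", False):
        lines.append("Disallow:")  # empty Disallow means allow everything
        if site.get("sitemap_url"):
            lines.append(f"Sitemap: {site['sitemap_url']}")
    else:
        lines.append("Disallow: /")
    return "\n".join(lines) + "\n"
```

The `/robots.txt` route would look up the site by host and return this string with a `text/plain` content type.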
-
## Change Request
The (publicly available) REST endpoint `/server/api` reveals the version number of the deployed DSpace instance, e.g.
```
$ curl -sS 'https://demo.dspace.org/server/api' 2>&1 | grep 'd…
```
-
### Overview
**Project:** Open Community Survey
**Volunteer Opportunity:** Create scraper to get information from builtwith.com on technologies used by neighborhood council websites. Organize the …
-
What would be your recommended way of dealing with window.location changes on the page? I'm crawling sites that use a method like the following, probably to break crawlers:
```…