-
It would be ideal if sites' privacy policies were more discoverable to users, their agents, and crawlers. To that end, I'd suggest that we:
1. Pave the `rel=privacy-policy` cowpath (based on HT…
-
```
Problem description:
Unexplained spike in direct traffic since May 19th (+118,57%). Extra visits
only generated by users that visited the website through Internet Explorer. The
quality of th…
-
# Assume all links are NSFW (unless marked SFW).
## Commenters: Please mark your NSFW links anyway.
Please open a new issue if you're reporting a problem in the ripper for a site we already supp…
-
BASE, openarchives, and others have a listing of their "sources". I plan to write a script which aggregates all of these into a single list.
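
A minimal sketch of what such an aggregation script might look like, assuming each listing has already been fetched and reduced to a plain list of source URLs. The function names and the normalization rule (lowercased host plus path, ignoring scheme and trailing slash) are illustrative assumptions, not part of any existing tool:

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Build a comparison key: lowercased host plus path without a trailing slash.

    Assumption: two entries that differ only in scheme, host case, or a
    trailing slash refer to the same source.
    """
    parts = urlsplit(url.strip())
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}"


def aggregate_sources(*listings):
    """Merge several listings into one list, keeping the first URL seen per key."""
    merged = {}
    for listing in listings:
        for url in listing:
            key = normalize(url)
            if key and key not in merged:
                merged[key] = url.strip()
    return sorted(merged.values())


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real listings.
    base = ["http://example.org/oai/", "http://Repo.example.com/oai"]
    openarchives = ["http://example.org/oai", "http://other.example.net/oai "]
    for source in aggregate_sources(base, openarchives):
        print(source)
```

Keeping the first URL seen per key means the order in which listings are passed decides which spelling of a duplicate survives.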
-
Hello,
I have these settings
- WP site: www.dynpage.com
- Static version: www.statpage.com
I need to rewrite a specific URL in the static version of the WP site.
I added this rewri…
-
## Description
I'm using TS Cloud to do some RAG on text and code extracted from crawled HTML.
The docs for conversational RAG mention a 3000 token limit:
> Context Window Limits
>
> Alth…
-
The verbatim name "Monomorium monomorium group" in the scientificName field of the 'IZIKO museum collections dating from 1800 - 2013' dataset[1] had been interpreted to read "Monomorium monomorium Bo…
-
It'd be great if the plugin could be configured to use/re-use the sessions mechanism.
Currently it has to be managed in spiders like this:
```
if 'X-Crawlera-Session' in response.headers and resp…
-
I have a problem with a theme which loads some JavaScript files dynamically. They also regenerate those JS files on cache clear, and the file names change each time. So I have no chance to add…