-
Hello,
I was wondering if you could share the books corpus, since crawling it takes a long time. I was using https://github.com/soskek/bookcorpus.
-
Googlebot crawling has recently been generating a surprising amount of load and causing real problems. To mitigate this load, we've been experimenting with limiting or stopping bot access to the …
-
# TODO
- [x] glossary index
- [x] glossary single
- [ ] auto-interlinking
---
The `glossary` "module" is an easy way to pick up "information intent" traffic to your website, and backlin…
-
While crawling [Web Animations Level 2](https://drafts.csswg.org/web-animations-2/), the following links to other specifications were detected as pointing to nonexistent anchors:
* [ ] https://draft…
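The report does not say which tool flagged these links. As a hypothetical sketch, this is one way such a check can work: collect every `id` attribute (plus legacy `<a name>` anchors) from the already-fetched target page, then report links whose `#fragment` is not among them. The names `AnchorCollector` and `missing_anchors` are illustrative only.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag


class AnchorCollector(HTMLParser):
    """Collects every id attribute (and legacy <a name=...>) in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "id" or (tag == "a" and name == "name"):
                self.anchors.add(value)


def missing_anchors(links, html):
    """Return the links whose #fragment does not exist in `html`.

    Assumes all `links` point at the page whose markup is `html`.
    """
    collector = AnchorCollector()
    collector.feed(html)
    missing = []
    for link in links:
        _, fragment = urldefrag(link)
        # Fragments may be percent-encoded in the link but not in the markup.
        fragment = unquote(fragment)
        if fragment and fragment not in collector.anchors:
            missing.append(link)
    return missing
```

For example, with a page containing `<h2 id="timing-model">` and `<a name="legacy">`, a link ending in `#gone` is the only one reported.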
-
Hello,
I was able to use wivet successfully when storing results in .dat files, but I ran into some issues when using an SQL database to store results. I have set up the wivet database, configured credenti…
-
Kavitakosh has changed the way it displays titles for its content: it now shows the author name together with the title of the text. The crawling script needs to be fixed to separate the author name from the title.
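The exact new format is not given here, so the following is a minimal sketch under an explicit assumption: the combined heading looks like `<title> / <author>`. It splits on the *last* separator so that titles containing the separator stay intact, and it falls back to the old title-only format when no separator is present. `split_title_author` and `SEPARATOR` are hypothetical names; adjust `SEPARATOR` to whatever delimiter the site actually uses.

```python
from typing import Optional, Tuple

# ASSUMPTION: the new heading has the form "<title> / <author>".
SEPARATOR = " / "


def split_title_author(raw: str, sep: str = SEPARATOR) -> Tuple[str, Optional[str]]:
    """Split a combined heading into (title, author).

    Returns (title, None) when no separator is found, i.e. the old format.
    """
    text = raw.strip()
    # rpartition splits on the last occurrence; `found` is "" if sep is absent.
    title, found, author = text.rpartition(sep)
    if not found:
        return text, None
    return title.strip(), author.strip()
```

For example, `split_title_author("Madhushala / Harivansh Rai Bachchan")` returns `("Madhushala", "Harivansh Rai Bachchan")`.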
-
By default Matomo does not track bots (in the general Web sense: automated agents performing some task, such as crawling), but it is possible to [add this tracking](https://matomo.org/faq/new-to-pi…
Seb35 updated 2 months ago
-
I suspect that checking an entire site would be faster if the link checker could run on the set of pages listed in the sitemap, when one is available, instead of relying on the recursive crawling process.…
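The link checker in question is not named in this excerpt. As a sketch of the idea only, this is how the page list could be pulled from a sitemap that follows the sitemaps.org protocol (namespace `http://www.sitemaps.org/schemas/sitemap/0.9`) and then handed to the checker instead of crawling recursively. `urls_from_sitemap` is a hypothetical helper. Fetching the sitemap itself is left out.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> List[str]:
    """Return the <loc> value of every entry in a sitemap document.

    A <urlset> lists pages; a <sitemapindex> lists further sitemaps,
    whose URLs are returned too so the caller can fetch them in turn.
    """
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    locs += root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]
```

One design consequence is that pages absent from the sitemap are never checked. A recursive crawl would still find those pages, so the sitemap mode probably works best as an opt-in.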
-
Hi,
The pagination system used by Google on its search result pages has changed.
Your search method doesn't work anymore; it always returns the same 20 results (the `start` and `num` GET parameters are …
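The new pagination scheme is not described in this excerpt, so the sketch below only reproduces the offset-based scheme the method relied on (page *k* starts at result `k * num`). It also includes a small check for the reported symptom: every fetched page returns the same result list. `page_urls` and `pagination_ignored` are hypothetical helpers, and fetching and parsing the result pages are out of scope.

```python
from typing import List, Sequence
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"


def page_urls(query: str, pages: int, num: int = 10) -> List[str]:
    """Offset-based pagination: page k requests results starting at k * num."""
    return [
        f"{BASE_URL}?{urlencode({'q': query, 'start': k * num, 'num': num})}"
        for k in range(pages)
    ]


def pagination_ignored(result_pages: Sequence[Sequence[str]]) -> bool:
    """True when more than one page was fetched and all of them are identical,
    i.e. the `start` offset had no effect."""
    return len(result_pages) > 1 and all(
        page == result_pages[0] for page in result_pages[1:]
    )
```

Running `pagination_ignored` on the result lists from a few consecutive pages gives a quick regression test for the scraper, whatever fix is eventually adopted.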