-
We use 2 crawlers:
* The first uses the "scraperApi" API, which returns the results of a Google query.
* The second crawls each website that seems im…
-
If the crawler encounters a ZIM file among the resources of a web property, we should block it immediately so that it is not included in the WARC and, in turn, in the ZIM.
This is probably a page…
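
A minimal sketch of such a block rule, assuming a ZIM file can be recognised by a `.zim` extension on the URL path. The helper names `is_zim_url` and `filter_crawlable` are hypothetical and are not part of the crawler; a real fix would more likely go through the crawler's own exclusion settings.

```python
from urllib.parse import urlsplit


def is_zim_url(url: str) -> bool:
    """Return True when the URL path ends in .zim (assumed marker of a ZIM file)."""
    # Only the path is inspected, so query strings and fragments are ignored.
    path = urlsplit(url).path
    return path.lower().endswith(".zim")


def filter_crawlable(urls):
    """Drop every URL that looks like a ZIM download, keeping the original order."""
    return [u for u in urls if not is_zim_url(u)]
```

This check relies on the extension only; a ZIM file served from a URL without that extension would need a content-type or header check instead.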
-
Instead of having to import the JSON file with a `type: 'json'` import attribute (which unexpectedly breaks things):
```
import crawlers from 'crawler-user-agents' with { type: 'json' }
```
... could we have an `.…
-
https://github.com/mozilla/coverage-crawler/commit/f164a6de4a961277cb4006c6939290526bf5c955
https://github.com/mozilla/coverage-crawler/commit/7e4218a61d06da26477b6d740cd643f1808d29db?w=1
-
Since crawler 0.11.0 (https://github.com/webrecorder/browsertrix-crawler/pull/362), the captured favicon is available in `pages.jsonl`.
We could use that when a custom favicon is not provided instead of…
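
A rough sketch of that fallback, assuming each line of `pages.jsonl` is a JSON object and that the favicon is stored under a field named `favIconUrl` (the field name is an assumption here and should be checked against the crawler output). `first_favicon_url` and `choose_favicon` are hypothetical helpers.

```python
import json
from typing import Iterable, Optional


def first_favicon_url(lines: Iterable[str], field: str = "favIconUrl") -> Optional[str]:
    """Return the first non-empty favicon URL found in pages.jsonl lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip malformed lines rather than failing the whole run.
            continue
        if isinstance(record, dict):
            value = record.get(field)
            if value:
                return value
    return None


def choose_favicon(custom: Optional[str], lines: Iterable[str]) -> Optional[str]:
    """Prefer the custom favicon; otherwise fall back to the captured one."""
    return custom or first_favicon_url(lines)
```

The lines can come from an open file handle, for example `choose_favicon(None, open("pages.jsonl"))`, where the path is illustrative.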
-
https://gitlab.com/fdroid/rfp/-/issues/?sort=created_date&state=all
[Issues](https://gitlab.com/fdroid/rfp/-/issues/?sort=created_date&state=all) in the F-Droid crawler. Extract the `source code` URL.
…
-
Try to fix with:
```
require 'rubygems'
require 'mechanize'
module PortfolioAdvisor
  module Crawler
    # Display content of HTML articles
    class Api
      attr_reader :url
      def …
-
GDG CT forum is such an example: forum.gdgcatania.org
-
## Summary
Allow passing parameters to a signal receiver (when self is not available)
I.e.
```
crawler.signals.connect(receiver=cls.engine_stopped, signal=signals.engine_stopped, cb_kwargs={…
-
I upgraded the crawler's capabilities and improved the UI.