-
## Update
This issue has been deprecated in favor of https://github.com/diffblog/hacktoberfest/issues/10. Please try #10 instead.
----
👋👋 Hello Hacktoberfest contributor
We want your hel…
-
*// Ok, this is kind of exciting; it is the first issue **Data Consistency Crawler found** for us* 👍
In some cases we observed multiple values ("d", "o") for the Type system field:
```
Inconsist…
-
Is it possible to force the crawler to stop crawling? I have a condition that only 500 pages should be crawled; when that condition is met, I want to stop the crawler.
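One way to express the page limit is to check it in the crawl loop itself. The following is a minimal, framework-agnostic sketch, not the API of any specific crawler library: `crawl`, `fetch_links`, and `max_pages` are hypothetical names introduced for illustration, and `fetch_links` stands in for whatever code downloads a page and extracts its links.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(start_url: str,
          fetch_links: Callable[[str], Iterable[str]],
          max_pages: int = 500) -> List[str]:
    """Breadth-first crawl that stops once max_pages pages have been visited.

    fetch_links is a hypothetical callback: given a URL, it returns the
    links found on that page.
    """
    visited: List[str] = []      # pages crawled so far, in visit order
    seen = {start_url}           # URLs already queued, to avoid revisits
    queue = deque([start_url])

    # The limit is part of the loop condition, so the crawl ends as soon
    # as the page count is reached, even if the queue is not empty.
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

If the crawler in use exposes its own stop or shutdown hook, the same counter check could trigger that hook instead of ending a loop.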
-
### Browsertrix Version
v1.9.4-08ee857
### What did you expect to happen? What happened instead?
After the last upgrade to 1.9.4, the ads are no longer shown in replay for tv2.dk even thoug…
-
### Problem Description
I think it would be valuable to have an option to avoid duplicate crawls across runs. E.g., check an index to see if the given url has already been crawled - if so, don't …
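The index check described above could be sketched as follows. This is only an illustration under stated assumptions, not an existing option of the crawler: the `SeenIndex` class, its `mark_if_new` method, and the choice of SQLite as the index store are all hypothetical.

```python
import sqlite3


class SeenIndex:
    """Hypothetical persistent index of URLs crawled in earlier runs."""

    def __init__(self, path: str) -> None:
        # The SQLite file at `path` survives between crawl runs.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY)"
        )
        self.conn.commit()

    def mark_if_new(self, url: str) -> bool:
        """Record url and return True if no earlier run has stored it."""
        cur = self.conn.execute(
            "INSERT OR IGNORE INTO seen (url) VALUES (?)", (url,)
        )
        self.conn.commit()
        # rowcount is 1 when the row was inserted, 0 when it was ignored.
        return cur.rowcount == 1

    def close(self) -> None:
        self.conn.close()
```

A crawl loop would call `mark_if_new(url)` before fetching and skip the URL when it returns `False`.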
-
Name of Crawler: ???
Settings:
- url: www.spiegel.de
- blacklist:
  - sport
  - dienste
  - extra
  - netzwelt
  - karriere
  - reise
  - stil
  - international
- follow subdomai…
-
## Overview
Listing the tests that fail randomly here.
```
1) EF08InvoiceCest: EF0801-UC01-T01_商品購入_税額確認
Test codeception/acceptance/EF08InvoiceCest.php:invoice_商品購入_税額確認
[Facebook\WebDriver\Exception…
-
### Context
Prior reading: https://anjackson.net/2023/06/09/what-makes-a-large-website-large/
> The simplest way to deal with this risk of temporal incoherence is to have two crawls. A shallow a…
-
The crawler will automate updating the Athena tables after logs are synced from the file server to S3.
Review example here: https://www.mikulskibartosz.name/start-glue-crawler-using-boto3/#:~:text=AWS%20gives%20…
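A minimal sketch of triggering the crawler after the sync, assuming an AWS Glue crawler driven through boto3's `start_crawler` and `get_crawler` calls. The helper name `run_crawler_and_wait`, the crawler name in the usage comment, and the polling and timeout values are assumptions for illustration, not part of this issue.

```python
import time


def run_crawler_and_wait(glue, name, poll_seconds=15.0, timeout_seconds=1800.0):
    """Start a Glue crawler and poll until it returns to the READY state.

    `glue` is expected to behave like boto3.client("glue").
    Returns True if the crawler finished in time, False on timeout.
    """
    glue.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        # Crawler states reported by Glue: READY, RUNNING, STOPPING.
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":
            return True
        time.sleep(poll_seconds)
    return False


# Usage (requires boto3 and AWS credentials; the crawler name is hypothetical):
#   import boto3
#   run_crawler_and_wait(boto3.client("glue"), "example-log-crawler")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before it is pointed at a real account.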