-
@mfahlandt
Trying to write from the release crawler this week, I found it confusing. When it runs, it creates a file based on the prior week's date. Any reason not to change the file name to the…
-
When testing the endpoint by running `python3 tests/test_timing.py`, sometimes nothing is returned. Try running the analyze-topics and link-extractor scripts in isolation, and check their outputs. I th…
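A minimal sketch of the "run each stage in isolation" check, assuming each script prints its result to stdout. The stage labels and commands are hypothetical placeholders, since the actual script paths are not given here:

```python
import subprocess
import sys


def find_empty_stages(stages, timeout=60):
    """Run each stage command on its own and return the labels whose stdout is empty.

    `stages` maps a label to an argv list. Labels and commands are hypothetical
    stand-ins for the analyze-topics and link-extractor scripts.
    """
    empty = []
    for name, argv in stages.items():
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        # Treat whitespace-only output the same as no output at all.
        if not result.stdout.strip():
            empty.append(name)
    return empty


if __name__ == "__main__":
    # Hypothetical usage; replace with the real script paths.
    print(find_empty_stages({
        "analyze_topics": [sys.executable, "analyze_topics.py"],
        "link_extractor": [sys.executable, "link_extractor.py"],
    }))
```

If a stage writes to a file instead of stdout, the check would need to inspect that file instead.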
-
Details -
```
$ docker build . -f gh_crawler/docker/Dockerfile -t static-scanner:latest
Sending build context to Docker daemon 387.1kB
Step 1/19 : FROM golang:1.21 as builder
.
.
.
Step …
```
-
WORKFLOW | Portal | Bot Message
-- | -- | --
BangaloreDutyFreeStore Workflow | PTC Wave 1 | Bot exception failedInput string was not in a correct format. at System.Number.Strin…
-
Good afternoon,
I have a problem with the YaCy user agent configuration. I'm setting the user agent like so:
```
crawler.userAgent.name=chimmieyacy
crawler.userAgent.string=chimmieyacy
```
in `…
-
**Describe the bug**
In 2024, we found that on the archived websites of past editions, the images on the 2021 website are broken.
For details, see PR #44.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to "https://tw.py…
-
Since crawler 0.11.0 (https://github.com/webrecorder/browsertrix-crawler/pull/362), the captured favicon is available in pages.jsonl
We could use that when a custom favicon is not provided instead of…
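A minimal sketch of the proposed fallback, assuming each page record in `pages.jsonl` is one JSON object per line and that the captured favicon is stored under a `favIconUrl` key (the key name is an assumption; check the crawler's actual output). `pick_favicon` is a hypothetical helper:

```python
import json
from pathlib import Path
from typing import Optional


def pick_favicon(custom_favicon: Optional[str], pages_jsonl: Path) -> Optional[str]:
    """Prefer a user-supplied favicon; otherwise use the first one captured in pages.jsonl."""
    if custom_favicon:
        return custom_favicon
    with pages_jsonl.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Assumed field name; header lines without it are simply skipped.
            favicon = record.get("favIconUrl")
            if favicon:
                return favicon
    return None
```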
-
**Describe the bug**
I ran the Kendra Web Crawler and confirmed that the crawl succeeded, but the SNS topic (KendraCrawlerSNSTopic) that should trigger CrawlerLambda never fires.
https://githu…
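One thing worth ruling out is whether CrawlerLambda is actually subscribed to the topic. A minimal sketch, assuming `sns_client` behaves like `boto3.client("sns")` and that you have both ARNs at hand; `lambda_subscribed` is a hypothetical helper:

```python
def lambda_subscribed(sns_client, topic_arn: str, function_arn: str) -> bool:
    """Return True if `function_arn` is subscribed to `topic_arn` with the lambda protocol.

    `sns_client` is assumed to expose boto3's SNS `list_subscriptions_by_topic`.
    """
    kwargs = {"TopicArn": topic_arn}
    while True:
        page = sns_client.list_subscriptions_by_topic(**kwargs)
        for sub in page.get("Subscriptions", []):
            if sub.get("Protocol") == "lambda" and sub.get("Endpoint") == function_arn:
                return True
        # Follow pagination until SNS stops returning a NextToken.
        token = page.get("NextToken")
        if not token:
            return False
        kwargs["NextToken"] = token
```

A subscription alone is not sufficient: the function's resource-based policy must also allow `sns.amazonaws.com` to invoke it. If both are in place, the next thing to check is whether anything publishes to the topic at all.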
-
Description:
Enhance the existing web crawler to support crawling and extracting content from websites that rely heavily on JavaScript for rendering their content. This feature will involve integra…
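The integration itself is cut off above, so this does not show it. As a hypothetical illustration only, a crawler could first guess which pages need JavaScript rendering and route just those to a headless browser. The heuristic below (scripts present, little visible text) and its 200-character threshold are assumptions, not part of the proposal:

```python
from html.parser import HTMLParser


class _TextAndScriptCounter(HTMLParser):
    """Count <script> tags and visible text characters outside script/style/noscript."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside script/style/noscript

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style", "noscript"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style", "noscript") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def needs_js_rendering(html: str, min_text_chars: int = 200) -> bool:
    """Guess whether a page's visible content is produced by JavaScript."""
    counter = _TextAndScriptCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.text_chars < min_text_chars
```

Pages for which this returns `False` could stay on the existing fast path.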
-
## Describe the bug
I'm using an Arch Linux distro, which ships the newest stable Python release, 3.12. distutils was deprecated in Python 3.10 and removed in 3.12 (PEP 632), so the program errors out because it cannot find distutils…
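The report is truncated, so which distutils API the program uses is unknown. A common pattern is a guarded import with a local fallback; this sketch assumes, hypothetically, that the missing function is `distutils.util.strtobool`:

```python
try:
    # Works on Python <= 3.11 (deprecated since 3.10).
    from distutils.util import strtobool
except ImportError:
    # distutils was removed in Python 3.12 (PEP 632); keep a local equivalent.
    def strtobool(val: str) -> int:
        """Return 1 for true-ish strings, 0 for false-ish ones, like distutils did."""
        val = val.lower()
        if val in ("y", "yes", "t", "true", "on", "1"):
            return 1
        if val in ("n", "no", "f", "false", "off", "0"):
            return 0
        raise ValueError(f"invalid truth value {val!r}")
```

Other distutils helpers have standard replacements, for example `shutil.which` for `distutils.spawn.find_executable`.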