-
## Meeting Details
* **Date/Time:** June 14, 2022 @ 1800 UTC / 11:00am PT
* **Location:** [Discord SIG-Docs Voice Room](https://discord.gg/p3padwr58u)
* **Moderator:** @Finit…
-
It is useful to tell folks what they need to do, probably starting with:
```shell
git clone https://github.com/spider-rs/spider.git
cd spider
```
It is easy to get some results by running:
`cargo run --exa…
-
The following line extracts `domain` as `c[1]`:
https://github.com/dibollinger/CookieBlock-Crawler-Prototype/blob/801cb332cdcdddaea4e57e2ac2889653ed623071/src/cookiebot_scraper.py#L250
However, fro…
-
The crawler now works as expected. I need to test its performance (i.e., how many sites are successfully analyzed) on a larger test sample.
-
## Feature description
We can create a sample site that gets a build of the AMP plugin for any given pull request and then check that the generated AMP page does not degrade performance. The pull r…
-
For url = "https://email.gov.in/"
It gives the following output which shows the issue:
> Next URL: https://email.gov.in#
> Words: 2583
> Next URL: https://email.gov.in/videos/docs/How-To-Use-…
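
If the problem here is that the fragment-only link `https://email.gov.in#` is queued as a page separate from `https://email.gov.in/`, one common mitigation is to normalize URLs before de-duplicating them. The sketch below assumes that this is the cause and is not the crawler's actual code: `normalize_url` and `dedupe` are hypothetical helpers built on Python's standard `urllib.parse`.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical helper: canonicalize a URL before the "seen" check."""
    # Drop the "#fragment": it is handled client-side and does not change
    # the document the server returns.
    url, _fragment = urldefrag(url)
    parts = urlsplit(url)
    # Treat an empty path as "/" so "https://host" and "https://host/" match.
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def dedupe(urls: list[str]) -> list[str]:
    """Hypothetical helper: keep the first occurrence of each normalized URL."""
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


print(dedupe(["https://email.gov.in/", "https://email.gov.in#"]))
# ['https://email.gov.in/']
```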
-
- [x] CLI Client using goflags
```yaml
Usage:
./katana [flags]
Flags:
INPUT:
-u, -list string[] target url / list to crawl (single / comma separated / file input)
CONFIGURATIONS:
…
```
-
Hi,
Occasionally, I get the following error and the crawl crashes when the `dump_profile` command is executed (see below). It happens after several site visits.
```
profile_commands - WARN…
```
-
### Description of problem
I am aware that CentOS/RHEL/Enterprise/Rocky is not officially supported, but right now it does not work at all.
Part of the issue is `herokuish`, wherein the post-install sc…
-
As [shown in the forum](https://forum.mediathekview.de/post/30400), the URL for some SRF programs is wrong and consists only of "index-f1-v1-a1.m3u8".
This currently affects around 600 programs…
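
To find the affected entries, one could flag every entry whose URL is a bare file name like `index-f1-v1-a1.m3u8`, meaning it has no scheme, no host, and no directory part. This is a minimal sketch, not MediathekView code. It assumes the film list is available as hypothetical `(title, url)` pairs, and `is_bare_filename` and `find_broken` are illustrative names.

```python
from urllib.parse import urlsplit


def is_bare_filename(url: str) -> bool:
    """True if `url` has no scheme, no host and no path separator."""
    parts = urlsplit(url)
    return not parts.scheme and not parts.netloc and "/" not in parts.path


def find_broken(entries: list[tuple[str, str]]) -> list[str]:
    """Return the titles of entries whose URL is only a file name.

    `entries` is a hypothetical list of (title, url) pairs.
    """
    return [title for title, url in entries if is_bare_filename(url)]


sample = [
    ("Broken", "index-f1-v1-a1.m3u8"),
    ("OK", "https://example.com/hls/index-f1-v1-a1.m3u8"),
]
print(find_broken(sample))  # ['Broken']
```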