-
### Description
Many of the dependencies for this repo are out of date. A significant set of dependencies have reached EOL and are no longer supported. Update dependencies to current maintained v…
-
-
Create crawler for oppskrift.klikk.no
- [ ] Loop through IDs and https_request them
- [ ] Do this until [n] URLs don't answer. 10 should do (50 in production?)
- [ ] Create an array of items/objects
-…
eklem updated
7 years ago
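The crawl loop in the task list above could be sketched roughly as follows. This is only an illustration under stated assumptions: `crawl_ids`, `fetch`, `start_id` and `max_misses` are hypothetical names, the fetch step is passed in as a callable so the loop itself needs no network access, and "[n] URLs don't answer" is read here as n *consecutive* misses, which the issue does not confirm.

```python
from typing import Callable, Optional


def crawl_ids(
    fetch: Callable[[int], Optional[dict]],
    start_id: int = 1,
    max_misses: int = 10,
) -> list:
    """Collect items for increasing IDs until `max_misses` IDs in a row fail.

    `fetch` is a hypothetical callable: it returns a dict for an ID that
    answers and None for an ID that does not.
    """
    items = []
    misses = 0
    current = start_id
    while misses < max_misses:
        item = fetch(current)
        if item is None:
            # No answer for this ID: count it towards the stop condition.
            misses += 1
        else:
            # A hit resets the counter (assumption: misses must be consecutive).
            misses = 0
            items.append(item)
        current += 1
    return items
```

In a real crawler, `fetch` would wrap the HTTPS request and return `None` on a 404 or timeout; `max_misses` would be 10 while testing and could be raised (the issue suggests 50) in production.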
-
-
Currently, the Algolia index runs every 2 days at 17:00. This results in a lot of unnecessary reindexing, and users may be redirected to an invalid link when searching.
So we can reindex Algolia on…
-
**I just ran the example provided. I didn't change anything, but got this error:**
![image](https://github.com/Frikallo/stargazerz/assets/45900838/41f26c0d-36be-4fe8-83ae-f3f6920092d5)
-
LogCounterHandler increases crawler log_count stats for each record, but it should only increase them for logs from the crawler it is created by. This is an issue if you're running several Crawlers in…
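One possible way to scope log counting to a single owner is sketched below with the standard `logging` module only. This is not the project's actual handler: `ScopedLogCounter` and the `owner` record attribute are hypothetical, and the sketch assumes each log call can be tagged with the object it belongs to via `extra`.

```python
import logging
from collections import Counter


class ScopedLogCounter(logging.Handler):
    """Count log records per level, but only those tagged with this owner."""

    def __init__(self, owner, level=logging.NOTSET):
        super().__init__(level)
        self.owner = owner          # hypothetical: the object this handler belongs to
        self.counts = Counter()     # level name -> number of matching records

    def emit(self, record):
        # Records without an `owner` attribute, or tagged with a different
        # owner, are ignored instead of being counted.
        if getattr(record, "owner", None) is self.owner:
            self.counts[record.levelname] += 1
```

A caller would then log with something like `logger.warning("msg", extra={"owner": crawler})`, so that two handlers attached to the same logger each count only their own records.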
-
It is to happen in datalad core in the coming minor release:
- https://github.com/datalad/datalad/pull/7575
which would break compatibility with datalad-crawler by removing `get_key_url` which operates …
-
When crawling a website and finding RSS links, the `` element should be taken into account, if present.
Example: redirects to the `/content/` path, and the `` is a relative link, but the site also…
p3lim updated
5 months ago
-
Hi all,
I've been experimenting with making an AWS lambda function for browsertrix-crawler and I've gone some distance but hit a snag that the maintainers are probably better equipped to help with.…