browsertrix-behaviors Search Results

openzim/zimit #433

Consider "new" crawler CLI arguments

We have some "new" (some are a few months old ...) CLI arguments of browsertrix crawler to consider: ``` --seedFile, --urlFile If set, read a list of seed urls, on …

benoit74 updated 1 week ago

webrecorder/browsertrix-behaviors #58

Behavior Bug: Instagram behaviour only opens the first post …

**URL** [https://www.instagram.com/elsdietvorst18](https://www.instagram.com/elsdietvorst18) **Describe the bug** Instagram behaviour only opens the first post of the row and ignores the two othe…

nvanderperren updated 2 months ago

brave/pagegraph-crawl #89

Porting some features from Browsertrix Crawler or integratin…

Recently came across the [Browsertrix Crawler](https://github.com/webrecorder/browsertrix-crawler) project which seems to be using Brave Browser for crawls. Some of its features include `Support for c…

L3thal14 updated 4 months ago

webrecorder/browsertrix-old #38

Gotchas

I'm using browsertrix to scrape a soon-to-be offline service at my university, and I wanted to share some gotchas I encountered. (I'll update this list when I encounter new issues.) ### Preserve se…

jswrenn updated 4 years ago

webrecorder/browsertrix-crawler #664

[question] Missing or timed out dynamic request to resource

If I crawl a website with mostly static resources, I'm noticing there can be missing resources in the resulting WARC. The reason for that is either broken links or timeouts. I have written tools to…

wsdookadr updated 2 months ago

webrecorder/browsertrix-crawler #486

Make screenshot after custom behaviors

Currently it seems screenshots are taken before custom behaviors run. It would be very useful to be able to take a screenshot after custom behaviors, for example to capture a screenshot after removing the "acc…

cmillet2127 updated 4 months ago

openzim/zim-requests #998

bibnum_fr_all is failing

### Recipe URL https://farm.openzim.org/recipes/bibnum_fr_all ### Last log lines ```true ---------- Testing warc2zim args Running: warc2zim --favicon=https://drive.farm.openzim.org/Corrected%20Lo…

benoit74 updated 5 months ago

ukwa/webrender-api #9

Replace with browsertrix-crawler

Rather than our own `webrender-api`, consider switching to https://github.com/webrecorder/browsertrix-crawler. The integration pattern is somewhat different to Browsertrix's primary use case, but it…

anjackson updated 3 years ago

sul-dlss/was-pywb #236

Vimeo videos not replayable in SWAP

Here is the druid with wacz file: https://argo.stanford.edu/view/druid:bc725wm6775 The seed in SWAP https://swap.stanford.edu/was/20240118154547/https://eastwindezine.com/ You can find Vimeo …

peterchanws updated 9 months ago

anjackson/golem #17

Add browser-based crawler mode

The [scrapy-playwright](https://github.com/scrapy-plugins/scrapy-playwright) project appears well supported and can supersede the current Selenium Hub approach (see e.g. [proxy support](https://github…

anjackson updated 2 years ago

89 results for browsertrix-behaviors
