Closed kmcelwee closed 2 years ago
We have it working in puppeteer: https://gist.github.com/kmcelwee/cdbb6d2b4a5c2d9ac234d6de5db4716c
When running the custom driver locally, we can use Docker volumes to override defaultDriver.js
with our driver.
docker run -v $PWD/custom-driver.js:/app/defaultDriver.js -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --url https://derridas-margins.princeton.edu/library/abraham-oeuvres-completes-1966/gallery/front-cover/ --limit 2 --generateWACZ --text --collection deep-zoom
It will be easier to provide the extra info.json URLs
as a seed list for the crawl instead of including them in the custom driver.
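A minimal sketch of the seed list idea, in case it helps: a small Node helper that puts the page URL and any extra info.json URLs into a newline-delimited list with duplicates dropped. The function name `buildSeedList` and the `iiif.example.org` URL are hypothetical, not something from our setup.

```javascript
// Hypothetical helper: build a newline-delimited seed list containing the
// page itself plus any extra resources (e.g. IIIF info.json files).
function buildSeedList(pageUrl, extraUrls) {
  const seen = new Set();
  const seeds = [];
  for (const raw of [pageUrl, ...extraUrls]) {
    // new URL() throws on malformed input, so bad entries fail early.
    const url = new URL(raw.trim()).href;
    if (!seen.has(url)) {
      seen.add(url);
      seeds.push(url);
    }
  }
  return seeds.join("\n") + "\n";
}

// Example: the gallery page from the docker command above, plus one
// placeholder info.json URL (an assumption, not the real image server).
const seedList = buildSeedList(
  "https://derridas-margins.princeton.edu/library/abraham-oeuvres-completes-1966/gallery/front-cover/",
  ["https://iiif.example.org/front-cover/info.json"]
);
console.log(seedList);
```

The output could be saved to a file and handed to the crawler as its list of seeds. I believe the option is called `--seedFile` in recent browsertrix-crawler releases, but check `crawl --help` for the exact name in the version we run.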
@kmcelwee I think these are the relevant parts of the Ansible browsertrix role:
copy file: maybe just change the destination filename? https://github.com/Princeton-CDH/cdh-ansible/blob/main/roles/browsertrix/tasks/main.yml#L32-L37
crawl script: adjust the command-line argument https://github.com/Princeton-CDH/cdh-ansible/blob/main/roles/browsertrix/templates/crawl.sh.j2#L7
We need a custom driver that can interact with the visualization page and deep zoom.
Notes from Ilya
Use the --driver flag or customize run.sh to point the crawl to a custom JS file (inspired by the default: https://github.com/webrecorder/browsertrix-crawler/blob/main/defaultDriver.js).
Remaining Todos
Override /app/defaultDriver.js with the custom driver, or copy it to the server and point to it using the --driver flag
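For reference, a rough sketch of what such a custom driver might look like. It assumes the driver is a module exporting an async function that receives `{ data, page, crawler }` and calls `crawler.loadPage(page, data)`, which is how I remember the default driver linked above looking at the time; verify against that file before relying on it. The `.openseadragon-canvas` selector, the zoom step count, and the pause length are all assumptions about the deep zoom viewer, not values taken from the working Puppeteer gist.

```javascript
// Hypothetical custom driver: load the page, then scroll-zoom into the
// deep zoom viewer so the browser requests higher-resolution tiles.
const VIEWER_SELECTOR = ".openseadragon-canvas"; // assumed viewer element
const ZOOM_STEPS = 3; // assumed number of zoom levels to trigger
const PAUSE_MS = 500; // assumed wait for tiles to load after each step

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function customDriver({ data, page, crawler }) {
  // Same call the default driver is assumed to make.
  await crawler.loadPage(page, data);

  // Pages without a viewer are left as loaded.
  const viewer = await page.$(VIEWER_SELECTOR);
  if (!viewer) return;
  const box = await viewer.boundingBox();
  if (!box) return;

  // Move the mouse to the center of the viewer and zoom in with the wheel.
  await page.mouse.move(box.x + box.width / 2, box.y + box.height / 2);
  for (let i = 0; i < ZOOM_STEPS; i++) {
    await page.mouse.wheel({ deltaY: -300 });
    await sleep(PAUSE_MS);
  }
}

module.exports = customDriver;
```

Saved as custom-driver.js, this would slot into the volume mount from the docker command above, or be passed with the --driver flag on the server.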