DHBern / herbarium-frontend

This is a static catalogue of resources belonging to the Herbarium des Botanischen Gartens der Universität Bern.
https://dhbern.github.io/herbarium-frontend/

IIIF ingest ("Picturae") #1

Closed pdaengeli closed 4 weeks ago

pdaengeli commented 1 month ago

Goal

Ingest of tif images from resstore.unibe.ch/boga_herbarium_bernense/Picturae to iiif.ub.unibe.ch.

Target structure:

image

Sample requests (collection slug + file name):

Process flow

Progress control

Fetching to NAS and restructuring

IIIF ingest

pdaengeli commented 1 month ago

The IIIF server isn't coping well with the bags created so far, likely due to their size.

size
```
du -sh bogabag_00536-00568
525G	bogabag_00536-00568
du -sh bogabag_00497-00536
632G	bogabag_00497-00536
```

image image

Thus adjusting the strategy:

The first try was successful and took ca. 4 hours for just the IIIF ingest process (bogabag_0056).

image

pdaengeli commented 1 month ago

commands
```
# list remote files
ssh dh@130.92.252.28 find /tmp/boga_herbarium_bernense/Picturae -name \'Sheet-0027*.tif\' -printf \'%f\\n\' | sort > comparison/tif-list-0027-remote.txt
# list delivery files
find ../_processed/bogabag_0028/data/herb-specs/ -name 'Sheet-0027*.tif' -printf '%f\n' | sort > comparison/tif-list-0028-local.txt
# diff
diff comparison/tif-list-0028-remote.txt comparison/tif-list-0028-local.txt > comparison/comparison-0028.txt
# check diff
cat comparison/comparison-0028.txt
# number of remote files
cat comparison/tif-list-0028-remote.txt | wc -l
```
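
As a possible complement to the diff above, the following is a minimal sketch that prints only the file names present in the remote list but missing from the local list. The function name `missing_locally` is hypothetical and not part of the original workflow, and it swaps in `grep` where the thread uses `diff`. The list files do not need to be sorted for this to work.

```shell
# missing_locally REMOTE_LIST LOCAL_LIST  (hypothetical helper, not from the thread)
# Prints every line of REMOTE_LIST that has no exact match in LOCAL_LIST.
missing_locally() {
  # -F: fixed strings, -x: whole-line match, -v: invert, -f: read patterns from file.
  # grep exits 1 when it selects no lines; treat that as success, keep real errors.
  grep -Fxv -f "$2" "$1" || [ "$?" -eq 1 ]
}
```

Called as `missing_locally comparison/tif-list-0028-remote.txt comparison/tif-list-0028-local.txt`, empty output would mean that no remote file is missing locally. Unlike `diff`, this checks one direction only, so extra local files go unreported.
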
pdaengeli commented 4 weeks ago

Completeness check

Result

A clean diff :)

Thus skipping remaining tests and closing here.


Alternative approach: iterate over the remote list and query the HTTP status with a command such as:
```
curl -s -o /dev/null -I -w "%{http_code}" https://iiif.ub.unibe.ch/image/v3/boga/Sheet-0000001.tif/info.json
```
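
The iteration itself could look roughly like the sketch below. It assumes a list file with one file name per line (such as the remote lists produced earlier) and file names that are URL-safe, like `Sheet-0000001.tif`. The helper names `info_url` and `check_list` are hypothetical. The base URL is taken from the sample request above. Whether the server answers HEAD requests for every `info.json` was not verified here.

```shell
# Assumption: the base URL is taken from the sample request in the thread.
base_url="https://iiif.ub.unibe.ch/image/v3/boga"

# info_url NAME  (hypothetical helper): builds the info.json URL for one file name.
info_url() {
  printf '%s/%s/info.json\n' "$base_url" "$1"
}

# check_list LIST_FILE  (hypothetical helper): prints "<status><TAB><name>" for
# every name whose info.json request does not return HTTP 200.
check_list() {
  while IFS= read -r name; do
    [ -n "$name" ] || continue          # skip empty lines
    # -I sends a HEAD request; -w prints only the status code; body is discarded.
    code=$(curl -s -o /dev/null -I -w '%{http_code}' "$(info_url "$name")") || true
    [ "$code" = "200" ] || printf '%s\t%s\n' "$code" "$name"
  done < "$1"
}
```

Running `check_list comparison/tif-list-0028-remote.txt` would then list only the problematic files. No output would suggest that every listed image is reachable on the IIIF server.
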