LINNAE-project / SFB-Annotator

This web app is intended for demo purposes only!
https://www.research-software.nl/software/sfb-annotator
Apache License 2.0

Shortlist datasets for archiving at SDR #73

Closed by arnikz 3 years ago

arnikz commented 3 years ago

Note: one fieldbook (=dataset) consists of multiple pages (= TIFF files).

related to #36

lisestork commented 3 years ago

MMNAT01_B4_F2_V3/PM (TIFF files):

| Book | Size | Nr of files | Folder | Content |
|---|---|---|---|---|
| Mammals and birds | 60M | 3 | NNM001001032/ | birds and mammals cover of book |
| | 7.0G | 239 | NNM001001033/ | chapter on mammals (e.g., https://trng-repository.surfsara.nl/deposit/900c341c1c10fff7/files/MMNAT01_PM_NNM001001033_001.tif) |
| | 5.6G | 208 | NNM001001034/ | chapter on birds |
| Amphibians | 68M | 3 | NNM001001035/ | amphibians cover of book |
| | 5.4G | 202 | NNM001001036/ | chapter on amphibians |
| | 6.3G | 240 | NNM001001037/ | another chapter on amphibians |

arnikz commented 3 years ago

N.B.: According to https://github.com/LINNAE-project/SFB-Annotator/issues/36#issuecomment-716502649 the files should be renamed (i.e., without _PM_).

lisestork commented 3 years ago

> N.B.: According to #36 (comment) the files should be renamed (i.e., without _PM_).

Should we not keep the original file name, and map it to the _UniekId_digitalecollectie with a uniqueID property? Just wondering whether changing the name could lead to ambiguity, as there is a fixed folder and file name structure in the collection. @arnikz what do you think?

arnikz commented 3 years ago

Our selection: Mammals and birds / NNM001001033 (book / folder)

JPG images include _AF_ in their file names, while TIF images include _PM_. I thought URLs should be independent of the file type(s), which removing these infixes would achieve. See details below:

Text search at https://dh.brill.com/nco/ with:

According to

Example: page 1
- URN: urn:cite:visualeditions:nco_nnm001001033_001
- Permanent link: https://dh.brill.com/nco/view/nco_NNM001001033_001/makingsense
- Image URL: https://dh.brill.com/nco/f/MMNAT01_AF_NNM001001033_001.jpg
- IIIF info.json: https://iiif.arkyves.org/MMNAT01_AF_NNM001001033_001.jpg/info.json

N.B.: There is no _AF_ in URN or permalink.
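The renaming idea discussed here can be sketched in a few lines of Python. This is only an illustration: the helper name `type_independent_id` is hypothetical, and whether the resulting string is exactly the _UniekId_digitalecollectie is an assumption, not something this thread confirms.

```python
import re
from pathlib import PurePosixPath

# Infixes observed in this thread: _AF_ (JPG files) and _PM_ (TIF files).
_INFIX = re.compile(r"_(AF|PM)_")


def type_independent_id(file_name: str) -> str:
    """Drop the extension and the first _AF_/_PM_ infix from a file name.

    Hypothetical helper: the output format is an assumption.
    """
    stem = PurePosixPath(file_name).stem
    return _INFIX.sub("_", stem, count=1)


jpg_id = type_independent_id("MMNAT01_AF_NNM001001033_001.jpg")
tif_id = type_independent_id("MMNAT01_PM_NNM001001033_001.tif")
print(jpg_id)            # MMNAT01_NNM001001033_001
print(jpg_id == tif_id)  # True
```

Both file types then map to the same identifier, which is the property the URN and permalink already have.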

Google Drive: MMNAT01_B4_F2_V3 (:question:) -> NNM001001033 (sub-folder) -> MMNAT01_PM_NNM001001033_[001-239].tif
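For reference, the `[001-239]` range in the file pattern can be expanded as below. This is a minimal sketch: the function name and its parameters are hypothetical, the spelling follows the SDR file URL for this folder, and the zero-padding to three digits is inferred from `_001`.

```python
def page_file_names(folder: str = "NNM001001033", n_pages: int = 239,
                    prefix: str = "MMNAT01", infix: str = "PM",
                    ext: str = "tif") -> list[str]:
    """List the expected page file names for one fieldbook folder."""
    return [
        f"{prefix}_{infix}_{folder}_{page:03d}.{ext}"
        for page in range(1, n_pages + 1)
    ]


names = page_file_names()
print(len(names))   # 239
print(names[0])     # MMNAT01_PM_NNM001001033_001.tif
print(names[-1])    # MMNAT01_PM_NNM001001033_239.tif
```

Such a list could be compared against the actual folder content to spot missing pages before depositing.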

SDR
- Landing page (dataset): https://trng-repository.surfsara.nl/deposit/900c341c1c10fff7
- DOI: 10.21945/SURF-trng.1f9b3206-559da01b
- EPIC PID: 21.T12996/SURF-trng.1f9b3206-559da01b
- Image URL: https://trng-repository.surfsara.nl/deposit/900c341c1c10fff7/files/MMNAT01_PM_NNM001001033_001.tif

IIIF server (local)
- info.json: http://localhost:8182/iiif/2/900c341c1c10fff7:MMNAT01_PM_NNM001001033_001/info.json
- TIF->JPG: http://localhost:8182/iiif/2/900c341c1c10fff7:MMNAT01_PM_NNM001001033_001/full/max/0/default.jpg

N.B.: We could use the _UniekId_digitalecollectie (file prefix without _PM_) to request an image (or info about it) from SDR and IIIF.
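A rough sketch of how such a lookup might work, using the URL patterns listed in this comment. Everything here is an assumption for illustration: the function names are hypothetical, the ID is taken to be the file name without `_PM_` (e.g. `MMNAT01_NNM001001033_001`), and the deposit ID and the local IIIF base URL are copied from the examples in this comment.

```python
SDR_BASE = "https://trng-repository.surfsara.nl/deposit"
IIIF_BASE = "http://localhost:8182/iiif/2"  # local test server from this thread
DEPOSIT_ID = "900c341c1c10fff7"


def tif_stem(unique_id: str) -> str:
    """Re-insert the _PM_ infix after the collection prefix (assumed layout)."""
    prefix, rest = unique_id.split("_", 1)
    return f"{prefix}_PM_{rest}"


def sdr_image_url(unique_id: str, deposit_id: str = DEPOSIT_ID) -> str:
    """URL of the TIF file in the SDR deposit."""
    return f"{SDR_BASE}/{deposit_id}/files/{tif_stem(unique_id)}.tif"


def iiif_info_url(unique_id: str, deposit_id: str = DEPOSIT_ID) -> str:
    """URL of the IIIF info.json for the image."""
    return f"{IIIF_BASE}/{deposit_id}:{tif_stem(unique_id)}/info.json"


def iiif_jpg_url(unique_id: str, deposit_id: str = DEPOSIT_ID) -> str:
    """URL of the full-size JPG rendering (region=full, size=max, rotation=0)."""
    return f"{IIIF_BASE}/{deposit_id}:{tif_stem(unique_id)}/full/max/0/default.jpg"


print(sdr_image_url("MMNAT01_NNM001001033_001"))
print(iiif_info_url("MMNAT01_NNM001001033_001"))
print(iiif_jpg_url("MMNAT01_NNM001001033_001"))
```

With this approach the original file names stay untouched in the deposit, and the mapping from ID to file name lives in one place.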

lisestork commented 3 years ago

| SubfolderName | Content | Files | Size |
|---|---|---|---|
| MMNAT01_B1_F2_V3 | Publications | 4247 | 134G |
| MMNAT01_B2_F2_V3 | Sketches and Drawings I | 2910 | 48G |
| MMNAT01_B3_F2_V3 | Sketches and Drawings II | 191 | 7.6G |
| MMNAT01_B4_F2_V3 | Field Books and Correspondence I | 19767 | 253G |
| MMNAT01_B5_F2_V3 | Field Books and Correspondence II | 9229 | 92G |

JPG + TIF

Focus on:
- Field Books & Corr -> +/14600 files
- Sketches & Drawings -> +/