YaleDHLab / voynich

Analyzing the Voynich Manuscript with computer vision
https://github.com/YaleDHLab/voynich/projects/1

Collect British Library collection herbal images #2

Open chirila opened 5 years ago

chirila commented 5 years ago

http://www.bl.uk/manuscripts/BriefDisplay.aspx (e.g. search for illustrated herbal). Can this be scraped?

duhaime commented 5 years ago

@chirila for some reason when I try to request that URL I get a "your session has expired" page (it seems they store user session data on the server, not in the URL). Would you be able to send a little info on which images in the BL look good? If there's a search to run or a collection name, that should be perfect.

There's a repository with open BL data (https://github.com/BL-Labs/imagedirectory). Fingers crossed these images are in one of those collections!

chirila commented 5 years ago

Oh, that's annoying. A sample MS is at http://www.bl.uk/manuscripts/FullDisplay.aspx?ref=Egerton_MS_821&index=5 and I can pull out the refs (e.g. Egerton_MS_821) for likely books. I just went to the main search page and pulled up all illustrated herbals from 850-1475 (200 results, but not all relevant).

duhaime commented 5 years ago

Interesting. It looks like they serve those page images through a Deep Zoom tile viewer (OpenSeadragon), which partitions each image into a grid of tiles. Here's a sample tile: http://www.bl.uk/manuscripts/Proxy.ashx?view=egerton_ms_821_fs001r_files/11/0_4.jpg One can certainly stitch those tiles back into composite images, but that would involve quite a few requests. I'll see if I can find these images in the BL open data repository first...
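For reference, a minimal sketch of how the tile URLs for one page could be enumerated. It assumes the sample tile follows the usual Deep Zoom naming, where `11/0_4.jpg` means zoom level 11, column 0, row 4; that reading is an assumption about the BL server, not something confirmed in this thread. The names `tile_url` and `tile_urls` are hypothetical helpers, and the grid size (`cols`, `rows`) has to be supplied by the caller.

```python
# Base of the tile endpoint, taken from the sample tile URL in this thread.
BASE = "http://www.bl.uk/manuscripts/Proxy.ashx"


def tile_url(page_id, level, col, row, base=BASE):
    """Build one tile URL, assuming Deep Zoom naming: {level}/{col}_{row}.jpg."""
    return f"{base}?view={page_id}_files/{level}/{col}_{row}.jpg"


def tile_urls(page_id, level, cols, rows):
    """Map (col, row) to the tile URL for every cell of a cols x rows grid."""
    return {
        (col, row): tile_url(page_id, level, col, row)
        for row in range(rows)
        for col in range(cols)
    }


if __name__ == "__main__":
    # Reproduces the sample tile URL quoted above.
    print(tile_url("egerton_ms_821_fs001r", 11, 0, 4))
```

Keeping the (col, row) key next to each URL means a later stitching step knows where each downloaded tile belongs in the composite.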

duhaime commented 5 years ago

@chirila It seems the tiles at the tenth zoom level use just six images per page, which could be reasonable to fetch. If you can post links to the volumes of interest (or send a query that returns the volumes of interest) I'm happy to fetch each!
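To estimate how many requests a given zoom level costs, here is a rough sketch of the standard Deep Zoom pyramid arithmetic: each level below the maximum halves the image dimensions, and the tile count is the level's size divided by the tile size, rounded up. The page dimensions (3000 x 4500 px) and the tile size (256 px) below are made-up illustrative values, not measurements of the BL images, and `level_size` and `tile_grid` are hypothetical helper names.

```python
import math


def level_size(width, height, level):
    """Pixel size of the image at a Deep Zoom level (the max level is full size)."""
    max_level = math.ceil(math.log2(max(width, height)))
    scale = 2 ** (max_level - level)
    return math.ceil(width / scale), math.ceil(height / scale)


def tile_grid(width, height, level, tile_size=256):
    """Number of tile columns and rows needed to cover the image at a level."""
    w, h = level_size(width, height, level)
    return math.ceil(w / tile_size), math.ceil(h / tile_size)


if __name__ == "__main__":
    # Hypothetical full-resolution page size; the real dimensions are unknown here.
    for level in (10, 11):
        cols, rows = tile_grid(3000, 4500, level)
        print(level, cols, rows, cols * rows)
```

With these assumed numbers, level 10 comes out as a 2 x 3 grid (six tiles) and level 11 as 3 x 5 (fifteen tiles), which shows how quickly the request count grows with each extra level.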

chirila commented 5 years ago

First priority

egerton ms 747
egerton ms 2020 # high priority
add ms 8928 # apuleius, italian, outline drawings in margins
add ms 41623 # codex bellunensis, dioscorides
add ms 22332
add ms 22333
harley ms 1585
harley ms 3736
harley ms 4986
harley ms 5294
sloane ms 4016
sloane ms 1975 # miniatures
cotton ms vitellius c iii

Second priority

add ms 5025 # illustrated but not many plants
sloane ms 475 # medical texts, recipes, charms (not illustrated?)
harley ms 3469 # alchemical miniatures
harley ms 4751 # bestiary
burney ms 275 # for mathematical and other illustrations, not plants

Third priority/other

sloane ms 345 # not illustrated
sloane ms 2839
add ms 10302 # ordinal of alchemy
add ms 8785 # right subject, not much illustration
add ms 60577 # unillustrated?
arundel ms 225 # unillustrated? but diet information
harley ms 80 # right subjects but not illustrated
harley ms 1602 # right subject but not illustrated
harley ms 1735 # commonplace book; illustrations not obvious?
harley ms 2407 # not illustrated?
harley ms 2558
harley ms 4486 # right subject, not illustrated
harley ms 2320
harley ms 6528 b
harley ms 3353
harley ms 585
royal ms 12 d xvii
royal ms 12 E xxiii
egerton ms 821 # highly important as an example of possible text, right subjects, etc, but not illustrated
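In case it helps with fetching, a small sketch of how shelfmark entries like the ones above could be split into a shelfmark and a note, and turned into a `ref` value in the style of `Egerton_MS_821`. The underscore-joined, capitalized form is only inferred from that single sample ref, so `to_ref` is a guess that needs checking for shelfmarks with letters or Roman numerals (e.g. cotton ms vitellius c iii). Both function names are hypothetical.

```python
def parse_entry(line):
    """Split 'shelfmark # note' into (normalized shelfmark, note)."""
    shelfmark, _, note = line.partition("#")
    return " ".join(shelfmark.split()).lower(), note.strip()


def to_ref(shelfmark):
    """Guess a ref such as 'Egerton_MS_821' (assumed pattern, one sample only)."""
    words = shelfmark.split()
    return "_".join(w.upper() if w == "ms" else w.capitalize() for w in words)


if __name__ == "__main__":
    entries = [
        "egerton ms 747",
        "add ms 8928 #apuleius, italian, outline drawings in margins",
        "egerton ms 821 # highly important as an example of possible text",
    ]
    for entry in entries:
        shelfmark, note = parse_entry(entry)
        print(to_ref(shelfmark), "|", note)
```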

chirila commented 5 years ago

http://www.bl.uk/catalogues/illuminatedmanuscripts/record.asp?MSID=8792&CollID=9&NStart=1975 seems to be an alternative view of at least some of the mss, and has thumbnails

chirila commented 5 years ago

http://www.bl.uk/catalogues/illuminatedmanuscripts/record.asp?MSID=8320&CollID=28&NStart=2020

http://www.bl.uk/catalogues/illuminatedmanuscripts/record.asp?MSID=7673&CollID=27&NStart=60577

Coll ID 28 is Egerton
Coll ID 27 is Additional
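A quick sketch of reading those catalogue URLs programmatically, using only the two collection IDs named above. In the linked examples `NStart` looks like the shelfmark number (2020, 60577), but that is an observation from these few URLs rather than documented behavior. `parse_record_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse

# Only the two mappings stated in this thread; other CollIDs are left unnamed.
COLLECTION_NAMES = {28: "Egerton", 27: "Additional"}


def parse_record_url(url):
    """Extract MSID, collection, and NStart from a record.asp URL."""
    query = parse_qs(urlparse(url).query)
    coll_id = int(query["CollID"][0])
    return {
        "msid": int(query["MSID"][0]),
        "collection": COLLECTION_NAMES.get(coll_id, f"CollID {coll_id}"),
        "nstart": int(query["NStart"][0]),
    }


if __name__ == "__main__":
    url = (
        "http://www.bl.uk/catalogues/illuminatedmanuscripts/"
        "record.asp?MSID=8320&CollID=28&NStart=2020"
    )
    print(parse_record_url(url))
```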

chirila commented 5 years ago

Copyright/Reproduction: https://www.bl.uk/catalogues/illuminatedmanuscripts/repro.asp We can store images for personal use but can't re-share them. So we can do the research and show examples in academic lectures and seminars, but we can't publish illustrations of the results without negotiating an agreement with the British Library.

chirila commented 5 years ago

https://archive.org/details/BLCottonVitelliusCIII Some British Library collections are on archive.org, which may make extraction easier.