UUDigitalHumanitieslab / EDPOP

Creating a virtual research environment (VRE) from CERL resources
BSD 3-Clause "New" or "Revised" License

Add text datasets to VRE #114

Closed: tijmenbaarda closed this issue 5 months ago

tijmenbaarda commented 2 years ago

The EDPOP team wants to include a number of full-text datasets in the VRE (see separate tickets for other types of datasets). These include:


  1. Dutch Pamphlets Online: https://primarysources.brillonline.com/browse/dutch-pamphlets-online
    • Publisher: Brill
    • Free access: no
    • UU access: no
    • API available: not likely
  2. DBNL: https://www.dbnl.org/
  3. Delpher: https://www.delpher.nl/
  4. Nederlab: https://www.nederlab.nl/
  5. Early English Books Online: https://www.proquest.com/eebo/
    • Publisher: ProQuest
    • Free access: no
    • UU access: yes
    • API available: unclear; the UU University Library says text mining is possible, but this is probably problematic because access is not free.
  6. Eighteenth Century Collections Online: https://www.gale.com/intl/primary-sources/eighteenth-century-collections-online (access via library)
    • Publisher: Gale Cengage
    • Free access: no
    • UU access: yes
    • API available: unclear, but this dataset is already available in I-Analyzer.
  7. Dutch Songs Online / Nederlandse Liederenbank: http://www.liederenbank.nl/
    • Publisher: Meertens Instituut
    • Free access: yes
    • API available: no information; perhaps data can be downloaded on request.


  1. Bibliothèque Bleue de Troyes: https://artfl-project-uchicago-edu.proxy.library.uu.nl/node/170
    • Publisher: ARTFL Project; digitization by Médiathèque Grand Troyes
    • Free access: yes
    • API available: not mentioned; perhaps downloadable data is offered
  2. MuCEM (Bibliothèque bleue): https://www.mucem.org/programme/exposition-et-temps-forts/bibliotheques-bleues (?)
    • Free access: ?
    • API available: ?
  3. Mazarinades.org: http://mazarinades.org/recherche/
    • Free access: yes
    • API available: ?
  4. Musée de l'Image d'Epinal: https://webmuseo.com/ws/musee-de-l-image/app/report/index.html


NB: shouldn't the ProQuest and Gale collections (Early English Books Online and Eighteenth Century Collections Online, listed above) go here?

  1. UCSB English Broadside Ballad Archive: https://ebba.english.ucsb.edu/

    • Publisher: University of California Santa Barbara
    • Free access: yes
    • API available: not mentioned, but the project page mentions the use of TEI and SQL as well as various visualizations, so a data download is probably available, at least on request: https://ebba.english.ucsb.edu/page/tei-xml
  2. Broadside Ballads Online: http://ballads.bodleian.ox.ac.uk/

  3. McGill Library's Chapbook Collection: https://digital.library.mcgill.ca/chapbooks/index.php

    • Publisher: McGill Library
    • Free access: yes
    • API available: no, but TEI files are available for download
  4. Hockliffe Collection: http://hockliffe.dmu.ac.uk/HPall_catalog.html
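The McGill chapbooks are offered as TEI downloads, and the EBBA project page also points to TEI-XML. A minimal sketch of pulling the title and running text out of such a file with Python's standard library follows. It assumes the files follow TEI P5 (namespace `http://www.tei-c.org/ns/1.0`) and put their text in prose `<p>` and verse `<l>` elements under `<body>`; neither collection's actual encoding has been checked, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; an assumption, not verified against the McGill/EBBA files.
TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}
# Elements treated as text blocks: prose paragraphs and verse lines.
BLOCK_TAGS = {f"{{{TEI}}}p", f"{{{TEI}}}l"}


def normalize(text: str) -> str:
    """Collapse runs of whitespace (line breaks, indentation) to single spaces."""
    return " ".join(text.split())


def tei_title(root: ET.Element) -> str:
    """Return the first <title> in the TEI header, or '' if there is none."""
    title = root.find(".//tei:teiHeader//tei:title", NS)
    return normalize("".join(title.itertext())) if title is not None else ""


def tei_blocks(root: ET.Element) -> list[str]:
    """Return the text of each <p> and <l> under <body>, in document order."""
    body = root.find(".//tei:text/tei:body", NS)
    if body is None:
        return []
    blocks = []
    for el in body.iter():
        if el.tag in BLOCK_TAGS:
            # itertext() also yields text inside inline children such as <hi>.
            text = normalize("".join(el.itertext()))
            if text:
                blocks.append(text)
    return blocks
```

For a downloaded file (the file name is hypothetical), `root = ET.parse("chapbook.xml").getroot()` followed by `tei_title(root)` and `tei_blocks(root)` gives a title and a list of text blocks that could be indexed in the VRE.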


  1. La Fondazione Barbanera: http://www.bibliotecabarbanera.it/bw5ne5/opac.aspx?web=FNBN&SRC=TIT

    • Publisher: La Fondazione Barbanera
    • Free access: yes
    • API available: does not seem so
  2. Giulio Cesare Croce: http://badigit.comune.bologna.it/GCCroce/index.html

    • Publisher: Biblioteca comunale dell'Archiginnasio, hosted by Biblioteca Universitaria di Bologna
    • Free access: yes
    • API available: the site uses IIIF, so it should be possible to access both metadata and images through an API. However, I could not find out from the website how to access it: https://bub.unibo.it/it/bub-digitale

NB This is an image databank, not a textual databank.

  1. Raccolta delle Stampe "Achille Bertarelli": https://bertarelli.milanocastello.it/
    • Publisher: Comune di Milano
    • Free access: yes
    • API available: no mention

NB This is an image databank, not a textual databank.


  1. E-rara: https://www.e-rara.ch/wiki/aboutERara
    • Publisher: hosted by ETH-Bibliothek, Zürich
    • Free access: yes
    • API available: yes, using OAI-PMH (XML over HTTP)
    • The website mentions that the data is also included in Gallica, but it is not clear whether this concerns all data or only the French-language part
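Since E-rara exposes its data over OAI-PMH, harvesting amounts to repeated `ListRecords` requests that follow `resumptionToken`s until the list is exhausted. A minimal sketch with Python's standard library, assuming the `oai_dc` (Dublin Core) metadata format; E-rara's endpoint URL is not given in this issue and has to be taken from its documentation, and the function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, Optional

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     token: Optional[str] = None) -> str:
    """Build a ListRecords request URL.

    OAI-PMH 2.0 requires a follow-up request to carry only the verb and the
    resumptionToken, so metadataPrefix is omitted when a token is given.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def parse_page(xml_bytes: bytes) -> tuple[list[dict], Optional[str]]:
    """Return (records, resumption token or None) for one ListRecords response."""
    root = ET.fromstring(xml_bytes)
    error = root.find("oai:error", NS)
    if error is not None:
        if error.get("code") == "noRecordsMatch":  # an empty result, not a failure
            return [], None
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="",
                                       namespaces=NS).strip(),
            # Deleted records carry no <metadata>, so this list may be empty.
            "titles": [(t.text or "").strip() for t in rec.iterfind(".//dc:title", NS)],
        })
    # An empty <resumptionToken/> (or none at all) marks the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, token or None


def harvest(base_url: str, metadata_prefix: str = "oai_dc") -> Iterator[dict]:
    """Yield all records, following resumption tokens until the list is complete."""
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        with urllib.request.urlopen(url, timeout=60) as resp:
            records, token = parse_page(resp.read())
        yield from records
        if token is None:
            break
```

Usage would be `for rec in harvest(BASE_URL): ...`, where `BASE_URL` stands for the documented E-rara endpoint. Keeping `parse_page` separate from the network loop makes it easy to test against saved responses.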


  1. Digitale Bibliothek / Bayerische Staatsbibliothek: https://www.digitale-sammlungen.de/en/
    • Publisher: Münchener DigitalisierungsZentrum
    • Free access: yes
    • API available: yes, with IIIF

NB This is an image databank, not a textual databank.
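Because this is an IIIF image collection, what the VRE could ingest per object is a IIIF Presentation manifest: a JSON document listing the pages (canvases) and the image service behind each page. A minimal sketch that reads such a manifest and returns one (page label, image service URL) pair per canvas, handling both Presentation API 2.x (`sequences`/`canvases`/`images`) and 3.0 (nested `items`) structures. The manifest URL for a given object is not given in this issue and would have to be looked up; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any


def label_text(label: Any) -> str:
    """Flatten a IIIF label: a string, {"@value": ...} or [{"@value": ...}] in
    Presentation 2, or a language map such as {"en": ["f. 1r"]} in Presentation 3."""
    if isinstance(label, str):
        return label
    if isinstance(label, dict):
        if "@value" in label:
            return str(label["@value"])
        for values in label.values():
            if values:
                return str(values[0])
        return ""
    if isinstance(label, list) and label:
        first = label[0]
        return str(first.get("@value", "")) if isinstance(first, dict) else str(first)
    return ""


def _first(obj: Any) -> dict:
    """IIIF allows a single object or a list of them; return the first object."""
    if isinstance(obj, list):
        return obj[0] if obj else {}
    return obj if isinstance(obj, dict) else {}


def image_services(manifest: dict) -> list[tuple[str, str]]:
    """Return (canvas label, Image API service URL) pairs in canvas order."""
    pairs = []
    if "sequences" in manifest:  # Presentation API 2.x
        for canvas in _first(manifest["sequences"]).get("canvases", []):
            for image in canvas.get("images", []):
                service = _first(image.get("resource", {}).get("service"))
                if service.get("@id"):
                    pairs.append((label_text(canvas.get("label")), service["@id"]))
    else:  # Presentation API 3.0: canvas -> AnnotationPage -> Annotation -> body
        for canvas in manifest.get("items", []):
            for page in canvas.get("items", []):
                for annotation in page.get("items", []):
                    service = _first(_first(annotation.get("body")).get("service"))
                    base = service.get("id") or service.get("@id")
                    if base:
                        pairs.append((label_text(canvas.get("label")), base))
    return pairs


def fetch_manifest(url: str) -> dict:
    """Download and decode a manifest (no retries or caching in this sketch)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)
```

Canvases without an image service are skipped rather than reported, which seems acceptable for an image-only collection but may need revisiting.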


  1. Spanish Chapbooks: https://cudl.lib.cam.ac.uk/collections/spanishchapbooks/1
    • Publisher: University of Cambridge
    • Free access: yes
    • API available: yes, via IIIF

NB This is an image databank, not a textual databank.
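Since this collection offers images rather than text, what IIIF gives access to here are page images. Given an Image API service URL (for instance one read from the object's manifest), an image of any size is requested by appending the `{region}/{size}/{rotation}/{quality}.{format}` path defined by the IIIF Image API. A minimal sketch; the function names and the example service URL are hypothetical.

```python
def image_request_url(service: str, region: str = "full", size: str = "max",
                      rotation: int = 0, quality: str = "default",
                      fmt: str = "jpg") -> str:
    """Build a IIIF Image API request URL from an image service base URL.

    size="max" is valid in Image API 2.1 and 3.0; servers that only
    implement 2.0 expect "full" instead. "!w,h" asks for a best fit.
    """
    return f"{service.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"


def info_url(service: str) -> str:
    """URL of the info.json document describing sizes, tiles and compliance level."""
    return f"{service.rstrip('/')}/info.json"
```

For thumbnails in the VRE, `image_request_url(service, size="!400,400")` would return an image that fits within 400 by 400 pixels; checking `info_url(service)` first shows which sizes and features the server supports.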

  1. Literatura de cordel y teatro en España (1675-1825): http://pliegos.culturaspopulares.org/busquedaa.php

    • Publisher: private page of Santiago Cortés Hernández
    • Free access: yes
    • API available: no
  2. Fundación Joaquín Díaz: https://funjdiaz.net/biblio0.php

NB This is an image databank, not a textual databank.


  1. Digitized broadside ballads at the City Library of Hämeenlinna: https://lydia.hameenlinna.fi/exhibits/show/kaunokirjallisuus/arkkiveisuja
    • Publisher: City Library of Hämeenlinna
    • Free access: yes
    • API available: no, but this seems to be a very small archive.


  1. European chapbooks: https://search.clevnet.org/client/en_US/cpl-main/search/results?qu=Chapbook&te=
    • Publisher: Cleveland Public Library
    • Free access: yes
    • API available: unclear

I cannot find any textual or image data on this page, only physical items with their metadata. However, searching for 'chapbook' in the library's digital gallery returns a large number of results (https://cplorg.contentdm.oclc.org/digital/search/searchterm/chapbook). IIIF is used there, but it is not clear how to access it; there must be a way, though.

  1. Gazettes européennes du 18e siècle: https://www.gazettes18e.fr/
    • Publisher: Institut d'Histoire des Représentations et des Idées dans les Modernités
    • Free access: yes
    • API available: no information
tijmenbaarda commented 1 year ago

European Chapbooks: we have received a tab-delimited file with all metadata, as well as confirmation that we can use an OAI-PMH API and that the collection is available through IIIF.
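Before importing the tab-delimited metadata file, it helps to load it into records keyed by an identifier and to catch duplicate or missing keys early. A minimal sketch with Python's `csv` module; the file's actual column names are not described in this issue, so the key column is a parameter, and the file name, encoding and function name in the usage note are hypothetical.

```python
import csv
from typing import TextIO


def load_tsv_metadata(fh: TextIO, key_column: str) -> dict[str, dict[str, str]]:
    """Index rows of a tab-delimited export by key_column.

    Rows with an empty key are skipped; a duplicate key raises ValueError so
    the problem surfaces before anything is imported.
    """
    # QUOTE_NONE assumes the export does not quote fields; drop it if it does.
    reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames is None or key_column not in reader.fieldnames:
        raise ValueError(f"column {key_column!r} not found in header")
    records: dict[str, dict[str, str]] = {}
    for row in reader:
        key = (row.get(key_column) or "").strip()
        if not key:
            continue
        if key in records:
            raise ValueError(f"duplicate key {key!r}")
        # Short rows yield None values; surplus fields land under the None key
        # and are dropped here.
        records[key] = {k: (v or "").strip() for k, v in row.items() if k is not None}
    return records
```

Usage, assuming a UTF-8 file with an `identifier` column: `with open("european_chapbooks.tsv", newline="", encoding="utf-8") as fh: records = load_tsv_metadata(fh, "identifier")`. The identifiers could then be matched against records harvested through the OAI-PMH API.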