TomDeneire / pictor

Discovering IIIF manifests
MIT License

add manifests from SUB Göttingen #2

Closed by alexander-winkler 1 year ago

alexander-winkler commented 1 year ago
import requests
from lxml import etree

# XML namespaces used in the SRU response
SRW = "{http://www.loc.gov/zing/srw/}"
PICA = "{info:srw/schema/5/picaXML-v1.0}"

def opac(start: int):
    """Fetch up to 100 records via SRU, starting at record `start`."""
    url = "http://sru.gbv.de/opac-de-7"
    params = {
        'version': '1.1',
        'operation': 'searchRetrieve',
        'query': 'pica.url = resolver.sub.unigoettingen.depurlppn*',
        'maximumRecords': '100',
        'recordSchema': 'picaxml',
        'startRecord': start,
    }
    # Time out instead of hanging if the SRU server does not answer
    res = requests.get(url, params=params, timeout=60)
    res.raise_for_status()
    return etree.fromstring(res.content)

# Total number of hits, read from the first response
hits = int(opac(1).find(f'.//{SRW}numberOfRecords').text)

manifests = []
# SRU record positions are 1-based; page through the results 100 at a time
for i in range(1, hits + 1, 100):
    tree = opac(i)
    # PICA field 003@, subfield 0 holds the PPN
    for ppn in tree.findall(f'.//{PICA}datafield[@tag="003@"]/{PICA}subfield[@code="0"]'):
        manifests.append(f"https://manifests.sub.uni-goettingen.de/iiif/presentation/PPN{ppn.text}/manifest")

with open("subgoettingen.txt", "w") as out:
    for m in manifests:
        print(m, file=out)

The code assumes that there is a manifest for every digital copy in the SUB online collection. I'm not sure if this is the case.
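One way to test that assumption would be to probe each generated URL before keeping it. The following is only a minimal sketch: the helper names `head_status` and `existing_manifests` are hypothetical, and it assumes the manifest server answers HEAD requests with status 200 for manifests that exist, which I have not verified.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if the request fails."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as res:
            return res.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar
        return None


def existing_manifests(urls, status_of=head_status):
    """Keep only the URLs whose status is 200 (assumed to mean 'manifest exists')."""
    return [url for url in urls if status_of(url) == 200]
```

Calling `existing_manifests(manifests)` before writing the output file would drop URLs that do not resolve, at the cost of one extra request per record. The `status_of` parameter is there so the filter can be tried out without touching the network.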

TomDeneire commented 1 year ago

I've tried this with some added print statements to see what is happening, but no luck so far: I don't get any response. I'll look into it further, but I'm going to call it a night for now ;-)

alexander-winkler commented 1 year ago

This script runs without difficulties on my machine, try: python3.9 goettingen.txt

goettingen.txt

TomDeneire commented 1 year ago

Okay, thanks. Maybe the SRU server was unavailable when I tried yesterday. Works fine now.
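If the SRU server is occasionally unavailable, a small retry wrapper could make the harvesting script more robust. This is a sketch only: `fetch_with_retry` is a hypothetical helper, not part of the script above, and the attempt count and delay are arbitrary. It relies on the fact that `requests` exceptions derive from `OSError`.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds, retrying on OSError with a growing pause."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as err:  # includes requests.RequestException
            last_error = err
            if attempt < attempts:
                # Wait a little longer after each failed attempt
                sleep(delay * attempt)
    raise last_error
```

It could be used as `tree = fetch_with_retry(lambda: opac(i))`. A request that hangs never raises, so this only helps if the underlying `requests.get` call is also given a `timeout`.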