davanstrien / IIIF-ML-experiments


Getting Manifest from Europeana #1

Open glenrobson opened 3 years ago

glenrobson commented 3 years ago

First experiment is not going well... Europeana have a SPARQL endpoint, and I should be able to run the following SPARQL query to retrieve all IIIF images and manifests whose dc:type contains "pho" (case-insensitive):

PREFIX dcterms: <http://purl.org/dc/terms/> 
PREFIX svcs: <http://rdfs.org/sioc/services#>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

select ?aggregation ?manifest ?iiifImage ?type where {
     ?iiifImage dcterms:conformsTo <http://iiif.io/api/image> .
     ?webResource svcs:has_service ?iiifImage .
     optional {?webResource dcterms:isReferencedBy ?manifest . }
     ?aggregation edm:isShownBy ?webResource .
     ?aggregation edm:aggregatedCHO ?cho .
     ?metadata ore:proxyFor ?cho .
     ?metadata dc:type ?type .

     FILTER regex(?type, "pho.*", "i") 
} LIMIT 100

Unfortunately this only returns two results, both of which contain "Photographie" in the dc:type, but the search interface shows more.

It could be that the RDF database hasn't been updated since 2017...
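For anyone wanting to repeat the experiment from a script, here is a minimal sketch of sending a query like the one above over the standard SPARQL protocol and flattening the JSON results. The endpoint address `http://sparql.europeana.eu/` and the helper names are assumptions for illustration, not something confirmed in this thread; only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Europeana SPARQL endpoint lives at this address.
SPARQL_ENDPOINT = "http://sparql.europeana.eu/"


def build_sparql_request(query, endpoint=SPARQL_ENDPOINT):
    """Build a SPARQL-protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts.

    Variables left unbound by an OPTIONAL clause (here ?manifest) are
    absent from a binding, so they are absent from the row as well.
    """
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query):
    """Send the query and return the flattened rows (needs network access)."""
    request = build_sparql_request(query)
    with urllib.request.urlopen(request, timeout=60) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(...)` with the query text would return one dict per result row, which makes it easy to count how many rows actually come back and compare that with the search interface.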

glenrobson commented 3 years ago

It looks like we can get access to the records using the search API:

https://api.europeana.eu/api/v2/search.json?profile=standard&query=provider_aggregation_edm_isShownBy:*iiif*&theme=photography,&rows=12&start=1&wskey=API_KEY

(Note that you have to register for an API key to get access to these two interfaces.) It's a Solr-like JSON response, and you can use the id to get to the EDM record:

"id": "/232/https___digitalcollections_jtsa_edu_islandora_object_jts_3A18709_datastream_TN_view_Portrait_20of_20Ishmael_20Aga__jpg",

Which maps to:

https://www.europeana.eu/api/v2/record/232/https___digitalcollections_jtsa_edu_islandora_object_jts_3A18709_datastream_TN_view_Portrait_20of_20Ishmael_20Aga__jpg.json?wskey=API_KEY

and in there you can get the IIIF image URL at /object/aggregations/webResources/svcsHasService and the manifest at /object/aggregations/webResources/dctermsIsReferencedBy. Note that we will have to check that the manifest really is a manifest, as in the example above it is a IIIF image URL... Note also that there are multiple webResources.
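The steps above could be sketched roughly as follows: map a search-result id to its record URL, walk every webResource in the record, and test whether a fetched JSON document is really a manifest. The helper names are hypothetical, and it is an assumption that `svcsHasService` and `dctermsIsReferencedBy` may appear as either a single string or a list, so both are handled. The manifest check relies on the IIIF Presentation API type declarations (`"@type": "sc:Manifest"` in version 2, `"type": "Manifest"` in version 3).

```python
RECORD_API = "https://www.europeana.eu/api/v2/record"


def record_url(record_id, api_key):
    """Turn a search-result id such as "/232/abc" into its record URL."""
    return f"{RECORD_API}{record_id}.json?wskey={api_key}"


def as_list(value):
    """Normalise a missing value, a single value, or a list to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def iiif_links(record):
    """Collect (services, references) pairs from every webResource.

    Assumption: the record JSON follows the paths quoted above, with
    aggregations and webResources being lists.
    """
    pairs = []
    for aggregation in as_list(record.get("object", {}).get("aggregations")):
        for resource in as_list(aggregation.get("webResources")):
            services = as_list(resource.get("svcsHasService"))
            references = as_list(resource.get("dctermsIsReferencedBy"))
            if services or references:
                pairs.append((services, references))
    return pairs


def looks_like_manifest(doc):
    """True if a fetched JSON document declares itself a IIIF manifest.

    An Image API info.json carries neither of these type values.
    """
    return doc.get("@type") == "sc:Manifest" or doc.get("type") == "Manifest"
```

Each URL found under `dctermsIsReferencedBy` would still need to be fetched and passed through `looks_like_manifest` before being treated as a manifest, since some of them turn out to be image service URLs.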

glenrobson commented 3 years ago

Stats from the photograph collection:

Total of 209,236 records

Found 7 - country

Found 47 - dataProvider

Found 8 - provider

Found 14 - rights

glenrobson commented 3 years ago

So after the presentation at the IIIF conference Antoine pointed out that we could have got the data from here: https://pro.europeana.eu/page/harvesting-and-downloads#downloads which would have been a lot quicker!

MikeTrizna commented 3 years ago

🤯

aisaac commented 3 years ago

For the record, my colleague @Hobbesball has pointed out that in January the dumps may not have been available as they are now. So it was rather an issue of unlucky timing, no regrets to have!