Open saracarl opened 1 month ago
We have our direction for the short-term patch
Ideas for the long-term patch:
Regardless of how much we import:
We also need the import-all feature as part of this. I wonder if we should gather all the pages and recurse on each page to separate the items into 100-item sets.
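To make the 100-item-set idea concrete, here is a minimal sketch, assuming the manifests have already been gathered into one flat array. The helper name `split_into_sets` and the sample data are hypothetical and not part of the existing code.

```ruby
# Hypothetical helper: split a flat list of manifest entries into import sets.
# The default set size of 100 comes from the suggestion above.
def split_into_sets(manifests, set_size = 100)
  manifests.each_slice(set_size).to_a
end

# Stand-in data: 250 fake entries shaped loosely like IIIF collection members.
sample = (1..250).map do |i|
  { '@id' => "https://example.org/manifest/#{i}", 'label' => "Item #{i}" }
end

sets = split_into_sets(sample)
puts sets.map(&:size).inspect # prints [100, 100, 50]
```

The last set is simply whatever is left over, so nothing is dropped when the total is not a multiple of 100.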
So what are the implications of 8500 rake tasks at once?
Well, performance hits of course. That is why we need active_job so that we can queue jobs instead.
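As a rough illustration of queuing the work instead of launching everything at once, here is a sketch that uses a plain Ruby `Queue` and a small pool of worker threads. This is a stand-in for a real job backend, not active_job itself; `IMPORT_WORKERS`, `import_item`, and `run_import_queue` are hypothetical names.

```ruby
# Stand-in for a job backend: a fixed pool of threads drains a shared queue,
# so only a few imports run at any one time.
IMPORT_WORKERS = 4 # assumed pool size, purely illustrative

def import_item(manifest_uri)
  # Placeholder for the real per-manifest import work.
  "imported #{manifest_uri}"
end

def run_import_queue(manifest_uris, workers: IMPORT_WORKERS)
  queue = Queue.new
  manifest_uris.each { |uri| queue << uri }
  queue.close # once drained, pop returns nil and the workers exit

  results = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      while (uri = queue.pop)
        results << import_item(uri)
      end
    end
  end
  threads.each(&:join)

  # Collect results; order depends on thread scheduling.
  Array.new(results.size) { results.pop }
end

uris = (1..10).map { |i| "https://example.org/manifest/#{i}" }
puts run_import_queue(uris).size # prints 10
```

With a real queue backend the same shape would apply: one enqueued job per item or per set, with the worker count bounding the load.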
I think that, as an initial feature, we should always recurse to get all the pages, though.
This script will pull everything from a paginated manifest into a single manifest.
Note that the 85 fetches to OCLC take a few minutes, so if we productize this, it should be backgrounded and not run in a browser request.
require 'open-uri'
require 'json'

# Fetch the top-level (paginated) collection manifest.
uri = 'https://cdm15138.contentdm.oclc.org/iiif/2/p15138coll54/manifest.json'
raw_json = URI.open(uri).read
hash = JSON.parse(raw_json)

# Walk the pages via 'first' and 'next', accumulating every manifest entry.
manifests = []
page_uri = hash['first']
while page_uri
  p page_uri
  raw_json = URI.open(page_uri).read
  page = JSON.parse(raw_json)
  manifests += page['manifests']
  page_uri = page['next']
end

# Swap the pagination pointer for the full list of manifests.
# (Pages are parsed into their own variable, so the collection does not need to be fetched again.)
hash['label'] = "FromThePage consolidated Tennessee Death Records"
hash['manifests'] = manifests
hash.delete('first')

File.write("/tmp/big_oclc_manifest.json", hash.to_json)
When TN imported the following collection, they only got around 450 items: https://cdm15138.contentdm.oclc.org/iiif/p15138coll54/manifest.json
We would have expected the first page of items -- 1000 of them.
We need to reproduce and see why they didn't get the full first page.
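One possible starting point for the reproduction is to count how many manifests the first page actually lists and compare that with what was imported. The sketch below assumes the same `first` and `manifests` keys used by the script above; the helper name `first_page_manifest_count` and the injected `fetch` parameter are hypothetical, added so the check can run against canned JSON as well as the live server.

```ruby
require 'json'
require 'open-uri'

# Hypothetical reproduction helper: returns the number of manifests listed on
# the first page of a collection. `fetch` defaults to a live HTTP read.
def first_page_manifest_count(collection_uri, fetch: ->(uri) { URI.open(uri).read })
  collection = JSON.parse(fetch.call(collection_uri))
  first_page_uri = collection['first']

  # An unpaginated collection lists its manifests directly.
  return Array(collection['manifests']).size unless first_page_uri

  first_page = JSON.parse(fetch.call(first_page_uri))
  Array(first_page['manifests']).size
end

# Canned responses standing in for the server (example.org URLs are made up).
canned = {
  'https://example.org/collection' =>
    { 'first' => 'https://example.org/collection?page=1' }.to_json,
  'https://example.org/collection?page=1' =>
    { 'manifests' => [{ '@id' => 'a' }, { '@id' => 'b' }] }.to_json
}

count = first_page_manifest_count('https://example.org/collection',
                                  fetch: ->(uri) { canned.fetch(uri) })
puts count # prints 2
```

If the live count matches the expected page size but the import still stops near 450, the shortfall is probably in the import step rather than in the manifest itself.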