Aha, the URL our v3 code generates is the following: https://ia802803.us.archive.org/BookReader/BookReaderJSIA.php?id=rbmsbk_ap2-v4_2001_V55N4&itemPath=/7/items/rbmsbk_ap2-v4_2001_V55N4&server=ia802803.us.archive.org&format=jsonp&subPrefix=rbmsbk_ap2-v4_2001_V55N4
This is very subtly different from the URL that IA itself uses in its BookReader:
Notably, the subPrefix value is "rbms.bk_ap2.v4_2001_V55N4", not "rbmsbk_ap2-v4_2001_V55N4" (note the dots).
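For reference, here is a minimal Python sketch of how a request URL like the one above is assembled. The function name `bookreader_js_url` is hypothetical (not taken from our codebase), and the sketch assumes `subPrefix` is the only parameter that needs to differ from the sanitized identifier.

```python
from urllib.parse import urlencode


def bookreader_js_url(server: str, identifier: str, item_path: str, sub_prefix: str) -> str:
    """Build a BookReaderJSIA.php URL with the same parameters our v3 code sends.

    Hypothetical helper for illustration only.
    """
    params = {
        "id": identifier,
        "itemPath": item_path,
        "server": server,
        "format": "jsonp",
        "subPrefix": sub_prefix,
    }
    # safe="/" keeps itemPath readable (/7/items/...) instead of percent-encoding slashes.
    return f"https://{server}/BookReader/BookReaderJSIA.php?{urlencode(params, safe='/')}"


server = "ia802803.us.archive.org"
identifier = "rbmsbk_ap2-v4_2001_V55N4"
item_path = "/7/items/rbmsbk_ap2-v4_2001_V55N4"

# What we send today: subPrefix is the sanitized identifier.
ours = bookreader_js_url(server, identifier, item_path, identifier)
# What IA seems to expect: subPrefix is the unsanitized, dotted name.
theirs = bookreader_js_url(server, identifier, item_path, "rbms.bk_ap2.v4_2001_V55N4")
```

Dots, hyphens and underscores are all URL-safe, so the two URLs differ only in the characters of the `subPrefix` value, which is why the mismatch is so easy to miss.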
Basically, it seems to be using the item's filename as the user entered it, before it was sanitized (here, the dots were either dropped or replaced with hyphens).
We probably missed this (e.g. in issue #87) because it's a less common scenario; in most cases the two values are likely to be the same.
Perhaps @mekarpeles could advise on the best, most consistent way to source the correct subPrefix value from the metadata response?
It looks like you can get it from the original PDF entry in the metadata file list:
{
"name": "rbms.bk_ap2.v4_2001_V55N4.pdf",
"source": "original",
"mtime": "1535834120",
"size": "113410810",
"md5": "39d01d00452e870dd4f5ee9352fc5dd6",
"crc32": "b5e791f3",
"sha1": "43b28b86e69a4558c9395b43986abd5192a2a1db",
"format": "Text PDF"
}
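A minimal Python sketch of that lookup, assuming the metadata response has already been parsed into a dict with a `files` list shaped like the entry above. The function name `sub_prefix_from_pdf` is hypothetical, and it simply takes the first original `.pdf` and strips the extension.

```python
from typing import Optional


def sub_prefix_from_pdf(metadata: dict) -> Optional[str]:
    """Return the name of the first original PDF, minus its extension (hypothetical helper)."""
    for entry in metadata.get("files", []):
        name = entry.get("name", "")
        if entry.get("source") == "original" and name.lower().endswith(".pdf"):
            return name[: -len(".pdf")]
    return None  # no original PDF, so this approach gives no answer


metadata = {
    "files": [
        {"name": "rbms.bk_ap2.v4_2001_V55N4.pdf", "source": "original", "format": "Text PDF"},
        {"name": "rbmsbk_ap2-v4_2001_V55N4_files.xml", "source": "original"},
    ]
}
prefix = sub_prefix_from_pdf(metadata)
```

This only works for items that were uploaded as a PDF, so on its own it's probably not a general answer.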
but there are a number of other originals:
"name": "rbmsbk_ap2-v4_2001_V55N4_files.xml",
"source": "original",
"name": "rbmsbk_ap2-v4_2001_V55N4_meta.sqlite",
"source": "original",
"name": "rbmsbk_ap2-v4_2001_V55N4_meta.xml",
"source": "original",
Maybe try the original URL first. If that fails, look for the original file that isn't XML or SQLite and use its name (minus the extension) as the prefix.
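That fallback could look roughly like the Python sketch below. It's illustrative only: `candidate_sub_prefixes`, `resolve_sub_prefix` and the `works` callback are hypothetical names, and `works` stands in for whatever check we'd actually do against BookReaderJSIA.php (not implemented here).

```python
import os
from typing import Callable, Iterable, List, Optional

# Originals that are IA bookkeeping files rather than the uploaded content.
SKIP_EXTENSIONS = {".xml", ".sqlite"}


def candidate_sub_prefixes(identifier: str, files: Iterable[dict]) -> List[str]:
    """Identifier first (current behaviour), then stems of other original files."""
    candidates = [identifier]
    for entry in files:
        if entry.get("source") != "original":
            continue
        stem, ext = os.path.splitext(entry.get("name", ""))
        if not stem or ext.lower() in SKIP_EXTENSIONS:
            continue
        if stem not in candidates:
            candidates.append(stem)
    return candidates


def resolve_sub_prefix(
    identifier: str, files: Iterable[dict], works: Callable[[str], bool]
) -> Optional[str]:
    """Return the first candidate for which the (hypothetical) check succeeds."""
    for candidate in candidate_sub_prefixes(identifier, files):
        if works(candidate):
            return candidate
    return None


files = [
    {"name": "rbms.bk_ap2.v4_2001_V55N4.pdf", "source": "original"},
    {"name": "rbmsbk_ap2-v4_2001_V55N4_files.xml", "source": "original"},
    {"name": "rbmsbk_ap2-v4_2001_V55N4_meta.sqlite", "source": "original"},
    {"name": "rbmsbk_ap2-v4_2001_V55N4_meta.xml", "source": "original"},
]
candidates = candidate_sub_prefixes("rbmsbk_ap2-v4_2001_V55N4", files)
```

Trying the identifier first keeps the common case to a single request; the extra candidates only cost anything for items like this one.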
While testing a collection (Vermont Life Magazine), I'm finding that manifests for the items in this collection aren't being generated correctly.
This item should generate a manifest of 116 images:
However, while the manifest linked above does get created, it contains only a single image: the item's cover.
Then, when trying to reproduce the BookReader data, I get an error: "Couldn't find image stack."
Is there another/different construction of the BookReader data that we're missing that's needed to get the list of images for that item and the others in that collection?
This is also consistent across all the items in that collection. What is it about the way these were digitized or published that makes them consistently fail to render correctly when the manifest is requested?
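One way to start narrowing this down might be to check whether these items list an image stack at all in their metadata files. The Python sketch below assumes (not confirmed in this thread) that BookReader's image stack is an archive whose name ends in a suffix like `_jp2.zip`; the suffix list and the `image_stacks` helper are hypothetical.

```python
from typing import Iterable, List

# Assumed image-stack suffixes; not an authoritative list of what BookReader accepts.
IMAGE_STACK_SUFFIXES = ("_jp2.zip", "_jp2.tar", "_tif.zip", "_tif.tar", "_jpg.zip", "_jpg.tar")


def image_stacks(files: Iterable[dict]) -> List[str]:
    """Names of files in a metadata `files` list that look like image-stack archives."""
    return [
        entry["name"]
        for entry in files
        if entry.get("name", "").lower().endswith(IMAGE_STACK_SUFFIXES)
    ]
```

If this comes back empty for every item in the collection, that would at least be consistent with the "Couldn't find image stack" error, though it wouldn't explain why.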