ArchiveLabs / iiif.archivelab.org

Internet Archive IIIF Image 2.0 Server
GNU General Public License v3.0

"texts" item not working -- "Couldn't find image stack." ? #120

Closed hadro closed 1 year ago

hadro commented 1 year ago

While testing a collection (Vermont Life Magazine), I'm finding that the items within this collection aren't being generated correctly.

This item should generate a manifest of 116 images:

However, while the manifest linked above gets created, it only has a single image, the cover image of the item.

Then, when trying to reproduce the BookReader data, I get an error: "Couldn't find image stack."

Is there another/different construction of the BookReader data that we're missing that's needed to get the list of images for that item and the others in that collection?

It's also consistent across all the items in that collection -- what is it about the way these were digitized/published that makes them consistently fail to render correctly when the manifest is requested?

hadro commented 1 year ago

Aha, the URL our v3 code generates is the following: https://ia802803.us.archive.org/BookReader/BookReaderJSIA.php?id=rbmsbk_ap2-v4_2001_V55N4&itemPath=/7/items/rbmsbk_ap2-v4_2001_V55N4&server=ia802803.us.archive.org&format=jsonp&subPrefix=rbmsbk_ap2-v4_2001_V55N4

Which is very subtly different from the URL that IA itself uses in their BookReader:

https://ia802803.us.archive.org/BookReader/BookReaderJSIA.php?id=rbmsbk_ap2-v4_2001_V55N4&itemPath=/7/items/rbmsbk_ap2-v4_2001_V55N4&server=ia802803.us.archive.org&format=jsonp&subPrefix=rbms.bk_ap2.v4_2001_V55N4

Notably, in the subPrefix the value is "rbms.bk_ap2.v4_2001_V55N4", not "rbmsbk_ap2-v4_2001_V55N4" (note the dots).

Basically, it seems the subPrefix is the item's filename as entered by the uploader, before it was sanitized into the identifier (e.g. by dropping dots or replacing them with hyphens).
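For anyone wanting to confirm that subPrefix is the only parameter that differs, here is a minimal Python sketch that parses the two URLs quoted above and reports the query parameters whose values differ. The helper name `differing_params` is made up for illustration and is not part of the iiif.archivelab.org code.

```python
from urllib.parse import urlsplit, parse_qs

# The two URLs quoted in this comment: ours first, then the one IA's BookReader uses.
GENERATED_URL = (
    "https://ia802803.us.archive.org/BookReader/BookReaderJSIA.php"
    "?id=rbmsbk_ap2-v4_2001_V55N4&itemPath=/7/items/rbmsbk_ap2-v4_2001_V55N4"
    "&server=ia802803.us.archive.org&format=jsonp"
    "&subPrefix=rbmsbk_ap2-v4_2001_V55N4"
)
IA_URL = (
    "https://ia802803.us.archive.org/BookReader/BookReaderJSIA.php"
    "?id=rbmsbk_ap2-v4_2001_V55N4&itemPath=/7/items/rbmsbk_ap2-v4_2001_V55N4"
    "&server=ia802803.us.archive.org&format=jsonp"
    "&subPrefix=rbms.bk_ap2.v4_2001_V55N4"
)


def differing_params(url_a, url_b):
    """Return {param: (values_in_a, values_in_b)} for query params that differ.

    Hypothetical helper for illustration only.
    """
    a = parse_qs(urlsplit(url_a).query)
    b = parse_qs(urlsplit(url_b).query)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    print(differing_params(GENERATED_URL, IA_URL))
```

Note that `parse_qs` returns each value as a list, so the differing entry is a pair of one-element lists.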

The reason we missed this, e.g. in issue #87, is that this is probably a less common scenario; in most cases the two values are probably going to be the same.

Perhaps @mekarpeles could advise on the best/most consistent way to source that correct subPrefix value from the metadata response?

glenrobson commented 1 year ago

It looks like you can get it from the entry for the original PDF in the metadata:

{
  "name": "rbms.bk_ap2.v4_2001_V55N4.pdf",
  "source": "original",
  "mtime": "1535834120",
  "size": "113410810",
  "md5": "39d01d00452e870dd4f5ee9352fc5dd6",
  "crc32": "b5e791f3",
  "sha1": "43b28b86e69a4558c9395b43986abd5192a2a1db",
  "format": "Text PDF"
}

but there are a number of other originals:

      "name": "rbmsbk_ap2-v4_2001_V55N4_files.xml",
      "source": "original",

      "name": "rbmsbk_ap2-v4_2001_V55N4_meta.sqlite",
      "source": "original",

      "name": "rbmsbk_ap2-v4_2001_V55N4_meta.xml",
      "source": "original",

Maybe try the original URL first. If that fails, look for the original file that isn't XML or sqlite and use its prefix...
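A rough Python sketch of that fallback order, for illustration only. It assumes the metadata response has a list of file entries with "name" and "source" keys, as in the snippets above. The function name `candidate_sub_prefixes` and the skipped-extension set are hypothetical, and this is not necessarily how the eventual fix does it.

```python
import os

# Extensions of item-level "original" files that should not be used as a prefix
# (assumption based on the _files.xml, _meta.sqlite and _meta.xml entries above).
SKIPPED_EXTENSIONS = {".xml", ".sqlite"}


def candidate_sub_prefixes(identifier, files):
    """Return subPrefix values to try, in order.

    The identifier comes first (the original URL); after it come the names of
    any "original" files that are not XML or sqlite, with the extension removed.
    """
    candidates = [identifier]
    for entry in files:
        if entry.get("source") != "original":
            continue
        stem, ext = os.path.splitext(entry.get("name", ""))
        if not stem or ext.lower() in SKIPPED_EXTENSIONS:
            continue
        if stem not in candidates:
            candidates.append(stem)
    return candidates


if __name__ == "__main__":
    files = [
        {"name": "rbms.bk_ap2.v4_2001_V55N4.pdf", "source": "original"},
        {"name": "rbmsbk_ap2-v4_2001_V55N4_files.xml", "source": "original"},
        {"name": "rbmsbk_ap2-v4_2001_V55N4_meta.sqlite", "source": "original"},
        {"name": "rbmsbk_ap2-v4_2001_V55N4_meta.xml", "source": "original"},
    ]
    print(candidate_sub_prefixes("rbmsbk_ap2-v4_2001_V55N4", files))
```

The caller would request the BookReader data with each candidate in turn and stop at the first one that does not return "Couldn't find image stack."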

glenrobson commented 1 year ago

Also look at: https://github.com/benwbrum/fromthepage/blob/development/app/models/ia_work.rb#L38

glenrobson commented 1 year ago

Fixed in: https://github.com/ArchiveLabs/iiif.archivelab.org/pull/122