Thumbnail saving? - Githubissues

agpar commented 8 years ago

I'm interested if you think it would be worth it to save thumbnails and serve them ourselves. It's pretty underwhelming (the steak lacks serious sizzle) when every thumbnail on the site takes 5-15 seconds to load (if they load).

Pros:

Thumbnails will load quickly off our sweet CC server. Thumbnails can be saved at a low resolution and cached, making our services much faster than the slow IIIF implementations of some of our providers.
We can always use the suggested thumbnail. Many providers supply a thumbnail that is just a link to a very high-resolution image (3 MB and up) with no IIIF service behind it. To avoid rendering these, we select a random image from the default sequence to display as the thumbnail, which often leads to very uninteresting thumbnails (blank pages). If we serve thumbnails ourselves, we can take these interesting 3 MB thumbnails and convert them to low-res versions: interesting and lightweight thumbnails.
Saves our providers some serious server strain, as we won't be making up to 12 image requests at the exact same moment every time a user visits the home page (even if they never decide to look at any of those documents!). We can send all 12 thumbnails over with the initial response, instead of making 12 simultaneous image requests to the unlucky libraries that make up the bulk of our collection.
The front page will be way more sweet.

Cons:

Have to deal with downloading, converting, and managing thousands of static files (this is a pain).
Higher memory, storage and bandwidth usage.
Will take a while to implement, and probably quite a while to download the 67 thousand images to represent the current collection.

Do you think it's worth it?

ahankinson commented 8 years ago

I think the answer is to get those services to use IIIF thumbnails. Could you make a note of them on the problems document?

https://docs.google.com/document/d/1Lyg7FzBeTHBeC0qbPmFzJfyxpK9ySBKyoiYs0TQkCIs/edit#

In general, I'm hesitant to do this. For DIAMM, for example, if a source does not have a pre-defined thumbnail, I choose a random one from the available page images for the manifest. So the thumbnail actually changes from load to load.

Why not go halfway and set up a memcached caching layer, with the thumbnails cached by URL key? When a thumbnail is requested, we can check the cache for it and serve it quickly; otherwise, we fetch and then cache.
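
A minimal sketch of that check-then-fetch flow, just to make the idea concrete. `InMemoryCache` is a hypothetical stand-in for a memcached client (only `get` and `set` are used), and `fetch` is whatever function actually downloads the image; neither name comes from the musiclibs code. The URL is hashed before use as a key because memcached keys are limited to 250 bytes and cannot contain whitespace.

```python
import hashlib
from typing import Callable, Dict, Optional


class InMemoryCache:
    """Hypothetical stand-in for a memcached client; only get/set are used."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._store[key] = value


def cache_key(url: str) -> str:
    """Derive a fixed-length key from the thumbnail URL."""
    return "thumb:" + hashlib.sha256(url.encode("utf-8")).hexdigest()


def get_thumbnail(url: str, cache: InMemoryCache,
                  fetch: Callable[[str], bytes]) -> bytes:
    """Serve from the cache on a hit; otherwise fetch, then cache."""
    key = cache_key(url)
    cached = cache.get(key)
    if cached is not None:
        return cached
    data = fetch(url)  # slow path: one request to the provider
    cache.set(key, data)
    return data
```

With this shape, the provider is only contacted the first time a given thumbnail URL is requested; every later request for the same URL is answered from the cache.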

agpar commented 8 years ago

The problem with only caching is that we display a random 0.01% or so of the thumbnails every time the user loads the front page.

We could re-write how it chooses manifests to display on the front page so it only updates them nightly or weekly, thus making a cache system useful.
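
One way this could look, as a rough sketch rather than how the site currently picks manifests: seed the random choice with the current date, so every page load on the same day gets the same set and a cache actually gets hits. The function name and the `manifest_ids` argument are hypothetical.

```python
import datetime
import random
from typing import List, Sequence


def front_page_manifests(manifest_ids: Sequence[str],
                         day: datetime.date,
                         count: int = 12) -> List[str]:
    """Pick the same 'random' manifests for every request on a given day."""
    # Sort first so the result does not depend on database ordering.
    population = sorted(manifest_ids)
    # Seeding with the ISO date string keeps the selection stable all day.
    rng = random.Random(day.isoformat())
    return rng.sample(population, min(count, len(population)))
```

Calling it with `datetime.date.today()` gives a nightly rotation; seeding with the ISO year and week number instead would give a weekly one.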

I like being able to refresh the page and see a whole new set of interesting stuff though.

ahankinson commented 8 years ago

Good point, but I think choosing a solution to this should be independent of the front-end features.

I'm only saying this because I'm finding that when I present the front page it's a bit strange to explain the stuff that is there. They're not really "Selected items" since we didn't select them, and saying "Random items" just seems a bit too indeterminate. So I'm wondering if that space would be better used for something else (browse by provider? Not sure...). That's a separate issue, but since @jeromepl wants a caching layer as well, it might make sense to provide a general solution.

agpar commented 8 years ago

I'm also thinking of search result thumbnails... We only want to show, like, 30 KB of image, but they are often extremely slow to load, since we are typically asking for a dozen of them all at the exact same time from the same server.

I agree, explaining why the front page is half empty seriously sucks.

I think writing a system to retrieve, convert, save, and make thumbnails available would be a very robust long term solution. We could do whatever we wanted with thumbnails if it was in place.

Caching would be helpful to this solution, but would probably not be sufficient alone. If every thumbnail is 100 KB, then we already have 7 GB or so that needs to be cached. It's more realistic to keep these on a hard drive than to have a caching system that needs to be warmed up every time we update (could take hours, given that each thumbnail is a GET).
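
The back-of-the-envelope numbers, spelled out. The collection size (67 thousand) and the 100 KB per thumbnail come from this thread; the 5 seconds per GET is the low end of the load times mentioned at the top, and the 12 parallel requests is purely an assumption for illustration.

```python
THUMBNAIL_COUNT = 67_000   # collection size quoted earlier in the thread
THUMBNAIL_SIZE_KB = 100    # per-thumbnail size assumed above

# Total cache size in decimal gigabytes (1 GB = 1,000,000 KB).
total_gb = THUMBNAIL_COUNT * THUMBNAIL_SIZE_KB / 1_000_000
print(f"cache size: {total_gb:.1f} GB")


def warmup_hours(count: int, seconds_per_get: float,
                 parallel_requests: int) -> float:
    """Rough time to refill the cache, one GET per thumbnail."""
    return count * seconds_per_get / parallel_requests / 3600


# Assumed: 5 s per GET, 12 requests in flight at once.
print(f"warm-up: {warmup_hours(THUMBNAIL_COUNT, 5, 12):.1f} hours")
```

Slower providers or fewer parallel requests push the warm-up time up proportionally.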

agpar commented 8 years ago

BTW: we do a similar thing in musiclibs with regard to thumbnails. If one is not provided, or if we know that library uses large images as thumbnails, we just grab a random one from the default sequence. It's still very slow.
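
For reference, a simplified sketch of that fallback (not the actual musiclibs code). It assumes a IIIF Presentation 2.x manifest where each image resource carries a single `service` object, and it builds an Image API 2.x URL; Image API 1.x servers expect `native` instead of `default` as the quality. The function name and the default width are made up for the example.

```python
import random
from typing import Optional


def random_thumbnail_url(manifest: dict, width: int = 150,
                         rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick a random page image from the default (first) sequence."""
    choose = rng.choice if rng is not None else random.choice
    sequences = manifest.get("sequences") or []
    canvases = sequences[0].get("canvases", []) if sequences else []

    # Collect the image service base URL of every page that has one.
    candidates = []
    for canvas in canvases:
        for image in canvas.get("images", []):
            service = image.get("resource", {}).get("service")
            if isinstance(service, dict) and "@id" in service:
                candidates.append(service["@id"])

    if not candidates:
        return None
    base = choose(candidates).rstrip("/")
    # {region}/{size}/{rotation}/{quality}.{format}: full image, scaled to width.
    return f"{base}/full/{width},/0/default.jpg"
```

This only asks the image server for a small derivative, but the request still goes to the provider each time, which is why it stays slow when their IIIF implementation is slow.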

ahankinson commented 8 years ago

The problem is that the 'magic' of the site is that we're not serving other people's content, though. Plus it might get us into a bit of a sticky situation WRT copyright if we're storing and re-hosting content.

agpar commented 8 years ago

Yeah, I want the magic to be there too. I'd really like to make the site browsing experience better though. I wonder if it's something that will just resolve itself as IIIF image service implementations improve?

DDMAL / musiclibs

Thumbnail saving? #114