cc-archive / open-ledger

Prototype code and examples for work on the Creative Commons "CC Search" project
MIT License

Some Europeana images showing up as broken images #188

Open little-wow opened 7 years ago

little-wow commented 7 years ago

For example here: https://ccsearch.creativecommons.org/image/detail/TaNnY_wrKsmSMWYkFId3Yw== or here: [screenshot: screen shot 2017-03-30 at 3 16 12 pm] (the image does display after clicking on the blank image: https://ccsearch.creativecommons.org/image/detail/TaNnY_wrKsmSMWYkFId3Yw==)

DavidHaskiya commented 7 years ago

Hi, from the Europeana perspective I can see two different issues here. Neither is really something we can fully control, so we can't reduce the number of missing thumbnails or failing full-size images to 0.

  1. That some Europeana items don't have working thumbnails but do have working full size images

Example query: https://ccsearch.creativecommons.org/?search=amsterdam&page=1&search_fields=title&search_fields=creator&search_fields=tags&per_page=50&work_types=cultural&providers=europeana where a number of hits show default thumbnails but do have working full-size images when you click to see them in full.

This is because thumbnails in Europeana are created asynchronously after publication. If you look at the search result above and go to Europeana Collections you will see that some (most!?) of them now do have thumbnails. So if Creative Commons would reharvest Europeana content to CC Search using the same base search as the first harvest most of the currently missing thumbnails would appear.

One can also reduce the amount of missing thumbnails somewhat by adding thumbnail=true to the Europeana API-query that forms the first step of your harvest, like so: http://www.europeana.eu/api/v2/search.json?query=NOT+PROVIDER%3A+%22Rijksmuseum%22+AND+*:*&media=true&thumbnail=true&qf=IMAGE_SIZE%3Alarge&qf=IMAGE_SIZE%3Aextra_large&reusability=open&qf=TYPE%3AIMAGE&profile=facets&rows=0&wskey=yourkey

  2. Some Europeana items have working thumbnails in CC Search, but don't have working full size images

The full size images aren't hosted by Europeana. We hotlink them in from source (just as you do). So we can't control them.

What we can do is this: if you find a Europeana partner whose entire image set is basically broken, let us know and we'll get in touch with them.
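To make the thumbnail=true suggestion concrete, here is a minimal sketch of how the first harvest query could be assembled. The parameter values are taken from the example URL above; the function name `build_harvest_query` and the `require_thumbnail` switch are hypothetical, not part of the actual CC Search harvester. Note that `urlencode` percent-encodes `*:*`, which the API should decode to the same query string.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the example query above.
SEARCH_ENDPOINT = "http://www.europeana.eu/api/v2/search.json"


def build_harvest_query(api_key, require_thumbnail=True):
    """Build the first-step harvest URL (hypothetical helper).

    When require_thumbnail is True, thumbnail=true is added so the
    search only returns items that already have a thumbnail.
    """
    params = [
        ("query", 'NOT PROVIDER: "Rijksmuseum" AND *:*'),
        ("media", "true"),
    ]
    if require_thumbnail:
        params.append(("thumbnail", "true"))
    params += [
        ("qf", "IMAGE_SIZE:large"),
        ("qf", "IMAGE_SIZE:extra_large"),
        ("reusability", "open"),
        ("qf", "TYPE:IMAGE"),
        ("profile", "facets"),
        ("rows", "0"),
        ("wskey", api_key),  # replace with a real API key
    ]
    # A list of pairs keeps the repeated "qf" filters intact.
    return SEARCH_ENDPOINT + "?" + urlencode(params)


print(build_harvest_query("yourkey"))
```

Items whose thumbnails have not been generated yet would be skipped by such a query, so a later reharvest would still be needed to pick them up.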

little-wow commented 7 years ago

@robmyers what do you think?

rheaplex commented 7 years ago

Thank you for that thorough explanation @DavidHaskiya .

@little-wow We can run another import / harvest. After that if we do find any broken image sets we can report them. Does that sound OK? :-)

little-wow commented 7 years ago

great, thanks! I'll let the folks from the Europeana comms team know that we're ready to announce as soon as that's done. @robmyers

little-wow commented 7 years ago

@robmyers ping! Just checking in on this ticket! Thank you so much!

rheaplex commented 7 years ago

I won't be doing this until next week as I am still trying to catch up after being out sick. I'll ping you when it's done (and if it becomes any more urgent do re-ping me).

rheaplex commented 7 years ago

I've started the import. I will check its progress in the morning (it will take a while....).

little-wow commented 7 years ago

👍🏽 Thanks Rob! On Tue, Apr 11, 2017 at 9:09 PM Rob Myers notifications@github.com wrote:

I've started the import. I will check its progress in the morning (it will take a while....).


rheaplex commented 7 years ago

OK I'm now very confused. :-)

Here is the search from above:

https://ccsearch.creativecommons.org/?search=amsterdam&page=1&search_fields=title&search_fields=creator&search_fields=tags&per_page=50&work_types=cultural&providers=europeana

And here is the first image that doesn't have a thumbnail in the results for that:

image

Which links to this results page:

https://ccsearch.creativecommons.org/image/detail/JTd80iTV1J0IqJ_P43bO7g==

image

Note how the skull looks, and that the title matches that from the search results preview.

When I click on the link it goes to this page:

http://www.europeana.eu/portal/en/record/2021668/naturalis_specimen_ZMA_MAM_7523.html?utm_source=api&utm_medium=api&utm_campaign=fjqyYDAFX

image

Note that the skull is different but that the title is the same.

The image on the page on Europeana does have a thumbnail available, the one in ccSearch doesn't.

I'm concerned about the different images.

@DavidHaskiya do you have any idea what we might be doing wrong? :-)

DavidHaskiya commented 7 years ago

I'm confused as well - as in not sure this is an issue with our API or your implementation - let's see if our joint confusion can lead to some insights!

When I query our API for http://www.europeana.eu/portal/en/record/2021668/naturalis_specimen_ZMA_MAM_7523.html by searching on its dc:identifier, like so

http://www.europeana.eu/api/v2/search.json?wskey=apikeyhere&query=proxy_dc_identifier:%22ZMA.MAM.7523_0907090558%22&profile=rich&rows=98

I get this thumbnail in return: "http://europeanastatic.eu/api/image?uri=http%3A%2F%2Fmedialib.naturalis.nl%2Ffile%2Fid%2FZMA.MAM.7523_pal%2Fformat%2Flarge&size=LARGE&type=IMAGE"

which does resolve correctly in my browser.

What's the URI of the offending thumbnail in your search display? I'd try to check but can't seem to get the same first hit as you do.

Other example: So I did look at another one that has a broken thumbnail in CC search and found the following pattern:

CC-search thumbnail link which does not resolve and so gets a default thumbnail: https://www.europeana.eu/api/v2/thumbnail-by-url.json?size=w200&type=IMAGE&uri=http%3A%2F%2Fmedialib.naturalis.nl%2Ffile%2Fid%2FZMA.MAM.7252_lat%2Fformat%2Flarge

Working 200px thumbnail called in Europeana Collections via the Europeana API:

http://www.europeana.eu/api/v2/search.json?wskey=apikeyhere&query=proxy_dc_identifier:%22ZMA.MAM.7252_01216961143%22&profile=rich&rows=98 which includes this thumbnail URI: http://europeanastatic.eu/api/image?uri=http%3A%2F%2Fmedialib.naturalis.nl%2Ffile%2Fid%2FZMA.MAM.7252_pal%2Fformat%2Flarge&size=LARGE&type=IMAGE

Does this help in your troubleshooting?
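One way to compare the two thumbnail links above is to decode the source image URI embedded in each of them. The sketch below uses only the Python standard library and the two URLs quoted in this comment; the helper name `embedded_source_uri` is made up for illustration. It only shows whether the two links point at the same source file. It does not establish why the CC Search link fails to resolve.

```python
from urllib.parse import urlsplit, parse_qs


def embedded_source_uri(thumbnail_url):
    """Return the decoded `uri` query parameter, or None if absent."""
    values = parse_qs(urlsplit(thumbnail_url).query).get("uri")
    return values[0] if values else None


# Thumbnail link used by CC Search (does not resolve).
cc_search_thumb = (
    "https://www.europeana.eu/api/v2/thumbnail-by-url.json"
    "?size=w200&type=IMAGE"
    "&uri=http%3A%2F%2Fmedialib.naturalis.nl%2Ffile%2Fid%2F"
    "ZMA.MAM.7252_lat%2Fformat%2Flarge"
)

# Thumbnail URI returned by the Europeana search API (resolves).
europeana_thumb = (
    "http://europeanastatic.eu/api/image"
    "?uri=http%3A%2F%2Fmedialib.naturalis.nl%2Ffile%2Fid%2F"
    "ZMA.MAM.7252_pal%2Fformat%2Flarge&size=LARGE&type=IMAGE"
)

cc_source = embedded_source_uri(cc_search_thumb)
eu_source = embedded_source_uri(europeana_thumb)

print(cc_source)
print(eu_source)
print("same source image:", cc_source == eu_source)
```

For these two links the embedded source URIs differ in their file id (`ZMA.MAM.7252_lat` versus `ZMA.MAM.7252_pal`), so the two services are not only using different thumbnail endpoints but also referring to different source files for this record.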