cc-archive / cccatalog-api

[PROJECT TRANSFERRED] The Creative Commons Catalog API allows programmatic access to search for CC-licensed and public domain digital media.
https://github.com/WordPress/openverse-api
MIT License

MET returns incorrect response code and content type for all images #197

Closed: aldenstpage closed this issue 5 years ago

aldenstpage commented 5 years ago
$ curl -i http://images.metmuseum.org/crdimages/ap/web-large/246958.jpg
HTTP/1.1 200 OK
Content-Type: text/html
...

That request should return 404 Not Found or 301 Moved Permanently, not 200 OK. If you open the link above in a browser, you can see that the image is gone. Because the provider returns an incorrect status code, the server has no way to detect the dead image and delete it preemptively.

~I think this type of issue could be avoided by deleting every image whose response has type text/html during image validation.~ All MET images are served as text/html.
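For illustration, the struck-through idea could look something like the sketch below. It is not the project's actual validation code: `looks_like_dead_image` is a hypothetical helper, and it assumes the validator already has the status code and the raw Content-Type header in hand. As noted above, a rule like this would flag every MET image, since they all come back as text/html.

```python
from typing import Optional


def looks_like_dead_image(status: int, content_type: Optional[str]) -> bool:
    """Hypothetical check: True if the response should not count as a live image."""
    # Anything other than 200 OK (for example 404 or an unfollowed 301)
    # means the image is gone or has moved.
    if status != 200:
        return True
    # A missing or empty Content-Type header gives us nothing to trust.
    if not content_type:
        return True
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # A 200 response that is not an image (e.g. an HTML error page) is treated as dead.
    return not media_type.startswith("image/")
```

With the response shown in this issue, `looks_like_dead_image(200, "text/html")` returns `True`, which is the desired result for the missing image but the wrong one for MET images that still exist.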

Succeeds creativecommons/cccatalog-frontend#140

aldenstpage commented 5 years ago

It seems like the Content-Type header is set to text/html for all images, not just 404s. This breaks the image proxy.
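When the Content-Type header cannot be trusted, one possible workaround is to look at the first bytes of the body instead. The sketch below is only an assumption about how that might be done, not something the proxy is known to do: `sniff_image_type` is a hypothetical helper that compares the start of the payload against the well-known signatures for JPEG, PNG, GIF, and WebP.

```python
from typing import Optional


def sniff_image_type(head: bytes) -> Optional[str]:
    """Hypothetical helper: guess the image type from the first bytes of a body."""
    # JPEG files start with the SOI marker followed by another marker byte.
    if head.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # PNG files start with a fixed 8-byte signature.
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    # GIF files start with one of two ASCII version strings.
    if head[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    # Anything else (for example an HTML error page) is not a recognized image.
    return None
```

Passing the first dozen bytes of a response body is enough for these four formats. An HTML error page returns `None`, while a real JPEG is still recognized even if the server labels it text/html.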

aldenstpage commented 5 years ago

This turned out to be caused by a very aggressive Incapsula policy. The MET fixed it after we contacted them.