ClevelandMuseumArt / openaccess

Creative Commons Zero v1.0 Universal

API returning nonexistent images, leading to 404s #8

Closed anglerfish27 closed 1 year ago

anglerfish27 commented 1 year ago

Hello, I am using Adafruit's Titano hardware to access the API for images, based on a project the company made. Details at: https://learn.adafruit.com/cleveland-museum-of-art-pyportal-frame

The issue is that after some time (usually less than 30 minutes, sometimes longer, sometimes much shorter) the API returns a nonexistent image. When this happens, the Adafruit code doesn't catch it correctly and tries to process the nonexistent (404) image to make it fit the Titano using Adafruit's online IoT services. It ultimately fails, freezing the program until a reboot is performed manually.

Adafruit is working on their code to better catch this: https://github.com/adafruit/Adafruit_CircuitPython_PyPortal/issues/128

but I feel that the CMA API owners should also work on this to ensure the API does not return images that do not exist, which kills the program.

All of Adafruit's code uses CircuitPython and is open source. I'm on the latest build, 8.2.6; you can freely see the files and functions it uses. Perhaps that will help you figure out what needs to be edited on your end as well. Members of Adafruit's core programming staff (on GitHub) are discussing this in the forum thread linked below and working to help resolve it, so it has the most "important" people at Adafruit on it. I'm hoping that CMA can help out with their API as well. Thank you kindly, Anglerfish27.

https://forums.adafruit.com/viewtopic.php?t=204689

JCACMA commented 1 year ago

Thank you for using our Open Access collection and giving CMA credit for its Open Access policy. Also, thank you for bringing this to our attention. We are currently working on our API and will look at the issue you describe and get back to you.

I do want to note that your website needs to remove our CMA logo.

https://www.clevelandart.org/open-access-faqs

What about Citations? https://www.clevelandart.org/open-access-faqs

Works designated as Creative Commons (CC0) do not require attribution or citation. Copyrighted content, content with proprietary rights, or that is otherwise restricted for limited non-commercial, educational, and personal uses should be cited including the URL "www.clevelandart.org" (http://www.clevelandart.org/) in addition to all copyright and other proprietary notices contained on the materials. Citation of the CMA’s CC0 or restricted content does not imply endorsement by the CMA, nor does it grant permission to use the CMA’s trademarks without prior approval.

You may wish to cite images and data from the CMA's collections for educational and scholarly, or other publication purposes. Consult the CMA collection online or metadata associated with an object via the application programming interface http://openaccess-api.clevelandart.org/ (API) or data from the GitHub Repository (http://openaccess-api.clevelandart.org/) for information that can be used for citations.

I would suggest offering a link to the actual Open Access page: https://www.clevelandart.org/open-access

We will get back to you regarding the missing image issue. Stay tuned.

Jane

Jane Alexander
T 216-707-2644

From: anglerfish27
Sent: Monday, October 2, 2023 12:26 PM
To: ClevelandMuseumArt/openaccess
Subject: [ClevelandMuseumArt/openaccess] APi returning non existant images leading to 404's (Issue #8)


anglerfish27 commented 1 year ago

Thanks, I am not part of Adafruit, just a customer. I do not represent Adafruit on any level (other than they take my money for neat stuff all the time :) ). I can share that information in my forum post so hopefully they see it. Otherwise you may need to contact them directly.

Thank you for the prompt response! Anglerfish27

JCACMA commented 1 year ago

Thank you. We would appreciate you posting in the forum, especially since they do take money 😊

Jane Alexander
T 216-707-2644


ethanholda commented 1 year ago

@anglerfish27 Can you give me a few URL examples of images you are having difficulty with?

anglerfish27 commented 1 year ago

https://openaccess-cdn.clevelandart.org/1938.301.57.b/1938.301.57.b_web.jpg

anglerfish27 commented 1 year ago

I was running the screen test and hit one just now. The program reads this JSON to get the image: https://openaccess-api.clevelandart.org/api/artworks?cc0=1&has_image=1&indent=2&limit=1&type=Painting&skip=1827

The last four digits (the skip value) are randomly generated by the display code out of 3223 paintings. The image URL the code parsed for this one is: https://openaccess-cdn.clevelandart.org/1932.119.54.a/1932.119.54.a_web.jpg

That matches the JSON (which is not always the case), but the link gives a 404 if you click on it. All the other images that display properly do NOT give me a 404 when I click their links in my code logs, so I know a valid image link from my logs is reachable by clicking it. But if it's a 404, the whole program comes crashing down. I just downloaded a copy of your entire JSON from GitHub, and I'm going to check whether this particular image shows the same information there and whether it also fails to load with a 404.
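
The flow described above (random skip offset, one-painting query, parsed image URL, 404 on the image) can be sketched in desktop Python as follows. This is a minimal sketch, not the Adafruit code: the response layout `data[0].images.web.url` is an assumption about the API's JSON, and `fetch_json` and `url_exists` are hypothetical callables the caller supplies (one returns the decoded JSON for a URL, the other reports whether an image URL is reachable).

```python
import random
from urllib.parse import urlencode

API_BASE = "https://openaccess-api.clevelandart.org/api/artworks"
PAINTING_COUNT = 3223  # count quoted in this thread; it may change over time


def build_query_url(skip):
    """Build the one-painting query URL shown above for a given skip offset."""
    params = {
        "cc0": 1,
        "has_image": 1,
        "indent": 2,
        "limit": 1,
        "type": "Painting",
        "skip": skip,
    }
    return API_BASE + "?" + urlencode(params)


def extract_web_image_url(payload):
    """Return the web image URL, or None if the response lacks one.

    Assumes the layout data[0].images.web.url (not confirmed in this thread).
    """
    try:
        return payload["data"][0]["images"]["web"]["url"]
    except (KeyError, IndexError, TypeError):
        return None


def pick_working_image(fetch_json, url_exists, attempts=5, rng=random):
    """Try random offsets until one yields an image URL that really exists."""
    for _ in range(attempts):
        skip = rng.randrange(PAINTING_COUNT)
        image_url = extract_web_image_url(fetch_json(build_query_url(skip)))
        if image_url and url_exists(image_url):
            return image_url
    return None  # caller decides what to show when every attempt failed
```

Checking the image URL before handing it to any resizing step means a dead link costs one retry instead of a frozen display.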

Thanks! Hope this helps.. Anglerfish27

anglerfish27 commented 1 year ago

Yep, as suspected: I searched for that image in your data.json file from GitHub (roughly 293 MB) and found it on line 1671311 (at least that's the line number when reading the JSON in Sublime Text).

If I scroll up or down and pick another image, it loads fine, as expected. So at this point I am about 99.9% sure that your JSON file contains image entries that don't work for whatever reason (they don't exist, have a different name, or something else).

Since this JSON file is so huge, I'm trying to think of a way to script this: parse out every single web image URL along with its line number for reference, then run each one through a program to test whether it exists (404 or other).
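
One way such a script could look, as a rough sketch in desktop Python using only the standard library. It scans the file line by line rather than loading all of it, so the file size matters less. The URL pattern is inferred from the two example links in this thread and assumes the file stores slashes unescaped; the function names are illustrative, not part of any CMA or Adafruit tooling.

```python
import re
import urllib.error
import urllib.request

# Pattern inferred from the example links in this thread (an assumption).
WEB_IMAGE_RE = re.compile(
    r'https://openaccess-cdn\.clevelandart\.org/[^"\s]+_web\.jpg'
)


def iter_web_image_urls(lines):
    """Yield (line_number, url) for every web image URL found in the lines."""
    for line_number, line in enumerate(lines, start=1):
        for match in WEB_IMAGE_RE.finditer(line):
            yield line_number, match.group(0)


def http_status(url, timeout=10):
    """Return the status of a HEAD request, or None on a network failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 for a missing image
    except OSError:
        return None  # DNS failure, timeout, refused connection, ...


def find_broken(lines, status_of=http_status):
    """Return [(line_number, url, status)] for every URL that is not a 200."""
    broken = []
    for line_number, url in iter_web_image_urls(lines):
        status = status_of(url)
        if status != 200:
            broken.append((line_number, url, status))
    return broken
```

Passing an open file handle for data.json as `lines` to `find_broken` would produce the list of failing links with their line numbers. A full run makes one request per image, so adding a short pause between requests would be a courtesy to the server.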

That could take me a seriously long time to put together, as I'm no wizard at programming. But I think the issue should be clear to your team. While Adafruit is working to help recover from the issue gracefully, at the end of the day it's bad CMA JSON data.

I have no idea what your IT systems look like, or whether there's a way to generate a new JSON based on what's actually stored as images. I've been in the IT field myself for 20 years as a sysadmin, so I'm trying to think of what I would do if I were in front of the servers. Not being savvy with JSON or web development in general makes it harder. But I have to think there's some directory structure in a database or file store that holds all the images, and a new JSON needs to be generated from that so only images that exist are listed. I'm guessing here.

I'm going to see if I can code something to parse the offline JSON, but I have serious doubts about how well it'll go :) It will test some Python skills I've never used. But hey, it's a project to tinker on. I would love to know what your team thinks and how you are going about things, out of my own professional (nerd) curiosity. I obviously do not expect you to share any compromising data or infrastructure layouts. Hey, do you need a remote IT sysadmin?? :)

anglerfish27 commented 1 year ago

Oh, your CSV file makes this much easier! I'm going to whittle it down to the ID and the web.jpg image. Then I'll write a script to run through all the JPGs and give me a list of those that return a 404, which I can line up with the ID.
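
A minimal sketch of that reduction in desktop Python. The column names `id` and `image_web` are assumptions about the CSV header and should be checked against the actual file; `status_of` is a hypothetical callable that maps a URL to its HTTP status code.

```python
import csv


def id_and_web_image(csv_file, id_column="id", image_column="image_web"):
    """Return [(id, url)] for every row that lists a web image.

    The default column names are assumptions, not confirmed in this thread.
    """
    pairs = []
    for row in csv.DictReader(csv_file):
        url = (row.get(image_column) or "").strip()
        if url:  # skip works that have no web image at all
            pairs.append((row[id_column], url))
    return pairs


def ids_with_status(pairs, status_of, bad_status=404):
    """Return the (id, url) pairs whose URL reports the given status."""
    return [(art_id, url) for art_id, url in pairs if status_of(url) == bad_status]
```

Opening the CSV with `open(path, newline="", encoding="utf-8")` and passing the handle to `id_and_web_image` gives the trimmed list; `ids_with_status` then lines each 404 up with its ID.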

ethanholda commented 1 year ago

@anglerfish27 I ran some code to fix the missing images. All works that have artwork photography (which is not all of them) should have valid image URLs. I'm working with the team to make our update process more robust so we don't have this issue in the future. Thanks for your interest in Open Access at CMA.

anglerfish27 commented 1 year ago

Thank you so much! I will give this a try and see how things go. Apologies for the very delayed response. Family emergency :(

anglerfish27 commented 1 year ago

Closing the issue in anticipation that testing will go well!

anglerfish27 commented 1 year ago

I'm having new issues now. It won't even process a URL, and I don't know if it's on Adafruit's side or yours :(

retrieving url: https://openaccess-api.clevelandart.org/api/artworks?cc0=1&has_image=1&indent=2&limit=1&type=Painting&skip=1381

Then I get: Retrieving data...An error occured, retrying! - Sending request failed

That's as far as I get now. The URL I posted changes each time, as the program randomizes it on purpose. The URL itself works; I can get to it manually. Something else must be going on now. I am reaching out to the Adafruit team for assistance. This one has me stumped.

ethanholda commented 1 year ago

If you can get to the URL in a browser (we could), then the issue isn't on our side. Hopefully the Adafruit folks can shed some light.

anglerfish27 commented 11 months ago

Yeah, the Adafruit team has been amazing; they are helping chase it down for me. It looks like the latest WiFi firmware (which is not Adafruit's problem) doesn't like how your SSL certs are being presented. I don't know the details, and I shared the link with the Adafruit team in case they want to chime in here for your own sake. The fix sounds like rolling back to an older firmware, so I'm going to do that today. Hopefully it works.

anglerfish27 commented 11 months ago

Yep, it seems to be working on the old WiFi firmware. There's nothing "wrong" with your SSL certs; it's just that the (common) method they use is not properly handled by the WiFi driver. This has been known about, and I think I may have triggered some folks to start scratching their heads over what is wrong with the current stable WiFi firmware.

Algorithm: Elliptic Curve
Signature Algorithm: ECDSA with SHA-384

That's apparently what is not being handled correctly by the WiFi driver. The program has been running great all day, minus one 'new' error that we'll chase down; nothing on your end, I would think. You have some amazing artwork!