Open jvolker opened 4 years ago
I've written a script that checks and logs the availability of the artwork and thumbnail URLs. It also gives some statistics. It could easily be extended to push the results back to the API once the API is prepared for this: https://github.com/jvolker/Openframe-ArtworkAvailabilityChecker
It's not as bad as I thought: only 31 artworks are unavailable, although an available artwork could still fail while loading for other reasons (such as a wrong file type). What I find really interesting is that 65% of all available artworks are shaders. We've also got some duplicate artwork URLs.
Found 324 public artworks.
57841827c0006da8310e8e69 Invalid URL meetar.github.io/terrain-tour
56e4dc3f44973147579abbef ENOTFOUND server can not be found
57041cdee1f87ce61cc0af07 ENOTFOUND server can not be found
572114b0c2cb33000be1eea2 ENOTFOUND server can not be found
570c0c7c708dc5311ca1e4c8 404 Not found
570f716e507bfb8922c89814 403 Forbidden
56e0c15e17bbab454407c2b7 404 Not found
57a85d5bc0006da8310e906b 404 Not found
576ae072c0006da8310e8b89 404 Not found
57640ff1c0006da8310e8a94 404 Not found
57640f99c0006da8310e8a90 404 Not found
59711251b2462d7382b8f530 404 Not found
5970fe40b2462d7382b8f52f 404 Not found
597219fbb2462d7382b8f53f 404 Not found
5972196db2462d7382b8f53e 404 Not found
59711293b2462d7382b8f534 404 Not found
588d38697cb7f28d67893075 404 Not found
5a7876c3b2462d7382b8fa63 403 Forbidden
576994d0c0006da8310e8b6c 404 Not found
5af67553a38167076035b505 403 Forbidden
5cd6da5aa38167076035bc00 403 Forbidden
5af67510a38167076035b504 403 Forbidden
5af67576a38167076035b506 403 Forbidden
56e5c923a6b560d606184662 404 Not found
57846752c0006da8310e8e89 404 Not found
5b3d1c31a38167076035b5d6 403 Forbidden
584d65957cb7f28d67892d8e 403 Forbidden
5896d4c6a9c1b11803b240fb 404 Not found
59b615adb2462d7382b8f5de 404 Not found
59b615d1b2462d7382b8f5df 404 Not found
59b61573b2462d7382b8f5dd 404 Not found
293/324 artworks available
31/324 artworks unavailable
283/324 thumbnails available
Artwork type counts:
openframe-glslviewer: 189
openframe-image: 82
openframe-website: 13
openframe-video: 5
openframe-of: 2
openframe-processing: 2
Duplicate artwork URLs:
https://thebookofshaders.com/log/160306213426.frag: 2
https://goo.gl/images/Y1weHm: 2
https://goo.gl/images/j9YfL2: 2
https://thebookofshaders.com/log/160414134236.frag: 2
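For illustration, the statistics in the log above could be derived roughly as follows. This is only a sketch, not the code from the linked repository: the result shape (`id`, `url`, `type`, `status`) and the function names are assumptions. It counts types over available artworks only, which matches the log (the type counts add up to 293), and counts duplicate URLs over all artworks, which is a guess.

```javascript
// Hypothetical result shape: { id, url, type, status }, where status is an
// HTTP status code (number) or an error code string such as 'ENOTFOUND'.
function isAvailable(result) {
  // Assumption: only a 2xx response counts as available.
  return typeof result.status === 'number' && result.status >= 200 && result.status < 300;
}

function summarize(results) {
  const typeCounts = new Map();
  const urlCounts = new Map();
  let available = 0;

  for (const r of results) {
    // Every URL is counted, available or not, so duplicates are found across all artworks.
    urlCounts.set(r.url, (urlCounts.get(r.url) || 0) + 1);
    if (isAvailable(r)) {
      available += 1;
      // Types are counted for available artworks only.
      typeCounts.set(r.type, (typeCounts.get(r.type) || 0) + 1);
    }
  }

  // Keep only URLs that occur more than once.
  const duplicates = Object.fromEntries(
    [...urlCounts].filter(([, count]) => count > 1)
  );

  return {
    total: results.length,
    available,
    unavailable: results.length - available,
    typeCounts: Object.fromEntries(typeCounts),
    duplicates,
  };
}
```

Keeping the summary separate from the network requests means it can be tested without hitting any artwork URL.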
Nice, this is great! Yeah, since shaders were one of our primary use cases and we had a good collaboration with Patricio from The Book of Shaders (including importing directly to openframe from BoS), it's not surprising to me that such a large proportion are shaders.
Should be a relatively easy next step to remove unavailable artworks from the stream. I think we should be able to just use the JS client to update the unavailable artworks, setting is_public to false. This way no artworks are deleted unexpectedly from a user's account, but they won't show up in the stream. It would have to be run by a super user that can modify all artworks, which I can do. Does that make sense?
> including importing directly to openframe from BoS), it's not surprising to me that such a large proportion are shaders.
That's what I thought. Shows what potential a web clipper could have.
> setting is_public to false. This way no artworks are deleted unexpectedly from a user's account, but they won't show up in the stream. It would have to be run by a super user that can modify all artworks, which I can do.
I was thinking of doing it more thoroughly, but it might be over the top:
artwork_url_last_checked and artwork_url_is_available as well as thumbnail_url_last_checked and thumbnail_url_is_available
is_public directly wouldn't allow keeping track of those temporarily offline ones.
Is there some sort of email template available in the web app or server that could be reused, or should this script have a separate one?
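As a rough sketch of the more thorough approach, the four fields named above could be written on every check while is_public is left alone, so that a temporarily offline artwork reappears once a later check succeeds. The field names come from this thread; the `check` shape (`artworkOk`, `thumbnailOk`) and both function names are made up for illustration, and nothing like this exists in the API yet.

```javascript
// Return a copy of the artwork with the proposed availability fields updated.
// is_public is deliberately not modified.
function recordCheck(artwork, check, now = new Date()) {
  const checkedAt = now.toISOString();
  return {
    ...artwork,
    artwork_url_last_checked: checkedAt,
    artwork_url_is_available: check.artworkOk,
    thumbnail_url_last_checked: checkedAt,
    thumbnail_url_is_available: check.thumbnailOk,
  };
}

// Stream visibility is derived from both flags. An artwork that has never
// been checked (field undefined) is treated as visible; that is an assumption.
function showInStream(artwork) {
  return artwork.is_public === true && artwork.artwork_url_is_available !== false;
}
```

With this split, the owner's own is_public choice is never overwritten by the checker.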
@jmwohl Thanks for fixing the login issue in the JSclient so quickly.
I've updated the script a little more. Uncommenting line 92, updateDatabase(artwork, false), should update the artwork in the database and set is_public to false. Be aware, though: it's still untested, since I don't have a test database or superuser rights!
Let me know if you'd instead like to go the more thorough way I've described above.
Finally finding a bit of time to get back to this. Running with a superuser account didn't work quite as I'd hoped and rather than mess around with it I used the output list of unavailable artwork IDs to run an update against mongo directly, making these artworks private and thus removing them from the stream.
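A direct update of this kind might look roughly like the sketch below. It only builds the filter and update documents; the exact command that was run is not shown in this thread, and the collection name and the `_id` storage type are unknown here. The helper name `buildHideUpdate` is hypothetical.

```javascript
// Build the filter/update pair for hiding a list of artworks.
// Assumption: IDs are 24-character hex strings as in the checker's log.
function buildHideUpdate(ids) {
  // Drop anything that is not a well-formed ID so a stray log line
  // cannot end up in the filter.
  const validIds = ids.filter((id) => /^[0-9a-f]{24}$/i.test(id));
  return {
    filter: { _id: { $in: validIds } },
    update: { $set: { is_public: false } },
  };
}

// Two IDs taken from the log earlier in this thread.
const { filter, update } = buildHideUpdate([
  '56e4dc3f44973147579abbef',
  '570c0c7c708dc5311ca1e4c8',
]);
console.log(JSON.stringify(filter), JSON.stringify(update));
```

The pair would then be passed to `updateMany(filter, update)` on the artworks collection. If `_id` is stored as an ObjectId rather than a string, the IDs would need converting first.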
There are still a number of artworks with missing preview images, but I'm not sure we should automatically remove those.
As you've suggested, we could automate this process so that the stream remains clean. I'd like to get a local server env running in order to test stuff on but haven't had time to set that up yet — I'd really like to go through and update dependencies as part of that, but that might be too ambitious.
Only artworks that are online should be displayed. And to encourage users to add preview images, artworks without a preview image should show up only after those with one.
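One possible way to express that ordering, assuming each artwork carries the availability flags proposed earlier in this thread (they are not part of the API yet, and `orderStream` is a hypothetical name):

```javascript
// Drop artworks known to be offline, then put those with a working
// preview image ahead of those without one.
function orderStream(artworks) {
  const hasPreview = (a) => (a.thumbnail_url_is_available === true ? 1 : 0);
  return artworks
    // filter() returns a new array, so the caller's array is not reordered.
    .filter((a) => a.artwork_url_is_available !== false)
    // Array.prototype.sort is stable (ES2019), so the existing stream
    // order is preserved within each of the two groups.
    .sort((a, b) => hasPreview(b) - hasPreview(a));
}
```

Because the sort is stable, whatever order the stream already uses (for example newest first) still applies inside each group.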
This topic has been discussed here: https://groups.google.com/forum/#!topic/openframeio/4OeIVWsowHA
@jmwohl: