Closed dustymc closed 2 months ago
deleting the existing media
Maybe local storage would be better for that, we should chat. @mkoo we're going to need guidance here.
deleted media IDs
Deleting media just removes them - I won't be able to do anything with them once the object they represent is gone.
That will however break all existing links, which I hope there's a plan to deal with?!
find unused media and delete them
I can delete (if I have the URI or bucket and filename), but it's a fairly painful process. No problem if you've accidentally uploaded a picture of your credit card, but this is meant to be archival storage and this sort of use case has not been considered.
delete them as it seems you've done in this issue
That is not what https://github.com/ArctosDB/arctos/issues/6954 proposes, at all!!!
@sharpphyl Hi Phyllis, trying to understand the request here. It sounds like you are moving digital media so there will be a new URL or path for the media (including thumbnails?!). So is this simply a replacement task? Or is this more extensive? Can you email me a time for a quick chat to clear this up? It will likely be a lot faster than messages back and forth! Thanks
Phyllis can give more details, but basically we are updating the images of some of our marine invert specimens. This includes taking pictures of various views of a specimen that were separate images and putting them together into nice plates. We would like to have those new images be associated with the record to replace the old ones.
OK makes sense -- so @dustymc can DMNS provide you with a CSV for replacing URIs? @genevieve-anderegg @sharpphyl it sounds like you do not want to delete the media records since the metadata remains unchanged but just need to update the URLs to your new storage source. And you've been tracking this in a spreadsheet so the next step should be simple.
So adjusting the current bulkloader (https://arctos.database.museum/tools/BulkloadMedia.cfm?action=ld) maybe something like this?
Media_id old_media_uri old_preview_uri new_media_uri new_preview_uri
Does that work @dustymc ? ps. still open for a call if I am misunderstanding your situation so please email if I am missing the mark...
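As a rough illustration of the replacement CSV proposed above, here is a minimal Python sketch that sanity-checks such a file before anyone tries to load it. The column names come from the proposed layout (lowercased); the function name `check_replacement_csv` and the specific checks are assumptions for illustration only, not part of the Arctos bulkloader.

```python
import csv
import io
from urllib.parse import urlparse

# Column names follow the layout proposed in this thread; they are NOT
# confirmed fields of the Arctos media bulkloader.
COLUMNS = ["media_id", "old_media_uri", "old_preview_uri",
           "new_media_uri", "new_preview_uri"]


def is_http_uri(value):
    """True if value looks like an absolute http(s) URI."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_replacement_csv(text):
    """Return a list of (line_number, message) problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [(1, "missing columns: " + ", ".join(missing))]
    problems = []
    for row in reader:
        line = reader.line_num
        # Short rows yield None for absent cells, so normalise to "".
        cells = {c: (row[c] or "").strip() for c in COLUMNS}
        if not cells["media_id"].isdigit():
            problems.append((line, "media_id is not numeric"))
        for col in COLUMNS[1:]:
            if not is_http_uri(cells[col]):
                problems.append((line, col + " is not an http(s) URI"))
        if cells["old_media_uri"] == cells["new_media_uri"]:
            problems.append((line, "new_media_uri is identical to old_media_uri"))
    return problems
```

An empty list means the file passed these checks. It says nothing about whether the media IDs actually exist in Arctos or whether the new files have been uploaded.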
Does that work
Yes and no.
It's nice because it preserves links/URIs (most of the reason Media works like it does), if the new carries about the same information as the old (not clear??).
It's horrid because it requires me to delete archival media from an environment that's REALLY not supportive of deletion.
Less-evil might be to just orphan the "old" files?? Someone would still have to pay for the storage, but that's probably trivial. Seems like we need more - documentation? something - going forward, or maybe a more disposable (=cheaper) sort of media storage, or different expectations from one or all of us, or ?????????
Least-evil from my perhaps-very-lost perspective would just be to add, leaving the old (presumably still useful, perhaps used in research and etc., but not as pretty??) media-with-files intact.
In any case, my concerns mostly involve the original title (including the implied bits!), deleting media from (archival!) storage.
An idea of scope might be useful as well - is this 3 blurry pictures or 20 petabytes of data and a brazillion records or ?? - somehow the ends of that spectrum feel different.
just be to add, leaving the old (presumably still useful, perhaps used in research and etc., but not as pretty??) media-with-files intact.
YES!!! We are museums - DO NOT throw that away!!!!
Seems like we need more - documentation? something - going forward, or maybe a more disposable (=cheaper) sort of media storage, or different expectations from one or all of us, or ?????????
Agreed. I'll talk with the others after Thanksgiving, but it would be nice to have these new higher-quality pictures be at the forefront on the record while the older ones are less prominent. Perhaps still maintained, but not linked to the catalog record specifically? For clarification, the old and new media actually have the same exact pictures of the specimens, but the new media compiles them together in a standardized plate that is our preferred standard.
We are replacing media of several types. Most commonly, as Genna described, we are consolidating multiple images into one plate so that the dorsal and ventral (and sometimes lateral) views are shown together. Here is an example of this.
[Image: before]
[Image: after]
We are also adding (correct) species, DMNS copyright, catalog number and other useful data to the images.
Every time I delete an image, I get this warning: -you deleted media 10588215- The files https://web.corral.tacc.utexas.edu/arctos-s3/sharpphyl/2018-11-20/ZC_29288_Boonea_impressa_ventral.jpg and https://web.corral.tacc.utexas.edu/arctos-s3/sharpphyl/2018-11-20/tn/tn_ZC_29288_Boonea_impressa_ventral.jpg are unaffected. You should delete them if you don't need them around anymore.
Sure enough, those URLs are still active, just not linked to the catalog record. If I could replace one of the URLs with the new image I could reduce the orphan images somewhat. The documentation indicates I cannot do that.
I'm keeping a csv that lists the media identifier (e.g. 10588215 above) and the URLs. We've replaced hundreds of images and have hundreds more to go.
@Jegelewicz There is nothing in the deleted images that hasn't been retained in the replacement. There is no value in archiving them or keeping them around unless it's just cheaper and easier to do that. If so, should the warning be modified? I'm happy to delete the URLs but can find no way to do that and Dusty's comment ("it's a fairly painful process") is a bit ominous.
Hope this explains why I asked the initial question about deleting media URLs.
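The tracking file described above could be built directly from the delete warnings rather than by hand. This is a minimal Python sketch, assuming the warning text always follows the pattern quoted above ("you deleted media <id>" followed by the media URL and then the thumbnail URL). The function names and the column names `media_id`, `media_uri`, and `preview_uri` are hypothetical, chosen only for this example.

```python
import csv
import io
import re

# Patterns are inferred from the single warning quoted in this thread;
# the real message format may vary.
MEDIA_ID_RE = re.compile(r"you deleted media (\d+)")
URL_RE = re.compile(r"https?://\S+")

FIELDS = ["media_id", "media_uri", "preview_uri"]


def parse_delete_warning(message):
    """Extract the media ID and the two leftover file URLs from one warning."""
    match = MEDIA_ID_RE.search(message)
    if match is None:
        raise ValueError("no media ID found in warning text")
    urls = URL_RE.findall(message)
    return {
        "media_id": match.group(1),
        # Assumes the first URL is the media file and the second its thumbnail.
        "media_uri": urls[0] if len(urls) > 0 else "",
        "preview_uri": urls[1] if len(urls) > 1 else "",
    }


def to_csv(rows):
    """Serialise parsed warnings as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Pasting each warning into `parse_delete_warning` and collecting the results gives a file listing the orphaned media and thumbnail URLs, whatever is eventually decided about deleting them.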
@sharpphyl when images are loaded to TACC - they are expected to be "permanent". I don't see any reason for you guys to waste a bunch of time removing images from catalog records. Just add the new images! TACC's mission is to make data available - deleting things violates that mission. If anyone has used the url for any of the media you want to "delete" they will get 404 errors - not what we want! Just add new images and save yourself a lot of time and ensure that any usage of those "old" images remains linked and available for future users.
@Jegelewicz Thanks for your suggestion to keep "old" images, but we are striving to have just one (good quality) image of each specimen associated with a taxon or a catalog record.
Consolidators often show only one image and we want that to be the accurate image with the complete views of the specimen and complete data.
Retaining a linked image with a misspelled or invalid taxon name is confusing and reflects poorly on the quality of DMNS data.
If there were a way to encumber inadequate media, when we replace it, so it is not visible to the public, not sent to consolidators, and the URL cannot be opened, then I would leave it with the record. I don't see the option today, so I'll continue to delete the inadequate media. If there's no reason to delete unlinked media, I'll stop collecting the media IDs and URLs, but it would be nice to have the "delete media warning" updated if that is the case.
I wonder if it would be possible/useful to have some kind of rank added to media so that the best images show up at the top?
rank
That's come up a few times but never made it to actionable. https://github.com/ArctosDB/arctos/issues/2813 might be most-relevant.
I don't think there are any action items here, tabling.
@dustymc We are upgrading much of our media over the next few months which entails deleting the existing media and replacing it. I'm keeping a csv of all the deleted media IDs including thumbnails. I haven't been able to delete them myself (although Arctos tells me I should). Do I periodically send the csv to you through GitHub or can you find unused media and delete them as it seems you've done in this issue?
Originally posted by @sharpphyl in https://github.com/ArctosDB/arctos/issues/6954#issuecomment-1805859624