ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
Apache License 2.0

Help Needed on Media Metadata Bulkload #4252

Closed campmlc closed 2 years ago

campmlc commented 2 years ago

I'm finally at the point of loading my first media metadata file, and I get the following error:

media_uri could not be validated: http://data.tacc.utexas.edu/corral/projects/arctos/web/msb/mamm/TPT/Siphonaptera/Malpais_Flea_Project/NK231885A_Catallagia_decipiens_FemaleA_A93U3_40xOV.jpg

Can someone with sharper eyes than mine figure out how the path in the loaded file is any different from the path below as displayed by TACC via Cyberduck?

Screenshot 2022-01-12 15 39 57

dustymc commented 2 years ago

The "outside" URI starts with https://web.corral.tacc.utexas.edu/arctos/

If that's not somewhere in the upload request docs it should be.

campmlc commented 2 years ago

When I click on "show url" in cyberduck, it gives http not https?

campmlc commented 2 years ago

Why is the url that I see when logged into tacc so different?

dustymc commented 2 years ago

> Why is the url that I see when logged into tacc so different?

Your tool is making up random things.

campmlc commented 2 years ago

Cyberduck is the tool TACC recommended. Surely it can't invent a directory structure.

dustymc commented 2 years ago

Well it did....

EDIT: no, it didn't invent a directory structure, but it did infer a URI from one.

campmlc commented 2 years ago

I am thoroughly confused. Where am I given the information to use https://web.corral.tacc.utexas.edu/arctos/? The How To at https://handbook.arctosdb.org/how_to/How-to-Upload-Media-to-TACC.html gives the following: Screenshot 2022-01-12 17 46 32

Note this information matches the directory structure I have obtained when logged into TACC and what I used for my bulkload file.

campmlc commented 2 years ago

What I see logged in to TACC to upload media to the msb/mamm/TPT/Siphonaptera/Malpais_Flea_Project directory. Screenshot 2022-01-12 17 57 26

dustymc commented 2 years ago

> Where am I given the information to use https://web.corral.tacc.utexas.edu/arctos/?

See https://github.com/ArctosDB/arctos/issues/4252#issuecomment-1011541529, I'm sure (ish, maybe it was an issue??) that was in there somewhere, if it's not now then it should be returned or added.

> directory structure

You (and cyberduck) are confusing directory structure and URLs. They're not the same; this isn't "a box" it's part of a more sophisticated system. (The directory structure is all lies too, but it's low-level enough that you can safely forget that you now know that.)

msb/mamm/TPT/Siphonaptera/Malpais_Flea_Project (and children) will be consistent internally and externally, and it's all made by you (or by me at your request) - you have complete control of that.

Everything before that is "part of the system" - we can't control nor change it, but it should be static.

In cyberduckland, the stuff before that will be

/data.tacc.utexas.edu/corral/projects/arctos/web/

(And that might look like it starts with some protocol - http or ftp or scp or whatever - in various tools. That's also lies; pretend it's a file system.)

In internet land, /data.tacc.utexas.edu/corral/projects/arctos/web/ becomes/maps to https://web.corral.tacc.utexas.edu/arctos/

/data.tacc.utexas.edu/corral/projects/arctos/web/msb/mamm/TPT/Siphonaptera/Malpais_Flea_Project/NK231885A_Catallagia_decipiens_FemaleA_A93U3_40xOV.jpg is "internal" (and you can only see it if you're somehow accessing "the server").

https://web.corral.tacc.utexas.edu/arctos/msb/mamm/TPT/Siphonaptera/Malpais_Flea_Project/NK231885A_Catallagia_decipiens_FemaleA_A93U3_40xOV.jpg is "external" (so anyone with a browser can see it).
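
For anyone preparing a bulkload file from paths copied out of Cyberduck, the mapping described here could be sketched in Python roughly as follows. The two prefixes are the ones given in this comment; the function name `to_external_uri` is hypothetical and not part of Arctos or TACC tooling, and the sketch assumes the prefixes stay static.

```python
import re

# Prefixes as stated in this thread; everything else here is illustrative.
INTERNAL_PREFIX = "data.tacc.utexas.edu/corral/projects/arctos/web/"
EXTERNAL_PREFIX = "https://web.corral.tacc.utexas.edu/arctos/"


def to_external_uri(path: str) -> str:
    """Map an internal TACC path (as a file-transfer tool shows it) to the public URL."""
    if path.startswith(EXTERNAL_PREFIX):
        return path  # already in external form, nothing to do
    # Drop any protocol a tool may have prepended (http://, sftp://, ...)
    # and any leading slashes, so only the path-like part remains.
    stripped = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", path).lstrip("/")
    if not stripped.startswith(INTERNAL_PREFIX):
        raise ValueError(f"not under the Arctos web directory: {path}")
    # Keep the user-controlled part (msb/mamm/...) and swap the system prefix.
    return EXTERNAL_PREFIX + stripped[len(INTERNAL_PREFIX):]
```

Running every `media_uri` value through a helper like this before building the bulkload file should catch the http/https and host differences that caused the validation error at the top of this issue.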

Alles klar?

Jegelewicz commented 2 years ago

Herr Kommissar?

campmlc commented 2 years ago

Clear as mud, but I guess I get to make mudpies. I changed the directory and will see if it loads. If I can get a chance to re-teach myself how to edit the documentation I will make it much, much more clear that this is the prefix for the URL. I don't want anyone else to have to go through what I just spent the entire day trying to do.

campmlc commented 2 years ago

So yay, it loaded. But I didn't realize that the image thumbnails are labeled with the description field rather than the title field. I put the info I wanted to show up with the image in title, so can I re-bulkload media metadata with that info in description instead? Will a re-bulkload overwrite the existing metadata?

campmlc commented 2 years ago

I guess not?

Bulkload Media Metadata (Version: 1.4)

An error occurred while processing this page!

Message: Error invoking external process

Detail: psql:/usr/local/webroot/temp/excopy_campmlc_20220113050143563_520.sql:204: ERROR: null value in column "mime_type" violates not-null constraint

DETAIL: Failing row contains (66550, campmlc, 2022-01-12 22:59:02.798975, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null).

CONTEXT: COPY cf_temp_media, line 3: ",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"

Check the Arctos Handbook for more information on errors.

This message has been logged as 0BC6B59F-6663-44A0-AF7D4D8985B7B306. Please contact us with any information that might help us to resolve this problem. For best results, include the error and a detail description of how it came to occur in the Issue.

Actually, I fixed the above - caused by, of course, blank rows at the end of the file. But then I get this: https://web.corral.tacc.utexas.edu/arctos/msb/mamm/TPT/Siphonaptera/Malpais_Flea_Project/NK290454E_Anomiopsyllus_sp._FemaleB_A93O8_100xAB_converted.jpg already exists as media_id 10681919
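
For the blank-row failure (the `mime_type` not-null error), one way to guard against it is to strip empty rows from the CSV before uploading. This is only a sketch using Python's standard `csv` module; the function name `drop_blank_rows` is hypothetical and nothing here is part of the Arctos loader itself.

```python
import csv
import io


def drop_blank_rows(csv_text: str) -> str:
    """Return csv_text without rows whose cells are all empty or whitespace."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # A trailing spreadsheet row usually exports as ",,,,": every cell empty.
        if any(cell.strip() for cell in row):
            writer.writerow(row)
    return out.getvalue()
```

Spreadsheet exports often leave rows of bare commas at the end of the file, which is what the `COPY cf_temp_media, line 3` context in the error above shows.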

campmlc commented 2 years ago

Here is the corrected info I'd like loaded to replace existing metadata, with relevant info consolidated into the description field. DEP Malpais Flea Metadata bulkload corrected_2.zip

campmlc commented 2 years ago

If it's not possible to replace the metadata with a second bulkload file, since the media are now linked in Arctos, I'd like to replace the description field with the above data. This appears to be possible as a Media Label Bulkload, but since there is already a description label attached, I don't know whether Arctos will add a second description label or replace the first if I load it. @dustymc I don't know how to obtain the media IDs in bulk in order to create the label bulkload file. Media label search is not working, but here are the media search results from the linked project page: https://arctos.database.museum/MediaSearch.cfm?action=search&project_id=10003272

dustymc commented 2 years ago

> image thumbnails are labeled with the description field rather than the title field

Huh?

If you mean thumbnails, those are preview_uri (and will need uploaded first if they're not already there).

If you want the filename in the metadata for some reason, media identifier is probably most appropriate (but you wouldn't be mixing it with other stuff if it was important, right??).

I'm not sure either of those make sense so ???

> add a second description label or replace the first

It's a loader, it just loads, there's no consideration of what might already be there.

> know how to obtain the media IDs

https://handbook.arctosdb.org/how_to/How-to-Bulkload-Media-Metadata#get-media_id-from-uri

I can delete stuff, but need to sort out exactly what you're trying to do first.

Title ("Title for multi-page documents such as fieldnotes.") does need removed, whatever else happens.