ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Feature Request - remove identifier type institutional catalog number #7836

Open dustymc opened 5 months ago

dustymc commented 5 months ago

Is your feature request related to a problem? Please describe.

A new identifier type is being added while an existing type is clearly being misused, causing a great deal of confusion, extra work, and data that cannot do what it is intended to do. From https://github.com/ArctosDB/arctos/issues/7808#issuecomment-2135899234:

I am unsure of the status of getting rid of https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#institutional_catalog_number as part of this. I thought we had an agreement and a plan; I'm no longer sure. My comments in https://github.com/ArctosDB/arctos/issues/7808#issuecomment-2128164455 remain valid: there's clearly a great deal of confusion about that type, it is being used for some things that it should not be used for, and keeping it around is just going to prolong the pain and make the new thing that much harder to understand. There are 211796 records without valid Arctos record GUIDs using the type at the moment. 128671 of them do have an issued-by agent. (And most of those don't seem to be institutions, if anyone needs any more evidence that this is an arbitrary mess.) I would like to:

  1. for those with an issued-by agent:
    • convert the type to identifier
  2. for those without an issued-by agent:
    • create an 'institutional catalog number' agent,
    • update all null issuedby to use that agent, then
    • convert the type to identifier

Note that (2) is in no way a solution: it doesn't magically make these data make sense, it just moves the point of confusion to a hopefully-less-confusing (and more easily documented) place. The identifiers will continue to carry exactly the same amount of information through this process.
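
For illustration only, the two-step plan above could be sketched in Python roughly as follows. This is not Arctos code: the `Identifier` record shape is an assumption (its field names borrow the `other_id_type`, `display_value`, and `issued_by_agent` columns of the identifier downloads), and the placeholder agent is hypothetical because it has not been created yet. By the counts quoted above, step 2 would apply to 211796 - 128671 = 83125 records.

```python
from dataclasses import dataclass, replace
from typing import Optional

OLD_TYPE = "institutional catalog number"
NEW_TYPE = "identifier"

# Hypothetical placeholder agent from step 2; it would have to be created first.
PLACEHOLDER_AGENT = "institutional catalog number"


@dataclass(frozen=True)
class Identifier:
    """Assumed minimal shape of one identifier row."""
    other_id_type: str
    display_value: str
    issued_by_agent: Optional[str] = None


def convert(ident: Identifier) -> Identifier:
    """Apply steps 1 and 2 of the plan to a single identifier."""
    if ident.other_id_type != OLD_TYPE:
        return ident  # identifiers of other types are left untouched
    # Step 2: a missing issued-by is filled with the placeholder agent.
    issued_by = ident.issued_by_agent or PLACEHOLDER_AGENT
    # Both steps end the same way: the type becomes plain 'identifier'.
    return replace(ident, other_id_type=NEW_TYPE, issued_by_agent=issued_by)
```

The display value is never modified, which matches the point that the identifiers carry the same information before and after.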

Data: https://docs.google.com/spreadsheets/d/1rNlvoWnDrWeVrxD2nWsMAaFO8x3fmBDQ445sWc8DcGQ/edit#gid=1128266536

@mkoo @Jegelewicz @campmlc please advise and help.

Describe what you're trying to accomplish

Easy access to good data.

Describe the solution you'd like

Remove a nonfunctional point of confusion, https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#institutional_catalog_number

Describe alternatives you've considered

Unnecessary struggle resulting in low-quality data.

Additional context

https://github.com/ArctosDB/arctos/issues/7808

Priority

Nothing has seemed more critical to me in a while...

Jegelewicz commented 5 months ago

@dustymc I assume that data is old because I've had you fix some of those? (ALMNH for example)

Jegelewicz commented 5 months ago

Can we get an updated list?

dustymc commented 5 months ago

Yes the data are stale, I just copied from a comment that was getting buried.

I'll pull fresh data, but possibly not until https://github.com/ArctosDB/arctos/issues/7808 is implemented (it will clear a lot of clutter).

I will change my priorities if there's any realistic way this can be implemented with https://github.com/ArctosDB/arctos/issues/7808, I believe removing what's clearly a point of confusion at that time would be a HUGE benefit, but my current understanding is that this will not be possible.

Jegelewicz commented 5 months ago

@dustymc a couple you can clean up now.

All records in https://arctos.database.museum/search.cfm?guid_prefix=APSU%3AHerp&customoidoper=LIST&oidnum=A-

That have identifiers of type institutional catalog number starting with A- should be issued by https://arctos.database.museum/agent/21334600 and can be changed to type identifier

All records in https://arctos.database.museum/search.cfm?guid_prefix=APSU%3AHerp&customoidoper=LIST&oidnum=R-

That have identifiers of type institutional catalog number starting with R- should be issued by https://arctos.database.museum/agent/21352850 and can be changed to type identifier
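
As a rough sketch of the prefix-based request above (an illustration, not the SQL that was actually run): each row is assumed to be a plain dict keyed by the `other_id_type`, `display_value`, and `issued_by_agent` column names, and the function name is made up for this example. The agent URLs are the two given in this comment.

```python
OLD_TYPE = "institutional catalog number"

# Prefix -> issuing agent, as requested in this comment for APSU:Herp.
APSU_HERP_RULES = {
    "A-": "https://arctos.database.museum/agent/21334600",
    "R-": "https://arctos.database.museum/agent/21352850",
}


def apply_prefix_rules(row: dict, rules: dict) -> dict:
    """Return an updated copy of row if a prefix rule matches, else row itself."""
    if row.get("other_id_type") != OLD_TYPE:
        return row
    value = row.get("display_value", "")
    for prefix, agent in rules.items():
        if value.startswith(prefix):
            # Set the issuer and convert the type in one step.
            return {**row, "other_id_type": "identifier", "issued_by_agent": agent}
    return row  # no rule matched; leave the row for manual review
```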

dustymc commented 5 months ago

That have identifiers of type institutional catalog number starting with A- should be issued by https://arctos.database.museum/agent/21334600 and can be changed to type identifier

temp_APSUHerpA.csv.zip

UPDATE 5713

dustymc commented 5 months ago

That have identifiers of type institutional catalog number starting with R- should be issued by https://arctos.database.museum/agent/21352850 and can be changed to type identifier

temp_APSUHerpR.csv.zip

UPDATE 2363

Jegelewicz commented 5 months ago

@dustymc that is them but missing the issued by for both.

dustymc commented 5 months ago

missing the issued by

I updated as you asked?!?

Jegelewicz commented 5 months ago

I updated as you asked?!?

I just meant that issued by wasn't in the file you attached.

Jegelewicz commented 5 months ago

@campmlc I have placed a bunch of no-data identifiers in the identifier unbulkloader. If you could review and let me know if we can just remove them, that would be super helpful! They are under my username and have the following statuses:

DGR Bird, DGR Ento, DGR Mamm, MSB Fish, MSB Herp, MSB Mamm

I think these all got created using a formula in someone's bulkload file. They are either just a bunch of dashes or a letter and some dashes. They don't offer any information and I think it would be good to remove them.
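
A minimal sketch of how such values might be flagged, assuming "no data" means exactly what is described above: only dashes, or a single letter followed only by dashes. The pattern and the function name are assumptions for illustration, not the check that was actually used to build the unload file.

```python
import re

# Assumption: an optional single letter followed by one or more dashes, nothing else.
NO_DATA = re.compile(r"[A-Za-z]?-+")


def is_no_data(value: str) -> bool:
    """True if the identifier value is only dashes, or one letter plus dashes."""
    return NO_DATA.fullmatch(value.strip()) is not None
```

A value such as `M-123` is not flagged, because anything after the dashes counts as real content.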

campmlc commented 5 months ago

The following are marked to autoload in the identifier unbulkloader to remove the legacy data-migration institutional catalog numbers of "M", "H", etc.: DGR Bird, DGR Ento, DGR Mamm, MSB Fish, MSB Herp, MSB Mamm

Jegelewicz commented 5 months ago

@campmlc I do not have access to MSB:Herp. Can you load this to https://arctos.database.museum/loaders/bulkUnLoadIdentifier.cfm?

cf_temp_unload_identifiers_download(1).csv

dustymc commented 5 months ago

https://docs.google.com/spreadsheets/d/1rNlvoWnDrWeVrxD2nWsMAaFO8x3fmBDQ445sWc8DcGQ/edit#gid=1128266536 is fresh.

Can we PLEASE immediately proceed as outlined in the initial comment, before this taints the new Arctos record GUID type?

@mkoo help?

campmlc commented 5 months ago

I would like at least 24 hours to look over this more, and I'm sure other collections are in the same boat. There are complications here that have not been addressed.

Jegelewicz commented 5 months ago

@dustymc all of these ALMNH:Inv can be changed to identifier

identifierDownload(24).csv

dustymc commented 5 months ago

all of these ALMNH:Inv can be changed to identifier

done

Jegelewicz commented 5 months ago

Actually, I have a lot that can be changed. I just didn't ask because I had assumed they would be changed as part of the switch to Arctos record GUID and I'd rather not spend too much time on it. The sheet won't ever properly open because I guess it is too big? Anyway, here are some that you can change immediately for the entire collection:

NMMNH:Paleo ALMNH:EH CRCM:Bird CRCM:Mamm UTEP:Herp APSU:Herp UNM:MET ASUMZ:Bivalve

Maybe with those gone the list will be easier to work in.

dustymc commented 5 months ago

assumed they would be changed as part of the switch to Arctos record GUID

So did I but here we are....

rather not spend too much time on it.

Agree. I can snipe at things if someone wants to deviate from what I laid out up yonder, if not then let's just get this gone before someone else finds themselves all tangled up in the arbitrary again, and anything that needs adjusted can be dealt with at any time.

Anyway, whole collections aren't bad (I hope), I'll run those and repull.

Jegelewicz commented 5 months ago

whole collections aren't bad (I hope)

These are all things I either did or had someone do. I can be blamed if anyone yells.

dustymc commented 5 months ago

https://github.com/ArctosDB/arctos/issues/7836#issuecomment-2154912123 done, https://docs.google.com/spreadsheets/d/1rNlvoWnDrWeVrxD2nWsMAaFO8x3fmBDQ445sWc8DcGQ/edit#gid=1467013531 updated

campmlc commented 5 months ago

Thanks guys, I'll try to look over MSB ones today. @AdrienneRaniszewski @jldunnum @jtgiermakowski @msbparasites

Jegelewicz commented 5 months ago

@dustymc can you update the Google sheet - I'll see if there are any others I can handle.

dustymc commented 5 months ago

update the Google sheet

I did

campmlc commented 5 months ago

There are still over 96000. Would it be possible to get an explanation as to why this identifier is a problem? These may or may not be in whatever correct format, but like collector numbers they are linked to a formal institutional catalog that in most cases is written down in a ledger somewhere, and is the basis of publication citation. It still feels totally wrong to eliminate this identifier type. @DerekSikes

jldunnum commented 5 months ago

Traveling to ASM meeting so can’t assess what is critical here.


mkoo commented 5 months ago

Please see my email @campmlc

Jegelewicz commented 5 months ago

@dustymc these can be changed to identifier

ALMNH:Geo ALMNH:Inv ALMNH:Paleo ASUMZ:Mamm UTEP:ES

@mkoo it would pare things down if the MVZ collections could be converted as well.

DLM EDIT

done

Jegelewicz commented 5 months ago

@dustymc

There are a bunch of these that I think should be type Arctos record GUID based upon the issued by.

| guid_prefix | triplet | guid | other_id_type | display_value | id_references | assigned_agent | assigned_date | issued_by_agent | remarks | coll_obj_other_id_num_id |
|---|---|---|---|---|---|---|---|---|---|---|
| CHAS:Bird | CHAS:Bird:7968 | https://arctos.database.museum/guid/CHAS:Bird:7968 | institutional catalog number | 30303 | self | Jessica Weller | 2024-01-12 | Denver Museum of Nature & Science | | 17754573 |

mkoo commented 5 months ago

@dustymc these can be changed to identifier

ALMNH:Geo ALMNH:Inv ALMNH:Paleo ASUMZ:Mamm UTEP:ES

@mkoo it would pare things down if the MVZ collections could be converted as well.

Sorry, I thought you knew to do this for MVZ (I'm pretty sure I gave the verbal OK yesterday with Carol). Thanks!

DLM EDIT

MVZ done (and some/many/most of them were clearly people getting lost after the last cleanup).

Jegelewicz commented 5 months ago

@dustymc all of the CRCM collections can be changed to identifier, but if they have no issued by, it should be https://arctos.database.museum/agent/21346711

DLM edit

done

Jegelewicz commented 5 months ago

@lin-fred the NMMNH:Mamm institutional catalog numbers all appear to be duplicates of the actual catalog number - can they just be removed?

If they are needed, can we change to identifier with issued by https://arctos.database.museum/agent/1014941

lin-fred commented 5 months ago

I don't think I am the one using those numbers?

See https://github.com/ArctosDB/arctos/issues/6881#event-10825361484

Jegelewicz commented 5 months ago

@lin-fred from https://arctos.database.museum/guid/NMMNH:Mamm:4504

image

These are in your records and are redundant?

dustymc commented 5 months ago

type Arctos record GUID based upon the issued by.

I'm probably lost, not sure https://arctos.database.museum/guid/CHAS:Bird:7968 (Passerina) has anything to do with https://arctos.database.museum/guid/DMNS:Bird:30303??

Jegelewicz commented 5 months ago

OK fair - I didn't look at the DMNS stuff, so these are who knows.....

lin-fred commented 5 months ago

@lin-fred from https://arctos.database.museum/guid/NMMNH:Mamm:4504

image

These are in your records and are redundant?

Oh that's weird, I definitely did not add those. I've just looked through our bulkupload of mammals excel sheet and those are not in there. Did they get added because of the reciprocal relationship?

I don't need them there; as you said, it's duplication of info.

Jegelewicz commented 5 months ago

@campmlc @jldunnum

from https://arctos.database.museum/guid/MSB:Mamm:287252

I understand that you want to keep these, so I suggest that they be changed to type= identifier and issued by = https://arctos.database.museum/agent/1014941

campmlc commented 5 months ago

These were added before the NMMNH migration to Arctos, using Patty Gegick's Excel file for the catalog number notation, when the records were first cataloged at MSB. They are equivalent to the Arctos record GUIDs linking to NMMNHS now. I'll let @jldunnum respond to how he wants them dealt with.

Jegelewicz commented 5 months ago

@dustymc in these records - https://arctos.database.museum/search.cfm?guid_prefix=UWBM%3AMamm&customoidoper=LIST&oidtype=institutional%20catalog%20number&oidnum=WDFW%25

the institutional catalog numbers that start with WDFW can be changed to identifier and issued by should be added as https://arctos.database.museum/agent/21350366

DLM EDIT

done

Jegelewicz commented 5 months ago

@dustymc in these records - https://arctos.database.museum/search.cfm?guid_prefix=UWBM%3AMamm&customoidoper=LIST&oidtype=institutional%20catalog%20number&oidnum=MMP%25

the institutional catalog numbers that start with MMP can be changed to identifier and issued by should be added as https://arctos.database.museum/agent/1014687
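
For anyone who wants to build similar review links for other prefixes, here is a small Python sketch. It only reproduces the query-string pattern of the two links above; the function name is made up, and treating the trailing `%` as a wildcard is an assumption based on how those links are written rather than on documented search behavior.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "https://arctos.database.museum/search.cfm"


def identifier_search_url(guid_prefix: str, id_type: str, value_prefix: str) -> str:
    """Build a search link shaped like the two links above."""
    params = {
        "guid_prefix": guid_prefix,
        "customoidoper": "LIST",
        "oidtype": id_type,
        # Both links above end the value with %; assumed here to mean "starts with".
        "oidnum": value_prefix + "%",
    }
    # quote_via=quote encodes spaces as %20 rather than +, as in the links above.
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)


print(identifier_search_url("UWBM:Mamm", "institutional catalog number", "MMP"))
```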

Jegelewicz commented 5 months ago

@campmlc I am unloading these. They are all MSB M - and like the other batches, it looks like they were added as part of some concatenation for which they had nothing to add to the concatenated value.

identifierDownload(25).csv

Jegelewicz commented 5 months ago

@dustymc all UWBM:Mamm institutional identifiers can be changed to identifier.

Jegelewicz commented 5 months ago

I need fresh data - I keep looking at things I've already cleaned up.

campmlc commented 5 months ago

Question as to whether any of this cleanup will affect data in https://arctos.database.museum/info/unreciprocated_relationships.cfm ?

dustymc commented 5 months ago

fresh data

https://docs.google.com/spreadsheets/d/1rNlvoWnDrWeVrxD2nWsMAaFO8x3fmBDQ445sWc8DcGQ/edit#gid=912219362

campmlc commented 5 months ago

All MSB:Para records related to institutional catalog numbers issued by the MSB:Host collection agent https://arctos.database.museum/agent/21334672 should all be changed to Arctos record guid relationships.

Jegelewicz commented 5 months ago

UTEP:Herb UTEP:Inv

can be changed to identifier

Jegelewicz commented 5 months ago

All MSB:Para records related to institutional catalog numbers issued by the MSB:Host collection agent https://arctos.database.museum/agent/21334672 should all be changed to Arctos record guid relationships.

@campmlc I think those are all done.

How about all the MSB:Para that have a relationship to an MSB:Mamm identifier? Many of these are odd duplicates. If there is already a link using the same triplet in the GUID, can we get rid of the triplet identifier?

image

Jegelewicz commented 5 months ago

@campmlc and how about these duplications?

https://arctos.database.museum/guid/MSB:Mamm:140646

includes

image

Is that necessary? If so, can we instead say this is an identifier issued by https://arctos.database.museum/agent/21274282

Jegelewicz commented 5 months ago

@campmlc see also SSUC M - in https://arctos.database.museum/loaders/bulkUnLoadIdentifier.cfm