ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Other Locality IDs #2210

Closed Nicole-Ridgwell-NMMNHS closed 5 years ago

Nicole-Ridgwell-NMMNHS commented 5 years ago

Proposed new value: I would like to see a way to include additional locality identifiers.

Definition: Name, ID, or number other than Locality Nickname that has been used to identify this locality.

Context: There are some existing fields where this type of data has been put:

Verbatim locality is a collecting event field. Identifiers that are static through time and refer to the place and not the event should be associated with locality.

Other IDs apply to specimens, not localities. I realize that there is a Locality ID under Other ID, but why, other than legacy data structure issues, would you put locality level data in a specimen field?

Example

NMMNH L-6260: Our primary key locality number, which I would put under Locality Nickname.

Other IDs for this same locality:

UALP 75106: The primary key that the University of Arizona Laboratory of Paleontology used for this locality before we acquired the collection.

LLJ 13-75: The field number given to the site. Arguably this belongs in collecting event; however, for the last 30+ years our curators and collectors have used these numbers to refer to the locality itself, not a single collecting event or set of collecting events.

Taylor's Mound: The name that people call the site.

All of these IDs are things that I need to see when I pull up the locality record. I shouldn't have to dive into the collecting events or specimens to find them.

dustymc commented 5 years ago

It would be very useful to know how you've defined the localities, and what data are associated with specimens from them. Could you build a named locality and attach a specimen or two? An example might be useful.

Would "Taylor's Mound" not be specific locality?

Nicole-Ridgwell-NMMNHS commented 5 years ago

Taylor's Mound is a made-up name (probably because the locality is a mound that Taylor sat at for half a day excavating fossils). It is used to refer to the locality because it is easier to remember and associate with memories than a number. Some of our localities will have both a name and a specific locality (ex. Larry's Last Gasp, De-Na-Zin Wash). The name isn't something you can look up on a map, so I wouldn't think it should go in specific locality.

Nicole-Ridgwell-NMMNHS commented 5 years ago

Here is how I would enter the locality if I did build it:

Higher Geography
Continent: North America
Country: USA
State: New Mexico
County: San Juan
Map Name (not sure that either of these would go here because they are not in the 1:250,000 series): East Fork Kutz Canyon 7.5' 1985 NM (USGS 1:24,000); Navajo Reservoir 1991 (USGS 1:100,000)

Locality
Locality Nickname: NMMNH L-6260
[Other locality names: UALP 75106; LLJ 13-75; Taylor's Mound]
Specific Locality: No specific locality recorded (but many of our localities do have this)
Original Units: TRS and UTM
Datum: NAD 27
Reference Source: Locality file

Chronostratigraphy
Eon/Eonothem: Phanerozoic
Erathem/Era: Cenozoic
System/Period: Paleogene
Series/Epoch: Paleocene
Stage/Age: Danian

Lithostratigraphy
Formation: Nacimiento

Biostratigraphy (attribute will need to be added)
North American Land Mammal Age: Torrejonian

This locality would have at least 7 collecting events spanning the years 1975 through 2006. Here is the first collecting event for this locality:

Event Nickname: (blank)
Verbatim Locality: (blank, not recorded in our locality database)
Verbatim Date: 1/1/1975
Began Date: 1975
End Date: 1975
Verbatim Coordinates: (blank, not recorded in our locality database)
Remarks: Mammal jaws and teeth

This locality has 124 specimens. Here is one of the first specimen events and specimens:

Event Determiner: L. L. Jacobs
Event Date: 1975
Event Type: collection
Collecting Method: unknown
Collecting Source: wild caught (or in situ, etc. if this gets changed)

Specimen
Catalog Number: 65448
Cataloged Item Type: Specimen
GUID Prefix: NMMNH:Paleo
Collection: Vertebrate Fossils
Collection Code: EH
Description: New Mexico Museum of Natural History and Science Vertebrate Paleontology Collection
Institution Acronym: NMMNH P-
Taxonomy: Mammalia
Parts: left distal humerus
Preservation: Fossil
Disposition: In Collection
Lot Count: 1

dustymc commented 5 years ago

Thanks!

The name isn't something you can look up on a map, so I wouldn't think it should go in specific locality.

Agreed - I'm definitely coming around to the idea that specific locality should be something a natural language parser might understand.

Quick aside:

Map Name

That's one of the places where I don't think our "user friendly translations" of table names to UI is in fact friendly. That's geog_auth_rec.quad and it should be limited to "official" USGS 1:250000 map names. It was introduced as a county-alternative for Alaska, but I think it's had some other use.

I think my real question boils down to this: once you establish the locality consisting of

and name it, then that name cannot be re-used. A new specimen from not-Nacimiento formation (or ANY other variation in the 3d shape defined by everything in and above locality) cannot share the locality name. Does that line up with your data?

("NMMNH L-6260a" or any other unique string of course remains available for the float-or-whatever.)

All that aside, I'm starting to think this is somewhat analogous to geog search term. There's one "official name"


UAM@ARCTOS> select higher_geog from geog_auth_rec where geog_auth_rec_id=10000319;

HIGHER_GEOG
------------------------------------------------------------------------------------------------------------------------
Asia, Iran

and any number of "alternate names"

UAM@ARCTOS> select SEARCH_TERM from geog_search_term where geog_auth_rec_id=10000319;

SEARCH_TERM
------------------------------------------------------------------------------------------------------------------------
Islamic Republic of Iran
Jomhuri-ye Eslāmi-ye Irān
Persia
جمهوری اسلامی ایران

which are useful for searching, mapping verbatim data to Arctos terminology, etc.

http://arctos.database.museum/geography.cfm?country=Iran
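A rough sketch of how such alternate names could map back to the one official name. The values are copied from the two queries above; the dictionary layout and the `resolve` helper are hypothetical and only illustrate the lookup, not how Arctos implements it.

```python
# Official name and search terms for geog_auth_rec_id 10000319 (from the queries above).
# The in-memory dicts are an assumption standing in for the two database tables.
OFFICIAL = {10000319: "Asia, Iran"}
SEARCH_TERMS = {
    10000319: [
        "Islamic Republic of Iran",
        "Jomhuri-ye Eslāmi-ye Irān",
        "Persia",
        "جمهوری اسلامی ایران",
    ],
}


def resolve(term: str) -> list[str]:
    """Return the official higher_geog values for which `term` is a search term."""
    wanted = term.casefold()
    return [
        OFFICIAL[geog_id]
        for geog_id, terms in SEARCH_TERMS.items()
        if any(wanted == candidate.casefold() for candidate in terms)
    ]


matches = resolve("persia")
```

Matching is case-insensitive here purely as a convenience; whether the real search should fold case is a separate decision.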

If that seems reasonable we'd need to discuss things like whether the extra terms would prevent a merger (first impression: no), what we'd do with them if there is a merger (merge them too?), etc. - and of course none of that is relevant to named localities, which will never merge.

Until that's resolved and implemented, I'd recommend putting the additional names in locality remarks (or somewhere predictable) in some format that will be easy to extract. JSON is my preference, but it's not the only possibility. Something like

[{"locality_name":"UALP 75106"},{"locality_name":"LLJ 13-75"},{"locality_name":"Taylor's Mound"}]

would be trivial to find and extract, for example.
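As a rough illustration of that extraction (a sketch only: the sample `remarks` string and the `extract_locality_names` helper are hypothetical, and it assumes the JSON array is embedded somewhere in otherwise free-text remarks):

```python
import json


def extract_locality_names(remarks: str) -> list[str]:
    """Return locality_name values from the first JSON array found in remarks."""
    decoder = json.JSONDecoder()
    start = remarks.find("[")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start` and ignores the rest.
            parsed, _ = decoder.raw_decode(remarks, start)
        except json.JSONDecodeError:
            # Not valid JSON at this bracket; try the next one.
            start = remarks.find("[", start + 1)
            continue
        return [
            item["locality_name"]
            for item in parsed
            if isinstance(item, dict) and "locality_name" in item
        ]
    return []


# Hypothetical remarks value: free text followed by the JSON suggested above.
remarks = (
    "Legacy site notes. "
    '[{"locality_name":"UALP 75106"},{"locality_name":"LLJ 13-75"},'
    '{"locality_name":"Taylor\'s Mound"}]'
)
names = extract_locality_names(remarks)
```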

Nicole-Ridgwell-NMMNHS commented 5 years ago

then that name cannot be re-used

Yes, that is fine. We will probably have a few instances needing a workaround, but doing those will probably actually help us clean up that data.

All that aside, I'm starting to think this is somewhat analogous to geog search term. There's one "official name" and any number of "alternate names"

That seems like a good model that would work well for this type of data. If it is not resolved by the time we start uploading our locality data, I'd be fine temporarily using the JSON format in the remarks field.

If that seems reasonable we'd need to discuss things like whether the extra terms would prevent a merger (first impression: no), what we'd do with them if there is a merger (merge them too?), etc.

I agree - I see no reason they should prevent a merger, and it seems like the alternate names would then have to merge too . . .

Maps may be another issue, but I'll see if there is another place I can put them.

dustymc commented 5 years ago

I think all I need is a name to go Next Task.

create table locality_identifiers (
    locality_id number references locality (locality_id), -- foreign key to locality; number type assumed
    locality_identifier varchar2(60) not null
);

??
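To make the shape of that table concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real back end. The cut-down `locality` table, the integer key, the sample rows, and the `find_localities` helper are all assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    -- Cut-down stand-in for the real locality table (assumed columns).
    CREATE TABLE locality (
        locality_id   INTEGER PRIMARY KEY,
        locality_name TEXT UNIQUE
    );
    -- The proposed table: any number of identifiers per locality.
    CREATE TABLE locality_identifiers (
        locality_id         INTEGER NOT NULL REFERENCES locality (locality_id),
        locality_identifier VARCHAR(60) NOT NULL
    );
""")

conn.execute("INSERT INTO locality VALUES (1, 'NMMNH L-6260')")
conn.executemany(
    "INSERT INTO locality_identifiers VALUES (1, ?)",
    [("UALP 75106",), ("LLJ 13-75",), ("Taylor's Mound",)],
)


def find_localities(identifier: str) -> list[str]:
    """Return the names of localities carrying the given identifier."""
    rows = conn.execute(
        """
        SELECT l.locality_name
        FROM locality l
        JOIN locality_identifiers i ON i.locality_id = l.locality_id
        WHERE i.locality_identifier = ?
        """,
        (identifier,),
    ).fetchall()
    return [name for (name,) in rows]
```

With this layout, every identifier shows up alongside the locality record through a single join, without going through collecting events or specimens.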

Nicole-Ridgwell-NMMNHS commented 5 years ago

locality_id_alt?

dustymc commented 5 years ago

Presumably that's "alternate" or "alternative" - if so, I'm not sure we need to be that specific. Eg

Those data won't fit in locality.locality_name (and so I've stuffed them into otherID "Locality ID"), but they're not really alternate either - the ID is the primary identifier in the context of cards-or-whatever, it's just not capable of uniquely identifying a 3D+geology shape (locality.locality_name's job).

The functional distinction - http://arctos..../locality_name/locality+1 can serve as a guid while http://arctos..../this_new_thing/locality+1 may reference a million locality.locality_ids (and the stuff attached to them) - so at some level the name doesn't matter, but I'm not anxious for another label-debate either.

I have no real objections to locality_id_alt, but I think it may find a way to be confusing.

FYI in the current back-end I have up to 30 characters available for table and column names - "alternate_locality_identifier" is possible/fits.


UAM@ARCTOS> select length('alternate_locality_identifier') from dual;

LENGTH('ALTERNATE_LOCALITY_IDENTIFIER')
---------------------------------------
                     29

Jegelewicz commented 5 years ago

If we do this, do we still need Locality_ID in other IDs? Shouldn't all of those be moved to the new thing we are creating?

If so, could we stick with Locality_ID?

If not, then I think ALTERNATE_LOCALITY_ID would be OK.

dustymc commented 5 years ago

do we still need Locality_ID in other IDs

No, but maybe.... I'll have to look at the existing data and talk to those collections - I think we can just move everything, but there's some chance there's some weirdness in there somewhere. (And there may be concerns around external apps.)

And just for the sake of clarity, these (whatever we call them) will be in a 1:zero-or-more relationship with localities, and there will be no unique indexes. This will be possible:

Nicole-Ridgwell-NMMNHS commented 5 years ago

they're not really alternate either

How about OTHER_LOCALITY_ID or SECONDARY_LOCALITY_ID, something that indicates they're not necessarily interchangeable? We could also use name instead of id - OTHER_LOCALITY_NAME or SECONDARY_LOCALITY_NAME.

Jegelewicz commented 5 years ago

I was going to suggest ALTERNATE_LOCALITY_NAME but I feel like we haven't cleared up the "NAME" vs "NICKNAME" thing? #2105

dustymc commented 5 years ago

OTHER_LOCALITY_ID

I don't completely hate it, but I'm not sure it's 100% accurate either - as above it may hold the "primary" identifier. ("Normal" OtherIDs are always "other" to catalog number.)

SECONDARY_LOCALITY_ID

I like this one less - they may be "primary."

ALTERNATE_LOCALITY_NAME

As above, this is not necessarily an alternate to anything - it may be THE name, just one that doesn't do what locality.locality_name does.

#2105

I think @Nicole-Ridgwell-NMMNHS has a fairly compelling case for "name," and that's certainly more reflective of the model and the intent behind the model.

FWIW I still like "locality identifiers" - there are no implications other than some sort of identifiers hooked to localities.

Nicole-Ridgwell-NMMNHS commented 5 years ago

"locality identifiers"

That is fine with me, especially if we can eventually move the Locality IDs over from Other IDs.

Nicole-Ridgwell-NMMNHS commented 5 years ago

Where are we on this? Can we add locality_identifier as a locality field? Documentation would say something like:

"locality identifier other than locality nickname [or name, depending on the outcome of #2105] that is used by a collection or publication to reference this locality."

dustymc commented 5 years ago

I do not think a field is appropriate; this should be implemented as a data object. (Unless there's always exactly zero or one of these, which doesn't seem to be the case.)

I see two obvious possibilities:

  1. like geography search terms

locality_id FKEY-->locality
id VARCHAR

  2. more like otherIDs

locality_id FKEY-->locality
id_type -->new code table
id VARCHAR

The first is obviously simpler, supports things like "Some Assigner LocalityId Number 12," but isn't terribly searchable.

The second is more complicated (there's a new code table to discuss), supports "Some Assigner LocalityId"=="Number 12," and is therefore more searchable (at least by type).
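A rough side-by-side of the two options, sketched with Python's sqlite3 module rather than the production database. The table names (`locality_id_simple`, `locality_id_typed`, `ctlocality_id_type`) are made up for illustration, and the foreign key to locality is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: one free-text identifier per row.
    CREATE TABLE locality_id_simple (
        locality_id INTEGER NOT NULL,
        id          TEXT NOT NULL
    );

    -- Option 2: a controlled type (new code table) plus a value.
    CREATE TABLE ctlocality_id_type (id_type TEXT PRIMARY KEY);
    CREATE TABLE locality_id_typed (
        locality_id INTEGER NOT NULL,
        id_type     TEXT NOT NULL REFERENCES ctlocality_id_type (id_type),
        id          TEXT NOT NULL
    );

    INSERT INTO locality_id_simple VALUES (1, 'Some Assigner LocalityId Number 12');
    INSERT INTO ctlocality_id_type VALUES ('Some Assigner LocalityId');
    INSERT INTO locality_id_typed VALUES (1, 'Some Assigner LocalityId', 'Number 12');
""")

# Option 1: the only way to search "by type" is pattern matching on the whole string.
simple_hits = conn.execute(
    "SELECT locality_id FROM locality_id_simple WHERE id LIKE ?",
    ("Some Assigner LocalityId%",),
).fetchall()

# Option 2: the type is its own column, so it can be matched exactly.
typed_hits = conn.execute(
    "SELECT locality_id, id FROM locality_id_typed WHERE id_type = ?",
    ("Some Assigner LocalityId",),
).fetchall()
```

Both queries find the row here; the difference is that the first depends on everyone typing the assigner the same way, while the second depends on the code table.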

Jegelewicz commented 5 years ago

I think we need the second because there could be two locality IDs for the same locality. I'm on board with the idea and essentially moving "locality_ID" out of the other ID table, but I guess we need a plan for transitioning data already in there to this new table? I know a lot of them are in the ALMNH paleo collection, which I can work with @mbprondzinski to structure. Who else would be affected?

dustymc commented 5 years ago

two locality IDs for the same locality

Both of the things I proposed do that, one just strictly types them.

The localityID otherID was built for uam:es. It should be cleaned up, but that doesn't have to happen anytime soon.

mbprondzinski commented 5 years ago

I haven't read this thread in its entirety, but is there a way to create an empty field that can be used by each institution according to their need, such as for Taylor's Mound? Most of our localities are searchable online, but the unsearchable localities, such as Grover's farm or some such other descriptive terminology, could be recorded in that vacant field. Or maybe a drop-down list of those descriptive localities could be created specifically for each institution if needed?

Nicole-Ridgwell-NMMNHS commented 5 years ago

Well, if we go with the second option, our locality IDs could be categorized as:

other institution ID - an official locality ID created by an institution other than the one that originally created this Arctos locality record.

field number - a number assigned to a locality by a collector in the field.

and finally we have what I'd call nickname [which is not how Arctos is currently using nickname], which is some string of words informally used to identify a site (some examples: Taylor's Mound, Trilophosaurus Quarry 2, Steve's Meniscotherium Quarry)

The first two (other institution ID and field number) are already broken out in our database. The nickname is lumped with specific locality info in a field called "locale". So yes, I'd say it would be useful to categorize these IDs.

dustymc commented 5 years ago

Grover's farm

How's that distinct from verbatim locality? If there's some obvious cut it may make sense as an identifier, if not you're just hiding data.

other institution ID

I'm not seeing the value. "Some organization which someone at one time thought might be an institution..." isn't going to lead to more data or make things more discoverable.

other than the one that originally created this Arctos locality record.

Localities are shared; that's not in the model in any form.

a number assigned to a locality by a collector in the field.

If this is something that refers to the specimen from the place, this would just make it impossible to find all similar data. If it refers to the place itself, sure.

informally used to identify a site

I'm still not quite getting how this isn't verbatim locality.

Nicole-Ridgwell-NMMNHS commented 5 years ago

other institution ID

I'm not seeing the value

Institution A visits a site and calls it IA 123. They have a locality record under that number, someone publishes about the site under that number. 20 years later, someone from institution B reads the publication and revisits the site. They call it IB 456 (they can't just keep using IA 123 because the two locality cataloging systems are different) but it is still critical to include the other institution's # in that record to connect with publications and so that someone from Institution B can contact someone from Institution A and ask about site IA 123/IB 456.

other than the one that originally created this Arctos locality record.

Localities are shared; that's not in the model in any form.

Yes, but we and other collections are moving from a locality catalog system that was not shared. There has to be a way to keep track of unique primary identifiers for the same locality from multiple collections. Otherwise you end up losing critical connections to the original data, publications, and other collections that aren't in Arctos.

a number assigned to a locality by a collector in the field.

If this is something that refers to the specimen from the place, this would just make it impossible to find all similar data. If it refers to the place itself, sure.

Yes, in our system it refers to the place itself.

informally used to identify a site

I'm still not quite getting how this isn't verbatim locality.

Perhaps informal was not the correct term. What I'm getting at is, all our localities have numbers. Some of them have also been named and we keep track of those names in our database alongside the number.

Here is a good example. Our institution has been collecting at a site, L-3282, for 20 years and we have hundreds of specimens from the site. L-3282 is the official site number and that is what is referenced in any publications. However, humans like to name things and use names, so for the last 20 years the site has been known as Peterson Quarry. It is referred to as Peterson Quarry in news articles, and also in scientific publications alongside the loc number. Both name and number are used on official correspondence and permitting documents with the federal government.

In other cases maybe the name is only used by the collectors in the field and recorded only in our locality record and field notebooks, but that name is still durable, it doesn't change collecting event to collecting event, and it always refers to that locality.

dustymc commented 5 years ago

contact someone from Institution A

"other institution ID" does that, at best, poorly. "UA" (which will get used if it can be) is maybe a few hundred institutions. "University of Alabama" is (more or less) one. If "UA 12" is all that's needed, I suggest we go with the simpler model. If it's not - if "find localities with a University of Alabama identifier" is a realistic question to ask of the data - we need some more complexity and control.

dustymc commented 5 years ago

other collections that aren't in Arctos.

Is that an argument for also including a base_url, like we've done for the UCMP number in http://arctos.database.museum/guid/UAM:ES:34502?

Nicole-Ridgwell-NMMNHS commented 5 years ago

we need some more complexity and control.

More complexity would be fantastic. I've had to do exactly what you describe (figure out an ambiguous acronym) on multiple occasions because our database does not currently have that complexity.

Is that an argument for also including a base_url

Yes, an option for an external URL would be great.
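One way the typed model could carry both an unambiguous issuer and an optional external link is sketched below. Everything here is hypothetical: the `LocalityIdentifier` class, its field names, and the example.org base URL are placeholders, not existing Arctos structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalityIdentifier:
    issuer: str                      # full institution name rather than an acronym like "UA"
    value: str                       # the identifier as the issuer wrote it
    base_url: Optional[str] = None   # placeholder resolver prefix, if the issuer has one

    def url(self) -> Optional[str]:
        """Return a link for this identifier, or None when no base_url is known."""
        if self.base_url is None:
            return None
        return self.base_url + self.value


ids = [
    LocalityIdentifier("University of Alabama", "12"),
    LocalityIdentifier("University of Arizona", "12", "https://example.org/locality/"),
]

# "Find localities with a University of Alabama identifier" becomes an exact match
# on the issuer instead of a guess about what "UA 12" means.
alabama = [i for i in ids if i.issuer == "University of Alabama"]
```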

Jegelewicz commented 5 years ago

We have a temporary solution, documentation request at https://github.com/ArctosDB/documentation-wiki/issues/133

Closing