ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

nature of ID #2170

Closed dustymc closed 4 years ago

dustymc commented 5 years ago

http://arctos.database.museum/info/ctDocumentation.cfm?table=CTNATURE_OF_ID is a mess.

Ideally I think we should say something about the evidence used for the ID, but that doesn't seem possible. Minimally we can not say the same thing a bunch of different ways?

| NATURE_OF_ID | Documentation | Hu? |
| --- | --- | --- |
| ID of kin | An identification based upon the identification of another related individual, often the mother of an embryo. Such a specimen should have at least one individual relationship. | spiffy |
| ID to species group | Within a genus, some groups of closely related species are referred to by the species name of one widespread or well known species within the group. | Hu? This (and much more) can be done with eg, Sorex {Sorex cinereus complex}, I don't think we need multiple ways of doing that. |
| curatorial | An identification determined by qualified personnel assisting with collection management including collection managers, curators, trained students, staff and others who may not be experts in the group in question but have some knowledge of relevant taxonomy. | this looks functionally identical to student |
| erroneous citation | The specimen has been cited in refereed scientific literature by this name but this name is clearly wrong. This situation arises mostly from typographical errors in catalog numbers. | spiffy - or not, but it happens |
| expert | The determiner is a person recognized by other experts working with the taxa in question, or the regional biota. | this looks functionally identical to student, or at least heavily overused. One agent is an expert on 7112 different taxa! Users have the tools to decide who they consider experts and act on that. |
| field | A determination made without access to specialized equipment or references. | "Looks like a moose" may be necessary, but I'm still not sure how it's not yet another version of 'student.' |
| geographic distribution | Specimen is assumed, on the basis of known geographic ranges, to be the species or subspecies expected at the collecting locality. The specimen has not been identified to species or subspecies by comparing it to other subspecies within the genus or species. | "It's probably that species because that species lives there and we know what species lives there because we're museums and telling people where stuff lives is what we do....." still looks circular to me |
| legacy | The identification has been transposed from an earlier version of data that did not include identification metadata. In this case the date of the determination is the date that the data were transposed, and the determiner is unknown. | I think we're stuck with the concept, but maybe "unknown" is a better label |
| molecular data | An identification made by a laboratory analysis comparing the specimen to related taxa by molecular criteria, generally DNA sequences. | yay us! |
| photograph | "Field ID" or perhaps "morphology" is probably always more important than this. | "student" version 5 |
| published referral | The specimen has been specifically determined to be of a particular taxon in a publication that describes or re-describes that taxon, but the specimen has no type status. Such a specimen record should include a citation, and the determiner(s) of record should be among the authors of the publication. (This means nothing.) | I still have no idea what this means |
| revised taxonomy | This designation is appropriate only in the presence of an earlier identification. It implies that the specimen has not been reexamined, and only that a different taxonomic name is being applied. In most cases this results from taxonomic synonymization of names. | we're stuck with this |
| student | Specimen has been identified by a person using appropriate references, knowledge, and/or tools, but not by an expert. This is a broad use of the term student. | I think maybe everyone hates the label, but the concept seems accurate for the vast majority of our identifications |
| type specimen | This particular specimen has been described in the literature by this name. The specimen record should contain a citation of the appropriate literature, and the determiner(s) of record should be among the authors of the publication. | yay us! |

@ccicero

marecaguthrie commented 4 years ago

I second the request for a term that is inclusive of non-biological collections.

On Fri, Oct 4, 2019 at 9:29 AM dustymc notifications@github.com wrote:

  • I think we need a 'general features' term, and it should be easy to adjust the name as long as we don't significantly alter the concept.
  • I think we need a 'fine features' term, and it should be easy to adjust the name as long as we don't significantly alter the concept.
  • Do we also need a "features" term that encompasses both of the above? I think this is the current blocker.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ArctosDB/arctos/issues/2170?email_source=notifications&email_token=AJKSRR5T2DAYPTUGWBVQYT3QM54ONA5CNFSM4IEIGLOKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAMLFTY#issuecomment-538489551, or mute the thread https://github.com/notifications/unsubscribe-auth/AJKSRR6CD7H6SLNPM2C5P5TQM54ONANCNFSM4IEIGLOA .

-- Mareca Guthrie Curator of Fine Arts & Associate Professor of Art University of Alaska Museum of the North 1962 Yukon Drive P.O. Box 756960 Fairbanks, AK 99775-6960 mrguthrie@alaska.edu

University of Alaska Museum of the North: www.uaf.edu/museum UAF Art Department: https://www.uaf.edu/art/ https://www.uaf.edu/art/ Colors of Nature: http://www.colorsofnature.org/

DerekSikes commented 4 years ago

> Do we also need a "features" term that encompasses both of the above? I think this is the current blocker.

yes - and for migration it's much safer and more appropriate to use this 'features' without specifying fine or coarse for nature of ID = expert and student, even for 'field' because all such IDs sometimes use coarse and sometimes use fine features. We cannot assume that experts use fine and students use coarse. Experts can look at a moose femur in the woods and know it's moose without a microscope, students can use microscopes etc.

-Derek


--

+++++++++++++++++++++++++++++++++++ Derek S. Sikes, Curator of Insects Professor of Entomology University of Alaska Museum 1962 Yukon Drive Fairbanks, AK 99775-6960

dssikes@alaska.edu

phone: 907-474-6278 FAX: 907-474-5469

University of Alaska Museum - search 400,276 digitized arthropod records http://arctos.database.museum/uam_ento_all http://www.uaf.edu/museum/collections/ento/ +++++++++++++++++++++++++++++++++++

Interested in Alaskan Entomology? Join the Alaska Entomological Society and / or sign up for the email listserv "Alaska Entomological Network" at http://www.akentsoc.org/contact_us http://www.akentsoc.org/contact.php

dustymc commented 4 years ago

Done.

features - Identification based on examination of diagnostic traits that may include qualitative assessment of morphology, coloration, structure, etc. Examination of features may be direct (specimen) or indirect (e.g., photograph). This value encompasses "fine features" and "coarse features" and should not be used if more specific information is available.

DerekSikes commented 4 years ago

But I can't use 'features' for new IDs - getting that legacy terms error


dustymc commented 4 years ago

You can now - one of the many reasons I don't want to let this hang out in never-never land for too long....

ccicero commented 4 years ago

So are we down to just one term of "features"?

I think it will be confusing to have three "features" terms - general, fine, no modifier. I am fine with just the one term, to distinguish between that and molecular, audio-visual, relationships, etc.

Dusty, can you provide the most current list and migration path?

dustymc commented 4 years ago

No, we are up to three.

At least in theory I think we need more than one, but if there's no compelling immediate need I'm happy enough to drop 'fine' and 'coarse' and deal with that when someone needs it.

I don't think this changes anything about my migration path; there's still nothing about eg, 'expert' that might lead me to 'features' so I see no alternative but to map it to 'unknown.' I continue hoping that collections WILL have that information (or be willing to make educated guesses at it) and will direct me to better pathways before I have to fall back to shoving most everything into 'unknown' (and remarks). I think ID of kin --> relationship is currently the only unambiguous not-unknown path I have.

DerekSikes commented 4 years ago

no, there are 3 terms:

fine features
coarse features
features

the last one is only to be used when one can't be sure on which of the first two to apply


ccicero commented 4 years ago

In thinking more about this, what are we trying to accomplish by having three terms for essentially the same type of ID, just with more or less detail? I think that's going to cause confusion. That would be like breaking 'molecular' into different types of analysis (genomics, mtDNA sequences, etc.).

Can't we just have one term for 'features' - someone ID'd it based on external or internal features such as size, shape, color, etc. - and then deal with the details if someone wants in other fields, either ID_Remarks or attributes (measurements, color...) plus our new confidence score?

I think we need to step back a bit and ask what our goals are.

dustymc commented 4 years ago

> accomplish

In theory, I'm thinking

I'm not sure anyone's ever tried to find specimens in that way (they've certainly failed if they tried) so as long as we're not losing data - and I don't think it's currently there to lose - I am fine with adding that when/if we have the actual need.

https://arctos.database.museum/guid/UAM:Mamm:11507 is an example of a "fine features" specimen, but that's not apparent from the current data, and I don't think we'd lose anything by merging it into 'features.' (And it's not so much "fine features" as "precise measurements.")

ccicero commented 4 years ago

Right. Also, looking at fine features is different than basing an ID on that. You can base the ID on its features, but then examine fine features and add those as other types of data such as attributes, media, etc. For identification, I think the important thing is to distinguish whether it was based on features, molecular data, audio-visual, distribution, etc.

mkoo commented 4 years ago

Not to throw a new wrench into the works... but maybe the most important thing we're trying to capture is:

FIRST, the confidence or some measure of that (preliminary id vs verified) -- this is really the Nature of Id (quality)

SECOND, the method of how that id was made (fine, coarse, or external (trying to get at something less biological than phenotype) vs molecular, etc.). This is more quantitative, although the method may be based on quality or gestalt or whatever; for example, if Patton says he "feels" it's a particular subspecies of gopher I'd have more confidence than if anyone else did.

If we think of it that way, then maybe we don't have to parse / dissect / enumerate terms like "Fine", which is just giving me hives.


dustymc commented 4 years ago

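@mkoo I think those things are all closely related, and that's part of what we're doing here.

  • Patton thinks it looks like a gopher based on some technique (nature_of_id) he's familiar with
  • Patton thinks it looks like a gopher based on some technique he's familiar with, but notes that the diagnostic features are mangled
  • Patton thinks it looks like a gopher based on some technique that he's never used before
  • Some student who came to know gophers exist about 5 minutes ago does the same things....

I think you need all of that to understand an identification. "Patton is an expert (and he may or may not have been working in familiar territory or with identifiable specimens)" doesn't say much.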

mkoo commented 4 years ago

yes, what I'm saying is:

nature of id = validated or high confidence or verified or whatever term we like
method of id = Patton ran allozymes.
user = feels good that it's gopher X

or

nature of id = preliminary or low confidence or unvalidated
method of id = some student looked at external morphology while wrangling it out of its trap (i.e., "field")
user = hmm, (data) buyer beware

On Mon, Oct 7, 2019 at 9:00 AM dustymc notifications@github.com wrote:

@mkoo https://github.com/mkoo I think those things are all closely related, and that's part of what we're doing here.

  • Patton thinks it looks like a gopher based on some technique (nature_of_id) he's familiar with
  • Patton thinks it looks like a gopher based on some technique he's familiar with, but notes that the diagnostic features are mangled
  • Patton thinks it looks like a gopher based on some technique that he's never used before
  • Some student who came to know gophers exists about 5 minutes ago does the same things....

I think you need all of that to understand an identification. "Patton is an expert (and he may or may not have been working in familiar territory or with identifiable specimens)" doesn't say much.


dustymc commented 4 years ago

That's what we're doing, but not with that vocabulary.

method ==> nature_of_id
validated or high confidence or verified or whatever term we like ==> identification_confidence
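A minimal sketch of that split, assuming each identification carries two independent fields: `nature_of_id` for the method and a nullable `identification_confidence`. The class, the `scientific_name` and `determiner` field names, and both vocabularies are illustrative assumptions pieced together from values floated in this thread, not the Arctos schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies; the thread has not settled on final values.
NATURE_OF_ID = {
    "features",
    "molecular",
    "relationship",
    "geographic distribution",
    "audio-visual",
    "unknown",
}
CONFIDENCE = {"low", "medium", "high"}


@dataclass(frozen=True)
class Identification:
    """One identification: what it was based on, and how sure anyone is."""

    scientific_name: str
    determiner: str
    nature_of_id: str  # the method or evidence behind the ID
    identification_confidence: Optional[str] = None  # None stands in for NULL

    def __post_init__(self) -> None:
        # Validate each field against its own vocabulary, independently.
        if self.nature_of_id not in NATURE_OF_ID:
            raise ValueError(f"unknown nature_of_id: {self.nature_of_id!r}")
        if (
            self.identification_confidence is not None
            and self.identification_confidence not in CONFIDENCE
        ):
            raise ValueError(
                f"unknown confidence: {self.identification_confidence!r}"
            )


# The two scenarios from the discussion, plus an ID with no stated confidence.
lab_id = Identification("gopher X", "Patton", "molecular", "high")
field_id = Identification("gopher X", "some student", "features", "low")
unrated_id = Identification("gopher X", "Patton", "features")
```

Because the two fields are validated separately, any method can be paired with any confidence, or with none at all.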

mkoo commented 4 years ago

yea I'm saying flip that! we see the nature of id first and it's a quality measurement not the method that's primary. Currently, we are trying to implicitly derive the quality from the method which is not efficient and where we get lost in the fine vs coarse dissections.


dustymc commented 4 years ago

  1. I don't think that's a blocker - we can flip labels whenever we want.
  2. Treating an explicitly-subjective NULLable field that many collections have said they won't use as "primary" doesn't make sense to me.

mkoo commented 4 years ago

No, we would alter the crosswalk so it's not nullable... I'm talking about a conceptual "flip" and moving our model of nature of id so that it better captures confidence quality. Because that is implicitly what we are trying to do with things like nature of id="field" vs "expert"

perhaps better to have as a priority conversation topic as I feel that others will have opinions on this who are not part of this thread.


DerekSikes commented 4 years ago

I feel we're going backwards here.

We added the new field 'ID confidence' which clearly captures what was mixed with 'id method' in 'nature of ID' - maybe we need to change the name of the field from 'nature of id' to 'id method' to make it even more explicit:

ID confidence
ID method

-Derek


dustymc commented 4 years ago

> backwards

Certainly somewhere we've been before.

> method

That's just labels - we can go there without getting stuck here.

mkoo commented 4 years ago

so maybe ID confidence is the new nature of id -- put that at top
ID method is a new field with the nature of id values with all the variation that you need.

this would map to DwC identificationVerificationStatus and identificationRemarks respectively, right?


dustymc commented 4 years ago

> ID confidence is the new nature of id

But it's not! One can be confident (or not) of IDs of any nature.

> identificationVerificationStatus

That's basically a term for a simpler model. We "verify" by adding identifications. I suppose I could somehow stuff something that means "28 people agree on this" (http://arctos.database.museum/guid/MSB:Mamm:55245) in there, but that's not very meaningful without https://github.com/ArctosDB/arctos/issues/2267 - spelling something the same way doesn't necessarily mean you think it's the same thing, and vice-versa.

ccicero commented 4 years ago

I like Derek's suggestion of ID confidence + ID method. For method, we want to know if you looked at features first, then based subspecies on geographic distribution, then sequenced and found it's really some other taxon, etc. For observations, maybe it's based on features (saw a moose) or audio-visual.

Add confidence on top of that.

What about adding a third field for ID determination: field, museum, lab

dustymc commented 4 years ago

> method

I think that's usefully in id_sensu (which might need a new name? We should get taxon concepts settled in first) - I think every method worth recording is going to require a publication to describe.

I'm not seeing what value field museum lab would add??

> you looked at features first, then based subspecies on geographic distribution, then sequenced

Those should be three IDs.

Jegelewicz commented 4 years ago

When will the "legacy" options stop showing up in the data entry screen pick list?

dustymc commented 4 years ago

When I get a migration path and they're no longer used.

ccicero commented 4 years ago

Yes, those would be three IDs. But I think it's useful to know whether the ID was made in the field (first attempt, without comparative material or more fine-scaled methods) or not. Maybe just a flag "Is this a field ID" ?

I feel like we need to 'regroup.' From what I can tell, here is the latest version (see google doc):

-- fine features + coarse features ---> I suggest combining these two into a single term 'features', but this is still being debated
-- function
-- molecular
-- distribution (do we need 'geographic'?)
-- karyotype
-- relationship
-- audio-visual
-- taxonomic revision
-- unknown

Are we agreed on all of these except for whether to break up 'features' ?

Confidence level added to this:
-- unknown (= default?)
-- low
-- medium
-- high

Is this where we stand now?

dustymc commented 4 years ago

> "Is this a field ID"

I think that takes us back to where we are - random data that doesn't really do anything - but with a lot more work.

> without comparative material fine-scaled methods

I think the idea that those don't exist in the field is a bit discipline-specific. In any case they may have a place here, but they should be quantified/categorized as what they are, not lumped into whatever "field" might mean.

I don't think we have to agree on 'features' except we should get rid of anything that won't be used. (And I believe that ship sailed a few days ago.)

The default for confidence will be NULL. I'm still pretty baffled as to what purpose "unknown" serves or why it is necessary.

Here's current data and proposed mappings.

UAM@ARCTOS> select nature_of_id,count(*) from identification group by nature_of_id order by nature_of_id;

NATURE_OF_ID            COUNT(*)
ID of kin                  29755 --> relationship
ID to species group         9591 --> unknown
coarse features               55 --> none/happy
curatorial                 14226 --> unknown (or features?)
erroneous citation           156 --> unknown
expert                    725767 --> unknown (or features?)
features                     269 --> none/happy
field                     352627 --> unknown (or features?)
fine features                 75 --> none/happy
geographic distribution    25559 --> none/happy
molecular                  13888 --> none/happy
photograph                   495 --> unknown (or features?)
published referral          3953 --> unknown
revised taxonomy          258734 --> none/happy
student                   354338 --> unknown (or features?)
type specimen              23663 --> unknown
unknown                  2555259 --> none/happy

Again, PLEASE give me better mapping, globally or for your collection/parts of your collection.
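One way to picture a mapping that works "globally or for your collection" is a default table plus per-collection overrides. The sketch below is only an illustration: `DEFAULT_MAP` copies the fallback mappings proposed above, while `COLLECTION_OVERRIDES`, the `EXAMPLE:Coll` collection name, and the `migrate_term` helper are hypothetical and do not come from Arctos.

```python
from typing import Dict, Optional

# Global fallback, copied from the proposed mappings in this thread.
# Terms marked "none/happy" are absent and pass through unchanged.
DEFAULT_MAP: Dict[str, str] = {
    "ID of kin": "relationship",
    "ID to species group": "unknown",
    "curatorial": "unknown",
    "erroneous citation": "unknown",
    "expert": "unknown",
    "field": "unknown",
    "photograph": "unknown",
    "published referral": "unknown",
    "student": "unknown",
    "type specimen": "unknown",
}

# Hypothetical per-collection overrides; the collection name and the
# choices below are placeholders, not decisions anyone has made.
COLLECTION_OVERRIDES: Dict[str, Dict[str, str]] = {
    "EXAMPLE:Coll": {"expert": "features", "student": "features"},
}


def migrate_term(legacy: str, collection: Optional[str] = None) -> str:
    """Return the new nature_of_id for a legacy value.

    A collection-specific override wins; otherwise the global default
    applies; otherwise the value is kept as it is.
    """
    overrides = COLLECTION_OVERRIDES.get(collection or "", {})
    if legacy in overrides:
        return overrides[legacy]
    return DEFAULT_MAP.get(legacy, legacy)
```

With this shape, a collection that knows its 'expert' identifications were based on features only has to supply its own small override table; everything else falls back to the global default.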

ccicero commented 4 years ago

My suggestions in bold (at least for MVZ Birds, and probably for the other MVZ collections although other staff curators should weigh in @atrox10 @mkoo). I still don't like having three terms for 'features' but I guess that's a curatorial call on how we use it.

NATURE_OF_ID            COUNT(*)
ID of kin                  29755 --> relationship
ID to species group         9591 --> unknown **CC: features**
coarse features               55 --> none/happy
curatorial                 14226 --> unknown (or features?) **CC: features**
erroneous citation           156 --> unknown
expert                    725767 --> unknown (or features?) **CC: features**
features                     269 --> none/happy
field                     352627 --> unknown (or features?) **CC: features**
fine features                 75 --> none/happy
geographic distribution    25559 --> none/happy
molecular                  13888 --> none/happy
photograph                   495 --> unknown (or features?) **CC: features**
published referral          3953 --> unknown
revised taxonomy          258734 --> none/happy
student                   354338 --> unknown (or features?) **CC: features**
type specimen              23663 --> unknown **CC: features**
unknown                  2555259 --> none/happy

campmlc commented 4 years ago

I agree with Carla's mapping. I also support just using "features" as a single term. Do we have a map already from these terms to confidence level?


dustymc commented 4 years ago

map already from these terms to confidence level

No - I can handle that if you want, but without feedback from you I think I can only map to NULL.

@DerekSikes has that mapped for his collections in https://github.com/ArctosDB/arctos/issues/2170#issuecomment-537634372 - I can extend that to the default if you'd like.

Do we also need a "taxon description" (or similar) confidence value for the types?

The data suggest that value is not being used correctly which makes me a bit hesitant to double down on it, but it's just a vocabulary term at this point.


select
  type_status,
  count(*)
from
  identification
  left outer join citation on identification.collection_object_id=citation.collection_object_id
where
  nature_of_id='type specimen'
group by
  type_status
order by
  type_status
;

TYPE_STATUS            COUNT(*)
---------------------- --------
basis of illustration       138
erroneous citation           75
holotype                   1667
host voucher                598
isolectotype                  8
isoneotype                    2
isosyntype                    4
isotype                     100
lectotype                    48
neotype                       7
paralectotype                34
paratopotype               1887
paratype                   3694
referral                    239
symbiotype                  162
syntype                      72
voucher                   21049
                           2744

campmlc commented 4 years ago

Agree with using Derek's as default with the following additions - please review

ID of kin 29755 --> relationship confidence high
ID to species group 9591 --> unknown CC: features confidence low
coarse features 55 --> none/happy
curatorial 14226 --> unknown (or features?) CC: features confidence high
erroneous citation 156 --> unknown confidence low
expert 725767 --> unknown (or features?) CC: features confidence high
features 269 --> none/happy
field 352627 --> unknown (or features?) CC: features confidence low
fine features 75 --> none/happy
geographic distribution 25559 --> none/happy confidence medium
molecular 13888 --> none/happy confidence high
photograph 495 --> unknown (or features?) CC: features confidence medium
published referral 3953 --> unknown confidence high
revised taxonomy 258734 --> none/happy confidence high
student 354338 --> unknown (or features?) CC: features confidence medium
type specimen 23663 --> unknown CC: features confidence high
unknown 2555259 --> none/happy

dustymc commented 4 years ago

I am happy to map whatever you want for your collection, but as global defaults I have some concerns.

ID of kin 29755 --------------------------> relationship confidence high

This seems wrong to me - the confidence comes from the identification of the related specimen.

ID to species group 9591------------------> unknown CC: features confidence low

I've never been sure of the intent here, but I think it would be "we're fairly confident, but to an imprecise taxon." I have no better suggestions....

erroneous citation 156------------------------> unknown confidence low

That one's just weird - "we're sure we're wrong"???

field 352627-----------------------> unknown (or features?) CC: features confidence low

This is somewhat taxon-dependent, which I probably can't get at for the purposes of migration. This seems correct for shrews, maybe not so much for bison.

revised taxonomy 258734----------------------->none/happy confidence high

I think this also depends on previous IDs. I see no reason "Clethrionomys, maybe" should become "Myodes, definitely."

campmlc commented 4 years ago

We use ID of kin for Mexican wolves in an endangered species recovery program. Our confidence in the ID is extremely high. I would hope that they aren't accidentally breeding and releasing coyotes . . .

I could go with ID to species group as medium. You know at least genus.

I don't know about erroneous citation - who uses that? It is in the type definition field.

Agree with revised taxonomy depending on previous, but what about legacy? maybe default to medium? @DerekSikes @jldunnum

campmlc commented 4 years ago

One thing on the "ID of kin" values - for a long time, this was (still is?) the default value in data entry, rather than the default being NULL. Can you send a list of ID of kin specimens so we can verify this is really the case? At MSB, my guess is ID of kin should only legitimately be used for Canis lupus baileyi and possibly embryos cataloged separately. @jldunnum

dustymc commented 4 years ago

erroneous citation - who uses that

https://github.com/ArctosDB/arctos/issues/2170#issuecomment-538076627 or...


select
  guid_prefix,
  count(*) c
from
  collection,
  cataloged_item,
  identification
where
  nature_of_id='erroneous citation' and
  collection.collection_id=cataloged_item.collection_id and
  cataloged_item.collection_object_id=identification.collection_object_id
group by
  guid_prefix
order by
  guid_prefix;

GUID_PREFIX                               C
------------------------------------------------------------ ----------
DGR:Bird                                  1
DMNS:Bird                                15
DMNS:Inv                                  2
KNWR:Ento                                 1
KWP:Ento                                  2
MSB:Mamm                                 45
MVZ:Herp                                  5
MVZ:Mamm                                  1
UAM:ES                                    1
UAM:Ento                                 45
UAM:Herb                                 10
UAM:Mamm                                  1
UAMObs:Ento                              22
UCM:Bird                                  1
UTEP:ES                                   3
UTEP:Herb                                 1

16 rows selected.

what about legacy

Sounds like NULL to me; there's no reason to assert anything you can't know.

list

create table temp_nok as select guid, identification.scientific_name,accepted_id_fg from flat,identification where flat.collection_object_id=identification.collection_object_id and identification.nature_of_id='ID of kin';

temp_nok.csv.zip

ewommack commented 4 years ago

Going back to @ccicero

Yes, those would be three IDs. But I think it's useful to know whether the ID was made in the field (first attempt, without comparative material or more fine-scaled methods) or not. Maybe just a flag "Is this a field ID" ?

What about all of the specimens that are IDed in the lab first rather than the field? I make a lot of quick ID calls when specimens are donated as salvage to the museum, and then stuff them in a freezer for future prepping. I'd count those as Field IDs (just not made in the field), but would this be confusing to someone trying to follow?

ccicero commented 4 years ago

Good point Beth. We also use(d) 'field' ID for salvaged specimens prepped in the lab because that was the most fitting option, but it was always strange to do so. I think 'features' takes care of both field and lab preps.

Re: confidence levels mappings:

ID of kin 29755 --> relationship confidence high

---> I think high is appropriate here, contrary to Dusty's comment. If you know that the parent or sibling is species X, then you should be confident that what you are ID'ing is also species X.

ID to species group 9591 --> unknown CC: features confidence low

---> I agree with Dusty's comment here. "Medium" seems ok to me.

coarse features 55 --> none/happy
curatorial 14226 --> unknown (or features?) CC: features confidence high

---> Not sure what 'curatorial' means, but I think this would depend on who's doing the ID. If it's a student, then confidence is lower than if the ID is by a curator. I'm not sure that we've used this for MVZ collections. Can you tell me if we have any with this ID, and the identifier?

erroneous citation 156 --> unknown confidence low
expert 725767 --> unknown (or features?) CC: features confidence high
features 269 --> none/happy
field 352627 --> unknown (or features?) CC: features confidence low

---> Yes, per Dusty's comment that it would be taxon-dependent. But presumably if you can only ID to a genus in the field, then that's the ID that would have been entered. Or if you're not sure of the ID, then you'd add a "?". So can we get at this through taxa_formula?

if formula contains 'sp' or 'ssp' or 'string' or '?' or 'cf' or 'aff' or 'A or B' --> confidence is low
if formula contains 'A' or 'A and B' --> confidence is high
if formula contains 'A / B intergrade' or 'A x B' --> confidence is low?

fine features 75 --> none/happy
geographic distribution 25559 --> none/happy confidence medium

---> This can be low or high depending on the taxon and locality, so I am ok with medium.

molecular 13888 --> none/happy confidence high
photograph 495 --> unknown (or features?) CC: features confidence medium
published referral 3953 --> unknown confidence high
revised taxonomy 258734 --> none/happy confidence high
student 354338 --> unknown (or features?) CC: features confidence medium

---> This can be low or medium depending on the student. We don't use this at MVZ, but I guess medium is ok?

type specimen 23663 --> unknown CC: features confidence high
unknown 2555259 --> none/happy

dustymc commented 4 years ago

@ccicero thanks!

I think there's a bunch of stuff confounded at the intersection of nature and formula. There are ~2500 expert+"A ?" IDs, for example, I'm not sure if that's what you mapped or not. Data attached. I very tentatively suggest we don't pursue that as globals - it's fine for MVZ if that's how it's been used there.

create table temp_lcidf as 
  select 
  guid_prefix, 
  taxa_formula,
  nature_of_id,
  count(*) c 
from 
  collection,
  cataloged_item,
  identification 
where 
 collection.collection_id=cataloged_item.collection_id and 
 cataloged_item.collection_object_id=identification.collection_object_id and
 taxa_formula in ('A ?','A sp.','A ssp.','A {string}','A aff.','A cf.','A or B')
group by 
  guid_prefix, 
  taxa_formula,
  nature_of_id
order by 
  guid_prefix, 
  taxa_formula,
  nature_of_id
;

temp_lcidf.csv.zip

There are a few MVZ/student IDs.

MVZ:Bird 2
MVZ:Herp 18
MVZ:Mamm 858

add a "?"

Good point- can we (long after the dust has settled here!) get rid of that formula, or is it somehow different than confidence?

'A or B'

At least some of those are low-precision but not necessarily low-confidence. "It's one of these two almost-identical shrews; I can't tell them apart." I don't think there's any confidence embedded in that. @amgunderson @KyndallH

DerekSikes commented 4 years ago

add a "?"

Good point- can we (long after the dust has settled here!) get rid of that formula, or is it somehow different than confidence?

I've long thought that there's serious redundancy between an ID with nature of ID = student and formula = A ?

One pro of keeping A ? is that string gets sent to GBIF etc and is more visible than the nature of ID information (I'm not even sure that gets shared - does it?)

But with ID confidence as a field, it seems silly to retain formula A ? since this would be the same as confidence = low (or possibly medium).

-Derek

ccicero commented 4 years ago

Yes, there is some redundancy between ID with "?" and confidence=low, but I favor keeping the "?" because that is passed to data aggregators whereas confidence is/will not. Also, it makes it evident right in the name that the ID is questionable. I'm not sure how confidence will be displayed, but presumably smaller font below the name?

expert + A ?

No, I didn't map that specifically, but that implies (to me) that someone who knows something (e.g. a Jim Patton or Peter Pyle, I would call them both experts) still has some questions about the ID. So I don't think we can go strictly from nature of ID to the mapping without considering the formula. In fact, I would say that formula might be more informative for mapping than nature, but for some both need to be considered.

I agree re: 'A or B'.

How's this:

if formula contains 'sp' or 'ssp' or 'string' or '?' or 'cf' or 'aff' --> confidence is low
if formula contains 'A or B' --> confidence is medium
if formula contains 'A' or 'A and B' --> use suggested 'nature' mappings
if formula contains 'A / B intergrade' or 'A x B' --> depends on 'nature' - if molecular, probably higher than by features (?)
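The proposed rules can be read as a small lookup keyed on taxa_formula, falling back to nature_of_id. The following is only a sketch of that reading, not anything that exists in Arctos: the function name `confidence_for`, the set names, and the two entries in `NATURE_CONFIDENCE` (taken from the mapping suggested earlier in this thread) are illustrative assumptions. The formula strings are the ones used in the queries above. Because the thread leaves the intergrade/hybrid case open, the sketch returns None there.

```python
from typing import Optional

# Formula strings as they appear in the queries in this thread.
LOW_FORMULAS = {"A ?", "A sp.", "A ssp.", "A {string}", "A aff.", "A cf."}
MEDIUM_FORMULAS = {"A or B"}
NATURE_FORMULAS = {"A", "A and B"}
HYBRID_FORMULAS = {"A / B intergrade", "A x B"}

# Illustrative subset of the suggested 'nature' mappings (assumption:
# the final values would come from the agreed mapping, not from here).
NATURE_CONFIDENCE = {
    "molecular": "high",
    "geographic distribution": "medium",
}


def confidence_for(taxa_formula: str, nature_of_id: str) -> Optional[str]:
    """Return 'low', 'medium', 'high', or None when the rules do not decide."""
    if taxa_formula in LOW_FORMULAS:
        return "low"
    if taxa_formula in MEDIUM_FORMULAS:
        return "medium"
    if taxa_formula in NATURE_FORMULAS:
        # Defer to the nature-based mapping; None if the nature is unmapped.
        return NATURE_CONFIDENCE.get(nature_of_id)
    # Hybrid/intergrade formulas (and anything unrecognized) are left
    # undecided, since the proposal only says "depends on 'nature'".
    return None
```

Returning None rather than a guessed value keeps the undecided combinations visible for manual review.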

Can we get this into a google doc somehow (we already have two going, but it might help to start a clean sheet with these different possible combinations to help in mapping)?

dustymc commented 4 years ago

passed to data aggregators

I don't think that's ever a reason to model data - we can map WHATEVER.

not sure how confidence will be displayed

That's UI and "easy" (ish...) to change - you can search now.

[Screenshots: "Screen Shot 2019-10-10 at 1 11 40 PM" and "Screen Shot 2019-10-10 at 1 12 22 PM"]

google doc

Good idea - I'm on it.

Jegelewicz commented 4 years ago

This is making my life difficult - I am not able to load anything in the bulkloader because "Legacy nature_of_id terms are disallowed" and I have no idea what to enter in nature of ID anymore....

dustymc commented 4 years ago

bulkloader

Let me know if you want me to update something there.

what to enter

http://arctos.database.museum/info/ctDocumentation.cfm?table=CTNATURE_OF_ID

dustymc commented 4 years ago

There's the start of a migration document here:

https://docs.google.com/spreadsheets/d/14IOPiv2vHbZv30N3975Y80wf2-_1__Xy1I7ZfXI1FSM/edit?usp=sharing

Please start your review of this in the "first_update" tab. I will make two layers of updates from this if necessary:

  1. I can make smaller-scale updates for specific groups of specimens (collection, institution, whatever). For example, I've mapped that I'll first update all nature_of_id="original_data" to nature_of_id="example" with a confidence of "example" for collection "Fake:Example."
  2. Anything not covered in (1) will be updated for all remaining collections. "type specimen" will become "features" with confidence "high" for any collections that are still using "type specimen" (eg, those that haven't requested a custom update) when I get to this step, for example.

Second, I will update again for everything in the "second_update" tab. So all MVZ specimens with an "A sp." formula are currently mapped to confidence=low, even if they were mapped as confidence=high in the "first_update" tab. If there's agreement on confidence mapping, "for_collections" can just be changed to 'all' or similar; if not, I will run that step only for those collections that request it.

The existing/old values will still be copied to remarks.
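The order of operations described above can be sketched as follows. This is a hypothetical illustration of the layering only, not the actual migration script: the function `migrate`, the dict-based record, the rule tables, and the remark text are all assumptions, and the field names are borrowed from the queries in this thread.

```python
from typing import Dict, Optional, Set, Tuple, Union

# (new nature_of_id, confidence or None)
Rule = Tuple[str, Optional[str]]
# ("all" or a set of guid_prefix values, confidence)
FormulaRule = Tuple[Union[str, Set[str]], str]


def migrate(
    rec: dict,
    collection_rules: Dict[Tuple[str, str], Rule],
    default_rules: Dict[str, Rule],
    formula_rules: Dict[str, FormulaRule],
) -> dict:
    """Apply first_update (collection-specific, then default), then second_update."""
    out = dict(rec)
    old_nature = rec["nature_of_id"]

    # first_update, step 1: collection-specific rule, if one was requested.
    rule = collection_rules.get((rec["guid_prefix"], old_nature))
    # first_update, step 2: otherwise the default for all remaining collections.
    if rule is None:
        rule = default_rules.get(old_nature)
    if rule is not None:
        out["nature_of_id"], out["confidence"] = rule
        # The old value is copied to remarks so it can be revisited later.
        note = "legacy nature_of_id: " + old_nature
        out["remarks"] = "; ".join(p for p in (rec.get("remarks"), note) if p)

    # second_update: formula-based confidence overrides the first layer.
    override = formula_rules.get(rec["taxa_formula"])
    if override is not None:
        collections, confidence = override
        if collections == "all" or rec["guid_prefix"] in collections:
            out["confidence"] = confidence
    return out
```

With a default of "type specimen" to ("features", "high") and an "A sp." override for MVZ:Bird, an MVZ:Bird record with that formula ends at confidence low, while the same record in a collection outside the override keeps high.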

ccicero commented 4 years ago

Mapping to aggregators: I agree that we should do what's best for us, but it doesn't hurt to keep the "?", and if we get rid of that and only have confidence, that will likely be buried in a remark somewhere. Given that a lot of requests come through VertNet/GBIF searches, I think it's worth keeping the "?".

Google doc: I'll look at that as soon as possible, it may be a few days. Thanks.

jldunnum commented 4 years ago

Sorry I haven't been able to make recent AWG meetings and actually contribute constructively. Instead I'm just sending whiny emails. A few things: I know we have lots of IDs which have incorrect nature of ID values generated by our previous DGR collection manager prior to Mariel taking the position. These are primarily the result of inattention to "ID of kin" being set as default or incorrect use of "Type specimen". These remain because we haven't had time to carefully go through them all yet and it is super labor intensive to change them. But I guess this needs to happen quickly now to avoid them being modified and thus harder to track down going forward.

Secondly, I have to say I liked "curatorial" and "field" as they quickly informed that a specimen had either only gone through a provisional first pass in the field or if it had been revisited and examined more closely. I liked being able to see that history and when coupled with ID agent it provided the ID confidence. Now the ID history will appear as "feature" and if looked at again it will have a second "feature", but presumably with a higher confidence value. How will confidence value be displayed? Will it be in the ID history as well?

Overall, I am just very leery about the subjective nature of assigning confidence values (I certainly don't want to assign values to all the legacy IDs which we have no idea on). Even something like "molecular" isn't always going to be high confidence. Just depends on what taxa you included in your analyses or what sequences are available in Genbank when you blasted your sequence.



dustymc commented 4 years ago

labor intensive

As always, I'm happy to try to help that - I just need some way to find existing things and new values. (Sorry I can't help much with the values!)

track down

It's not ideal, but I will retain the old values in remarks, and I can help you find records etc as necessary.

displayed

See https://github.com/ArctosDB/arctos/issues/2170#issuecomment-540767900

history

Yes, it's part of the ID and will be preserved as such.

leery about the subjective nature of assigning confidence values

Yes, I agree, which is why I initially mapped the default to not bring them in at all. It does seem somehow vaguely useful in distinguishing between eg, curatorial and field (which I'd also classify as "vaguely useful" so I don't think much changes there). It also seems very useful to me in noting specific events on specific specimens - "I'm an expert, this is my taxon, I'm not sure about this particular ID because the diagnostic features are mangled" becomes readily searchable rather than something that would have to be extracted from remarks.

In any case I'm happy to do something special for MSB, revise the defaults for everyone, revisit this later from the "old" nature data preserved in remarks, etc.

jldunnum commented 4 years ago

OK, I'll think about how best to search for our problem children and get back with you with potential ideas. Thanks!



ccicero commented 4 years ago

I just thought about another complication, which relates to another issue of multiple IDs in data entry.

If you can only identify a specimen to genus, and use taxon A, then confidence = high.
Same scenario but you use A sp., then confidence = low.

Similarly, we often identify a specimen to species (formula A) in the field or during preparation, but then identify it to subspecies in the museum based on either features or distribution. Sometimes we can't ID to subspecies, so we use A ssp. We could leave it as taxon A, but assigning 'ssp' suggests that we tried to ID to subspecies but couldn't with confidence, so we didn't. In this scenario:
taxon A = confidence high
taxon A ssp. = confidence low

So it seems to me that you can assign different confidence levels depending on the taxon formula you use.

One way to deal with that, which is what the other issue is getting at, is to allow 2 IDs in data entry/bulkloading. Alternatively, we could do the initial ID in data entry/bulkloading and then go add the second ID/formula once the data are loaded, but that is an extra step and I think it's better to do it at the time you're doing the data entry (especially since there is often a lag between when data are entered by students and checked/loaded by staff).

For data entry/bulkloading, we need a way to add both IDs with the determiner, date, nature, and confidence level, PLUS indicate which is the accepted ID. Can we address the data entry issue along with this issue?
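As a rough sketch of what a two-ID entry might carry, assuming nothing about the real data entry or bulkloader tables: the `Identification` class, its field names, and the `accepted_identification` helper below are hypothetical, and the example values are placeholders. The only rule enforced is the one asked for above, that exactly one ID is marked as accepted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Identification:
    scientific_name: str
    taxa_formula: str
    determiner: str
    made_date: str            # kept as text here; the real type is an assumption
    nature_of_id: str
    confidence: Optional[str]
    accepted: bool


def accepted_identification(ids: List[Identification]) -> Identification:
    """Return the single accepted ID, or raise if there is not exactly one."""
    accepted = [i for i in ids if i.accepted]
    if len(accepted) != 1:
        raise ValueError(
            "expected exactly one accepted identification, found %d" % len(accepted)
        )
    return accepted[0]


# Placeholder example: a first-pass ID followed by a museum ID to 'A ssp.'.
first_pass = Identification(
    "Sorex cinereus", "A", "student", "2019-10-01", "features", "high", False
)
museum = Identification(
    "Sorex cinereus", "A ssp.", "staff", "2019-10-11", "features", "low", True
)
current = accepted_identification([first_pass, museum])
```

Keeping both IDs as separate rows, each with its own determiner, date, nature, and confidence, preserves the ID history while the flag selects the one to display.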