ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
Apache License 2.0

nature of ID #2170

Closed dustymc closed 4 years ago

dustymc commented 4 years ago

http://arctos.database.museum/info/ctDocumentation.cfm?table=CTNATURE_OF_ID is a mess.

Ideally I think we should say something about the evidence used for the ID, but that doesn't seem possible. Minimally, can we not say the same thing a bunch of different ways?

NATURE_OF_ID | Documentation | Hu?
ID of kin | An identification based upon the identification of another related individual, often the mother of an embryo. Such a specimen should have at least one individual relationship. | spiffy
ID to species group | Within a genus, some groups of closely related species are referred to by the species name of one widespread or well known species within the group. | Hu? This (and much more) can be done with eg, Sorex {Sorex cinereus complex}, I don't think we need multiple ways of doing that.
curatorial | An identification determined by qualified personnel assisting with collection management including collection managers, curators, trained students, staff and others who may not be experts in the group in question but have some knowledge of relevant taxonomy. | this looks functionally identical to student
erroneous citation | The specimen has been cited in refereed scientific literature by this name but this name is clearly wrong. This situation arises mostly from typographical errors in catalog numbers. | spiffy - or not, but it happens
expert | The determiner is a person recognized by other experts working with the taxa in question, or the regional biota. | this looks functionally identical to student, or at least heavily overused. One agent is an expert on 7112 different taxa! Users have the tools to decide who they consider experts and act on that.
field | A determination made without access to specialized equipment or references. | "Looks like a moose" may be necessary, but I'm still not sure how it's not yet another version of 'student.'
geographic distribution | Specimen is assumed, on the basis of known geographic ranges, to be the species or subspecies expected at the collecting locality. The specimen has not been identified to species or subspecies by comparing it to other subspecies within the genus or species. | "It's probably that species because that species lives there and we know what species lives there because we're museums and telling people where stuff lives is what we do....." still looks circular to me
legacy | The identification has been transposed from an earlier version of data that did not include identification metadata. In this case the date of the determination is the date that the data were transposed, and the determiner is unknown. | I think we're stuck with the concept, but maybe "unknown" is a better label
molecular data | An identification made by a laboratory analysis comparing the specimen to related taxa by molecular criteria, generally DNA sequences. | yay us!
photograph | "Field ID" or perhaps "morphology" is probably always more important than this. | "student" version 5
published referral | The specimen has been specifically determined to be of a particular taxon in a publication that describes or re-describes that taxon, but the specimen has no type status. Such a specimen record should include a citation, and the determiner(s) of record should be among the authors of the publication. (This means nothing.) | I still have no idea what this means
revised taxonomy | This designation is appropriate only in the presence of an earlier identification. It implies that the specimen has not been reexamined, and only that a different taxonomic name is being applied. In most cases this results from taxonomic synonymization of names. | we're stuck with this
student | Specimen has been identified by a person using appropriate references, knowledge, and/or tools, but not by an expert. This is a broad use of the term student. | I think maybe everyone hates the label, but the concept seems accurate for the vast majority of our identifications
type specimen | This particular specimen has been described in the literature by this name. The specimen record should contain a citation of the appropriate literature, and the determiner(s) of record should be among the authors of the publication. | yay us!

@ccicero

Jegelewicz commented 4 years ago

We've talked about this in #1093; maybe we can start there.

campmlc commented 4 years ago

We've already gone through the "student" discussion in https://github.com/ArctosDB/arctos/issues/1093. Let's not go there again. Reviving another related request from the earlier thread: 4) Dusty, please get rid of "ID of kin" as default nature of ID; leave blank value and require selection from list.


DerekSikes commented 4 years ago

I agree strongly that we should remove ID of Kin as the default value - too many errors result. Leave it blank & require it be filled.

student vs expert vs curatorial - I sometimes have a hard time deciding between my own IDs - are they expert or student? The addition of curatorial as an option makes this decision making even worse. Do I use expert or curatorial (or student)? I've argued before and will again that the least ambiguous set of choices for this would be 'expert' vs 'non-expert'. Non-expert is the clear choice for any IDs not made by an expert and knowing whether it was made by an expert or not is the most important thing to know about an id. If someone told me these IDs were made by students and these others by curatorial staff I would wonder about my own curatorial staff, who are often students... and myself.. and I consider myself a student of many taxa but an expert in just a few. Also, I'm an expert at getting ALL insects ID'd to order, but only an expert at getting some insects ID'd to species.

So I argue for replacing both 'student' and 'curatorial' with 'non-expert'

I agree that 'field' is ambiguous - if an expert is in the field and is confident of what the ID is without recourse to keys then use 'expert' instead. If it's not an expert, use 'non-expert'. Add details to the remarks if desired.

I've never used 'ID to species group' and agree with Dusty that we have better ways to show that - plus we still would want to know if it was an expert who made that ID to species group and this option prevents us from storing that information.

Geographic distribution - climate change is making some geographic regions more tolerable than they had been previously.... perhaps this should be restricted to remarks. Eg. an expert who decides what it is and then adds this information to the remarks is a more valuable ID than just using 'geographic distribution' as if anyone (non expert vs expert) could safely make an identification of the same reliability.

I think photograph should be in remarks also - an expert ID based on a photo is a lot different than a non-expert ID based on a photo.

published referral- oftentimes we don't know who did the IDs of the species listed in papers (sadly) and it's not safe to assume it was one or more of the authors. This is similar to legacy but makes it clear the ID came from a publication, not a label on a specimen in front of you.

revised taxonomy - this is important to record but I dislike that it hides information about expert vs non-expert ID... for example, if I loan specimens to an expert and get them back and later one of the species is moved to a new genus I update all the records with the new name and myself as the determiner and use revised taxonomy. But when these data are shared to aggregators or downloaded by users, they rarely or don't see that the specimens were ID'd by an expert... they only see the most recent ID by myself, a non-expert, who didn't even look at the specimens, I just updated the name.

I've long wished this could be altered somehow - perhaps needing new fields to store information about this while retaining the ID of the last person who actually looked at the specimens... somehow.
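One way to picture the "retain the last person who actually looked at the specimens" idea is to keep every identification and derive the examined one on demand. The sketch below is only an illustration under assumptions: the `Identification` class, its field names, the determiner labels, and the dates are hypothetical and do not reflect the Arctos schema. The genus names come from the Clethrionomys/Myodes example elsewhere in this thread.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Identification:
    # Hypothetical record shape, not the Arctos identification table.
    scientific_name: str
    determiner: str
    made_date: date
    nature_of_id: str


def last_examined(ids: list[Identification]) -> Optional[Identification]:
    """Return the most recent ID that was not a pure name update."""
    examined = [i for i in ids if i.nature_of_id != "revised taxonomy"]
    return max(examined, key=lambda i: i.made_date, default=None)


# Illustrative history: an expert ID followed by a curatorial name update.
history = [
    Identification("Clethrionomys rutilus", "visiting expert", date(2005, 6, 1), "expert"),
    Identification("Myodes rutilus", "collection curator", date(2012, 3, 15), "revised taxonomy"),
]

current = max(history, key=lambda i: i.made_date)  # what an aggregator would see
examined = last_examined(history)                  # who actually looked at the specimen
```

Here `current` carries the updated name while `examined` still points at the expert's determination, so both pieces of information could in principle be exported side by side.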

my 2 cents.

-Derek


--

+++++++++++++++++++++++++++++++++++ Derek S. Sikes, Curator of Insects Professor of Entomology University of Alaska Museum 1962 Yukon Drive Fairbanks, AK 99775-6960

dssikes@alaska.edu

phone: 907-474-6278 FAX: 907-474-5469

University of Alaska Museum - search 400,276 digitized arthropod records http://arctos.database.museum/uam_ento_all http://www.uaf.edu/museum/collections/ento/ +++++++++++++++++++++++++++++++++++

Interested in Alaskan Entomology? Join the Alaska Entomological Society and / or sign up for the email listserv "Alaska Entomological Network" at http://www.akentsoc.org/contact_us http://www.akentsoc.org/contact.php

Jegelewicz commented 4 years ago

I'll say what I said before:

I don't believe that we are qualified to make an assessment of the expertise of any identifier. Wouldn't it be better to only provide objective data? Most of the options are pretty much that.

Most of these designations are opinions as Derek demonstrated with his "I am an expert, I am a student" argument. The rest would be supported with a remark or citation. If the ID was made in a publication (revised taxonomy), then there should be a citation - no need to say any more. I feel like this field is unnecessary in a way and I don't see an equivalent DWC Identification term.

Jegelewicz commented 4 years ago

for example, if I loan specimens to an expert and get them back and later one of the species is moved to a new genus I update all the records with the new name and myself as the determiner and use revised taxonomy. But when these data are shared to aggregators or downloaded by users, they rarely or don't see that the specimens were ID'd by an expert... they only see the most recent ID by myself, a non-expert, who didn't even look at the specimens, I just updated the name.

I've long wished this could be altered somehow - perhaps needing new fields to store information about this while retaining the ID of the last person who actually looked at the specimens... somehow.

This is an aggregator issue. Anyone looking at the record in Arctos could see what happened. Until GBIF&Co can handle the multiple IDs, I'm not sure we can do anything to make things look better on their end. I suggest you open an issue with iDigBio about it and start a wider discussion.

dustymc commented 4 years ago

I knew this was eerily familiar...

Looks like we're more or less agreed on 'legacy'-->'unknown' - I'll make that change if nobody stops me soonish.

remove ID of Kin

That sort of thing gets lost in these kinds of Issues. I moved it to https://github.com/ArctosDB/arctos/issues/1868, which I think I'm just going to have to do in production, which is probably going to be disruptive. I re-prioritized and will try to sneak it in ASAP.

I don't believe that we are qualified to make an assessment of the expertise of any identifier. Wouldn't it be better to only provide objective data?

YES!

I don't think we're going to convince anyone that a dedicated volunteer necessarily produces lower-quality IDs than random 'students,' or that becoming a grad student magically makes someone an "expert" for everything, everywhere, using any technique (and I can't see any other way to interpret some of the data in Arctos).

Conversely, just about anyone could do a bang-up job of feeding a sequence to GenBank and recording whatever flops out.

Geographic distribution - climate change

Excellent point.

as if anyone (non expert vs expert) could safely make an identification of the same reliability.

That's my reading of the current documentation - "looks like a woodpecker - we're HERE - must be THAT woodpecker...."

I consider myself a student of many taxa but an expert in just a few.

That I think is getting close to useful, but I also suspect others working with those taxa know that and don't need to be told. That is, technique+agent gets at "expertness" (or the lack thereof), and we don't need an explicit pigeonhole.

There's also some weird "experience factor" in there. The first ID you make with some new technique/taxa/whatever probably isn't as good as your 10,000th. I have no idea what to do with that....

non-expert

That makes sense only if we're contrasting it with "expert," and I obviously see dubious value in "expert."

argue for replacing both 'student' and 'curatorial' with 'non-expert'

I'm not crazy about the vocabulary, but completely agree with the direction.

published referral

Fair enough, and I suppose that more or less reflects the structure of citations. Minimally they didn't reject the possibility that the thing is whatever they called it.

I think that's another reason to hate "expert" (or ranking identifiers in general) as well - I'd guess a fair number of these get entered as "expert" (they're publishing on the taxa, after all). If we're using the terms arbitrarily then they can't really DO STUFF. Maybe that's just a matter of documentation.

revised taxonomy

Interesting point. I might be tempted to bring the agent and technique (if we had such a thing...) over from the previous ID, which would preserve that, but there may be some unjustified assumptions in that too - eg, maybe the 'expert' DOES NOT think "Myodes" is just another way of spelling "Clethrionomys" and your curatorial assumptions about the relationship between those names add or subtract something from the original. That would also remove the fact that the revision is your assumption. You may be right in that we're missing something, but I can't quite see what.

ccicero commented 4 years ago

I have not read this whole thread in detail, but I think we are confounding who is making the ID (expert, student, etc.) with the basis for the ID.

Can we make this more about the basis for the Identification:

etc. (I'd need to think about this for every existing value, and there may be new ways we're not thinking about).

The 'who' part (expert or non-expert) should be through the agent making the determination. e.g., agent Peter Pyle (=expert) made identification based on phenotype on a specific date.

dustymc commented 4 years ago

Thanks Carla - I like this direction, I just didn't know anyone else did!

From your comments, here's a first pass at mapping. If we go here we could probably do better for many of the unknowns - eg, IDs by {person} or before {date} or ... could get "upgraded" to something more specific.

old | new | Hu?
ID of kin | ID of kin | secondary; see below
ID to species group | unknown |
curatorial | unknown |
erroneous citation | unknown | This belongs in https://arctos.database.museum/info/ctDocumentation.cfm?table=CTCITATION_TYPE_STATUS, not sure what it's doing here too??
expert | unknown |
field | phenotype |
geographic distribution | geographic distribution | secondary; see below
legacy | unknown |
molecular data | genotype | I don't really care what the terminology is, but we should strive for consistency if possible. (Maybe nongenetic molecules are used in IDs??)
photograph | phenotype |
published referral | published referral?? | Ideally these would come from the nature of the publication, but that could require multiple values. I suppose as long as the list is short "phenotype; genotype" isn't THAT evil
revised taxonomy | revised taxonomy | secondary; see below
student | unknown | Maybe these are all "phenotype"??

ID of kin, geographic distribution, and revised taxonomy are "dependent" or secondary IDs; you need to see the mother's ID to know anything about how much you should trust the embryo's, a taxonomic revision is basically just a search term and the important data is in a previous identification, and geographic distribution is (from my perspective) pretty close to useless by itself, but I can definitely see how it adds value as a 'suggestion' on top of that 'Peter Pyle (=expert) made identification based on phenotype on a specific date' "primary" ID.
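The first-pass mapping above could be written down as a lookup table, which makes it easy to check that every old value has a destination. This is a sketch only: `NATURE_OF_ID_MAP`, `SECONDARY`, and `migrate` are hypothetical names, not Arctos code, and the "??" attached to published referral in the table is dropped here for simplicity.

```python
# First-pass old -> new mapping, copied from the table in this comment.
NATURE_OF_ID_MAP = {
    "ID of kin": "ID of kin",
    "ID to species group": "unknown",
    "curatorial": "unknown",
    "erroneous citation": "unknown",
    "expert": "unknown",
    "field": "phenotype",
    "geographic distribution": "geographic distribution",
    "legacy": "unknown",
    "molecular data": "genotype",
    "photograph": "phenotype",
    "published referral": "published referral",
    "revised taxonomy": "revised taxonomy",
    "student": "unknown",
}

# Values described in the comment as "dependent" or secondary IDs.
SECONDARY = {"ID of kin", "geographic distribution", "revised taxonomy"}


def migrate(old_value: str) -> tuple[str, bool]:
    """Return (new value, is_secondary) for one old NATURE_OF_ID value."""
    new_value = NATURE_OF_ID_MAP[old_value]  # KeyError flags unmapped vocabulary
    return new_value, new_value in SECONDARY
```

Raising `KeyError` on an unmapped value is a deliberate choice in this sketch: a value missing from the table (for example "type specimen", which the mapping does not list) would surface immediately instead of silently becoming "unknown".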

I'm still not crazy about the idea of identifications whose validity hinges on the accuracy of locality data.

I'm not sure how to model that (something about "accepted" getting more complex, probably) and I'm certainly not sure how to ship it to GBIF, but it feels like there's some correct in there somewhere....

Some sort of 'accepted, but not all you need to know' "acceptedness" might also take care of the "these 12 techniques all say Bla blah" that might otherwise get lumped into "published referral."

DerekSikes commented 4 years ago

Seems the tide is shifting against my interest in maintaining 'expert' as an option. After some thought I realized I've only used that field for a search on 'expert' once.

I wanted to see how many Identifications I had made as an 'expert' vs how many I had made as a 'student'. I used this information in my promotion and tenure file. There were lots in both categories but the ones as expert were of course more valuable. It would be a terrible loss of information to not be able to distinguish between these two types of identifications. I would be unable to perform this search just using taxa I know well (and it would take separate searches on all the different taxa....) because there are too many to remember and it depends on how well written were the keys I was using etc. I'm a beetle specialist but for some beetle groups I can't perform expert IDs while for others I can. When I enter the information on the ID into Arctos I assess my confidence in the ID at that point. It's valuable data.

These terms are vague yes, but they are an attempt at categorizing confidence in the reliability of the ID. ( I wouldn't want to have to rank our confidence in an ID on a 10 point scale, that would be horrible). Perhaps we need a new field for this, since it is different information than HOW the ID was made.

Perhaps we need a field named "ID confidence" and all those marked with method = expert now should get "ID confidence = high" and all those marked with method = student (or curatorial) should get "ID confidence = not high" .

Also, since we have the method of 'molecular' it's safe to assume that all IDs that were made by 'expert' or 'student' could be replaced by 'phenotype' rather than 'unknown'. If someone wasn't using phenotype they'd have specified what they were using. Phenotype is the default.
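The proposal in the last two paragraphs amounts to splitting one field into two: how the ID was made, and how confident the determiner was. A minimal sketch of that split, assuming a hypothetical `SplitId` pair and a hypothetical `split_nature_of_id` helper (neither exists in Arctos), might look like this:

```python
from typing import NamedTuple, Optional


class SplitId(NamedTuple):
    basis: str                 # HOW the ID was made
    confidence: Optional[str]  # proposed "ID confidence"; None if not recorded


def split_nature_of_id(old_value: str) -> SplitId:
    """Split an old combined value into (basis, confidence) as proposed above."""
    if old_value == "expert":
        return SplitId("phenotype", "high")
    if old_value in ("student", "curatorial"):
        return SplitId("phenotype", "not high")
    # Other values say nothing about confidence; pass them through unchanged.
    return SplitId(old_value, None)
```

Values outside expert/student/curatorial are passed through untouched because the proposal only covers those three.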

-Derek


dustymc commented 4 years ago

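@DerekSikes I don't think I see a big-picture problem with that.

  • this specimen was identified using phenotype
  • the person doing the ID, at the time they were doing the ID, had {"experience factor"/confidence/whatever}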

I just don't like confounding those concepts in one field. If I'm a geneticist looking for cryptic species, I probably only want things IDed using some sort of decent non-genetic technique, and our current data don't consistently support that. The ability to add "[not]confidently IDed" to that just seems like a bonus.

(I also don't much trust those data, but maybe that's another issue.)

And on that note, it probably takes us back to some sort of vocabulary/definition nightmare, but I do think there's some distinction somewhere between "phenotype, as in microscopes and slides and trait matrices and such" and "phenotype, as in it just looked like a badger." Maybe we do need to retain (and better define) "field" or something like it.

I'm not sure I understand this:

wouldn't want to have to rank our confidence in an ID on a 10 point scale, that would be horrible

anything using expert/student (and synonyms) is basically ranking confidence on a 2-point scale, no? I'm not sure how many points are in between "I just bought a field guide in the gift shop" and "I am describing the holotype" but it does seem like some sort of confidence scale.

No problem to adjust the migration path - I'll be asking lots of questions if we get to that point.

Jegelewicz commented 4 years ago

@DerekSikes why does it matter if you were an expert or a student - effort was involved either way. Is every ID I made less valuable because it was a student ID? It may have been an excellent addition to my knowledge base to make those IDs and some ID is better than "Animalia", correct? Never mind that you were the one deciding whether you were an expert or student? That seems like you could rig your own promotion...not that I'm accusing you of that, but just trying to make the point that "expert" and "non-expert" are subjective and one person's expert will be another person's non-expert. These kinds of criteria seem a poor way to determine a promotion and I am not in favor of Arctos becoming the arbiter of people's employment path.

DerekSikes commented 4 years ago

I'm not sure I understand this:

wouldn't want to have to rank our confidence in an ID on a 10 point scale, that would be horrible

anything using expert/student (and synonyms) is basically ranking confidence on a 2-point scale, no?

Yes, a 2- point scale is reasonable. Even deciding how to rank these on a 3-point scale would be hard and hardly reproducible, even for the same person at different times of the day! a 10 point scale would be nuts.

-Derek


dustymc commented 4 years ago

I think we're also drastically under-using id_sensu (fkey-->publication); it could clarify some of the ambiguity in "expertness" (or serve as an independent measure of "expertness") for anyone who wants to dig, along with the usual purpose of clarifying the taxon concept.

Jegelewicz commented 4 years ago

Man, you're good at a lot of stuff!! :-)

dustymc commented 4 years ago

Don't distract me, I'm still countin' legs.

Jegelewicz commented 4 years ago

BTW, I retract my comment about DWC - it seems like we are trying to fill identificationVerificationStatus with this. That term is defined as: "A categorical indicator of the extent to which the taxonomic identification has been verified to be correct." Pretty sure no one is going to know what most of our current terms mean and also pretty sure most people will know what the terms in Carla's list do!

2160

DerekSikes commented 4 years ago

I underuse that because to use it properly I'd have to have all the keys I use already loaded into Arctos publications to link to & this would be a massive data entry project unto itself.

Love the idea, but it fails due to cost of getting content in to make the idea work.

-D


Jegelewicz commented 4 years ago

to use it properly I'd have to have all the keys I use already loaded into Arctos publications to link to & this would be a massive data entry project unto itself.

Maybe we can automate this a bit? See #2176

ccicero commented 4 years ago

I reiterate that I think we can/should get to the 'expert' vs 'student' etc issue via agents. ID is made by a certain person on a certain date. The person could be a student at one point in time, a post-doc (more of an expert) at another time, and a curator (=expert) at a third time. Also, I would say that undergraduate student and graduate student have different levels of expertise, and that isn't captured by the 'student' value for ID.

We should separate the how from the who. This would involve expanding on agents and adding agent role over time to create history (now role is just in remarks or maybe address):

agent X1 = graduate student between dates A and B
agent X2 = postdoc between dates C and D
agent X3 = curator between dates E and F

then apply one of those three agent values to the determination.
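
A minimal sketch of the dated-role idea, in Python. The `AgentRole` record, its field names, and the example dates are all hypothetical; none of this reflects the actual Arctos agent tables. It only shows that, once roles carry dates, the role in effect when an ID was made can be looked up rather than typed into the ID itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AgentRole:
    """One dated role for an agent (hypothetical structure)."""
    agent: str
    role: str
    start: date
    end: date


def role_at(roles: list[AgentRole], agent: str, when: date) -> Optional[str]:
    """Return the role the agent held on the given date, or None if unknown."""
    for r in roles:
        if r.agent == agent and r.start <= when <= r.end:
            return r.role
    return None


# Made-up history for a made-up agent X.
ROLES = [
    AgentRole("X", "graduate student", date(1990, 1, 1), date(1994, 12, 31)),
    AgentRole("X", "postdoc", date(1995, 1, 1), date(1997, 12, 31)),
    AgentRole("X", "curator", date(1998, 1, 1), date(2019, 12, 31)),
]
```

With this shape the determination stores only agent and date; `role_at(ROLES, "X", id_date)` supplies the role as a fact, and the user decides what it means.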

???

Jegelewicz commented 4 years ago

I would say that undergraduate student and graduate student have different levels of expertise

Still a matter of opinion and varies by the student involved. I really dislike adding our opinions. Facts are good - the position of the agent at the time of ID seems like fair information to pass along. Users can decide what that means (my grad student may be more of an "expert" than someone else's tenured faculty depending on the organism involved....).

dustymc commented 4 years ago

If someone does something that does not suck with publications (or taxonomy, or agents, or...) it should be trivial to plug in, in any capacity from making something trivial better to just replacing our model. Not holding my breath...

If the publications have DOIs, it should be fairly easy to sprinkle the create form around. (I doubt these do??) If the authors submit ORCID with the publication and to Arctos I could magic the author part, which is about 90% of the current work in creating a publication. (Even less likely, I'm sure.)

Unrelatedish, if one publication here is too much work then maintaining a taxon concepts model does not seem remotely realistic. This is not the first time that idea has surfaced, or at least caused some ripples.

The intent of "student" has always been 'one willing to learn,' and the documentation has always supported that.

@ccicero I don't think this is something that can be normalized to people. If I bring you a North American bird, I'm pretty much just going to believe whatever you tell me. If I bring you a deepwater coral larva - well, maybe not so much. (Never mind, I think I figured it out below...)

I think the really critical thing here is separating technique. That's going to open up a whole new world of ways to query the data, let people get at really fundamental things - do genetics and morphology support the same hypotheses? - that just aren't accessible in our data currently.

If there's some value in ranking the confidence of the ID, I can't see how it detracts from anything. I don't think I'd ever use or believe it, but I also don't think it'd ever prevent me from doing anything so I'm fine with it.

Actually I might use and believe it.

And on that note, it probably takes us back to some sort of vocabulary/definition nightmare, but I do think there's some distinction somewhere between "phenotype, as in microscopes and slides and trait matrices and such" and "phenotype, as in it just looked like a badger." Maybe we do need to retain (and better define) "field" or something like it.

The "confidence score" could be a useful way to get at that - "Hi I'm me and despite whatever my agent-stuff might imply, you shouldn't trust me very much on this particular ID."

https://github.com/ArctosDB/arctos/issues/1873 might be the best way to organize your data - it would require a few more relationships ("graduate student of" etc.). Certainly nothing above prevents those agent-data-including queries. And including the taxa used in the IDs brings the agent's 'expertness' into perspective....and AHA!, maybe, I think.

So Carla+undergrad+Pipilo finds birds IDed between certain dates (defined in agent-data) by a certain agent - essentially a dynamic "trust factor."

So Carla+postdoc+Pipilo [+phenology + whatever else is relevant] finds a different set of birds (unless there are re-ids) under a different "trust factor" (still dynamically defined by the user at that moment).

So if we add one more field ("confidence score"?? Need a name) and a categorical code table (with 2 categories to start with) to feed it, I think everybody's happy and this is just a matter of vocabulary and eventually UI to hook deeper into agents.

Yay us?!?

ccicero commented 4 years ago

I think we are converging on something. I agree @Jegelewicz about sticking to facts. By adding a role explicitly to agents (undergraduate student, graduate student, postdoc, curator, etc.), it gets at the status of a person during a certain tenure without saying anything about confidence in that person.

I'm not sure I like 'confidence score' - that seems pretty subjective. Leave that to the person using the data. Cicero+Pipilo+audio (audio, video should be other categories of 'how') would have fairly high confidence, but the person using the data could/should interpret based on knowing something about my research rather than us providing that info.

--> I think the really critical thing here is separating technique. That's going to open up a whole new world of ways to query the data, let people get at really fundamental things - do genetics and morphology support the same hypotheses? - that just aren't accessible in our data currently.

This is what I'm getting at. We should work on coming up with a vocabulary for technique, and that should be separate from information about the person and his/her role/status/capacity. We'd need to be able to apply >1 technique to an ID.

DNA sequence
genomics
morphology
coloration
microscopy
audio
video
etc.
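
A rough Python sketch of the kind of query a separate technique vocabulary could allow, assuming each ID can carry more than one technique. The row layout, the function name, and the example records are invented for illustration; they are not Arctos structures or real data.

```python
from collections import defaultdict

# (specimen, scientific name, techniques used) -- invented example rows.
IDENTIFICATIONS = [
    ("spec-1", "Pipilo maculatus", {"morphology", "coloration"}),
    ("spec-1", "Pipilo erythrophthalmus", {"DNA sequence"}),
    ("spec-2", "Pipilo maculatus", {"morphology"}),
    ("spec-2", "Pipilo maculatus", {"DNA sequence"}),
]


def conflicting_specimens(ids, technique_a, technique_b):
    """Specimens whose IDs made with technique_a and technique_b name different taxa."""
    names = defaultdict(lambda: defaultdict(set))
    for specimen, name, techniques in ids:
        for t in techniques:
            names[specimen][t].add(name)
    return sorted(
        s for s, by_t in names.items()
        if by_t[technique_a] and by_t[technique_b]
        and by_t[technique_a] != by_t[technique_b]
    )
```

Here `conflicting_specimens(IDENTIFICATIONS, "morphology", "DNA sequence")` picks out the specimens where the two lines of evidence disagree, which is only possible if technique is recorded separately from who made the ID.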

dustymc commented 4 years ago

By adding a role explicitly to agents

Minor point: adding a role to RELATIONSHIPS (which are already scheduled to get a date-component). That'll also let me get at WHERE you're a student-or-whatever, so I can add a little extra statistical weight to graduates of Dusty's Institoot o hi-er learnin'...

'confidence score' - that seems pretty subjective.

Yes, I believe that's the point.

(audio, video should be other categories of 'how')

I'm not quite sure that's correct - it's more "behavior" or perhaps "song" - like 'photo' it's the information, not the media, that we care about - but yes, I noticed that gap as well.

So "Cicero+Pipilo+audio" is the "baseline possible."

You can ignore the 'confidence score' (when searching and creating IDs) for most stuff.

If you're trying to ID a bird off of a damaged tape and you just can't quite hear enough to be sure or something, you can use 'confidence score' as a searchable way to flag that particular ID. (And I can use your agent info to not only get at how much I trust your IDs, but also how much I trust your evaluation of your IDs!)

Minimally, I don't think it can detract from anything else, and it seems that some of us have a use case for it. Ideally, perhaps it does add some real less-subjective value in some specific situations.

We'd need to be able to apply >1 technique to an ID.

That's probably a simpler approach than my vague ideas regarding "secondary" or "dependent" IDs. (And the "cool toys" comment in https://github.com/ArctosDB/internal/issues/27#issuecomment-512071298 might make it trivial without mucking with the structure - not 100% sure of that.)

Would this be one term/list, or perhaps a general+specific split - eg,

morphology; coloration; microscopy

or

general: morphology specific: coloration; microscopy

??
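
One possible reading of the general+specific option, sketched in Python: record only the specific terms and derive the general category from a lookup, so a search on "morphology" also finds IDs recorded as "coloration". The `GENERAL_OF` mapping and the function name are hypothetical, and the mapping only covers the two terms used in the example above.

```python
# Hypothetical mapping from a specific technique to its general category.
GENERAL_OF = {
    "coloration": "morphology",
    "microscopy": "morphology",
}


def with_general_terms(specific_terms):
    """Return the specific terms plus the general category each one implies."""
    terms = set(specific_terms)
    terms.update(GENERAL_OF[t] for t in specific_terms if t in GENERAL_OF)
    return sorted(terms)
```

Terms without a general category pass through unchanged, so the flat single-list option is just the case where `GENERAL_OF` is empty.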

DerekSikes commented 4 years ago

I'm very much in favor of a field for 'confidence score.'

In entomology we make a big deal about identifications. I try to train my students to work within their skill set - don't try to ID something to species until you've shown that you can reliably ID at higher levels first (orders & families).

A 2-category confidence score gives one a way of saying "high confidence = my reputation is on the line, if I got this wrong I'd be embarrassed and suffer reputational damage. Thus I will use this only when I'm really confident."

"not-high confidence = I did my best but I'm not an expert on this group and if I got it wrong at least I warned you I wasn't that confident in the ID. No reputational damage suffered."

Granted, for older collections where most of the databasing is being done of specimens ID'd in the past & not by the curators/CM staff themselves, this is harder to use if you don't know the people and their reputations for expertise with certain taxa. However, in UAM entomology we make 40+ loans a year of tens of thousands of specimens and we loan them to people doing taxonomic work on the groups who know these taxa better or as well as anyone else alive. When these specimens come back from loan I mark their IDs as 'expert' - this is useful because in 100 years folks might have a hard time working out who was an expert on which taxa when... etc.

So I'm in favor of 'confidence score' with 2 categories. Deciding on the names for these 2 categories is tricky though. I like 'expert' and 'non-expert', which are more obvious and mean something different from 'high' vs 'not-high': people would interpret 'not-high' as synonymous with 'low confidence', and even though that's not a safe assumption, it'll still happen. Or we could use numbers, 1 vs 0 (but I prefer 'expert' and 'non-expert').

Also, I reiterate that all IDs by experts and students would be method = phenotype since that's the default method, if molecular or other methods had been used, they'd have been specified.

-Derek

ccicero commented 4 years ago

Confidence score seems different than 'expert' versus 'non-expert' - even an expert can have a less than perfect confidence in an ID. What about something like 'certain' and 'uncertain'? Also again, 'expert' and 'non-expert' is really related to the person = agent.

---> Would this be one term/list, or perhaps a general+specific split

I'd say a list of terms, apply all that are relevant and then concatenate into a single field?

dustymc commented 4 years ago

I like 'certain' and 'uncertain' - I'm no expert, but I'm also fairly sure I've never misidentified a moose or walrus.... That seems to jibe with Derek's use case as well.

concatenate into a single field

I think we can make that work, and if we go to https://github.com/ArctosDB/internal/issues/27#issuecomment-512071298 we can treat it as a data object (array) instead of a concatenation. (We can do that in Oracle too, but it involves creating a datatype/it's a little more complicated.)
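
To make the concatenation-versus-array distinction concrete, here is a small Python sketch. The "; " separator is an assumption borrowed from the "morphology; coloration; microscopy" example earlier in the thread, and the function names are made up.

```python
SEPARATOR = "; "  # assumed delimiter


def to_concatenated(terms):
    """Store a list of terms as one delimited string."""
    return SEPARATOR.join(terms)


def from_concatenated(value):
    """Recover the list of terms from the delimited string."""
    return value.split(SEPARATOR) if value else []


def has_term(value, term):
    """Exact-term match; a plain substring test would also match partial terms."""
    return term in from_concatenated(value)
```

The concatenated form has to be split again before any exact search, which is the extra step an array-type column would avoid.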

I see no obstacles to implementation at this point, other than vocabulary.

I suggest "nature of ID" (I think we should retain that vocabulary??) remain NOT NULL with an "unknown" (=we're not ignoring this because we can, we don't know) option, and 'confidence' (??) be NULL; most of us are just not going to have that information much of the time.
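
A sketch of that NOT NULL / NULL split, using Python's built-in `sqlite3` purely for illustration (the thread mentions Oracle; this is not the Arctos schema). The table name, column names, and example rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE identification (
        identification_id INTEGER PRIMARY KEY,
        scientific_name   TEXT NOT NULL,
        nature_of_id      TEXT NOT NULL,  -- 'unknown' is an explicit value
        confidence        TEXT            -- nullable: often not recorded
    )
    """
)
# An ID with unknown nature and no confidence is a valid row.
conn.execute(
    "INSERT INTO identification (scientific_name, nature_of_id) VALUES (?, ?)",
    ("Odobenus rosmarus", "unknown"),
)


def insert_without_nature(connection):
    """Return True if the database rejects a row with no nature_of_id."""
    try:
        connection.execute(
            "INSERT INTO identification (scientific_name) VALUES (?)",
            ("Alces alces",),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

The point of the design is that "unknown" has to be chosen deliberately, while a missing confidence simply stays NULL.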

@DerekSikes for migration purposes I can be as specific as the data allow - "all IDs by experts and students would be method = phenotype" isn't a problem as long as you can somehow identify "experts and students," and we can take another swing if we miss something. Going forward there would be no defaults - how you use this (and if you use it, in the case of NULLable fields) would be left to the collections.

SO - vocabulary time?

Here's Carla's list + the current vocabulary I think we need to retain and some comments.

DerekSikes commented 4 years ago

Yes, this is a qualification on the identification itself, not on the agent. As I've said, I'm an expert only on some taxa, not all, thus I use 'expert' ID only for those for which I claim expertise.

I've not argued we should be using this term to qualify the agents. I think that's a rabbit hole of little value. These data (expert vs student) for Identifications are just ways to state how reliable the IDs are.

If it were a field for the agent it would be in the agent table.

-D

ccicero commented 4 years ago

It would be helpful to create a google doc for vocabulary and definitions to aid in migration/documentation? Can we do that please?

dustymc commented 4 years ago

google doc

On it....

DerekSikes commented 4 years ago

I don't like 'certain' vs 'uncertain' because certain means 100% whereas 'high confidence' means > ~90%

-D

dustymc commented 4 years ago

https://docs.google.com/spreadsheets/d/1JfbcpVYTK73DKRkgJQ0jNrFziS9zcbx7lEq872WrfYs/edit#gid=115578899

There are two tabs - 'migration path' will remain very waffly until 'ctnature_of_id' is solidified, and will only come into play after we've dealt with whatever needs dealt with by itself (Derek's "all IDs by experts and students would be method = phenotype" and similar).

Are we agreed on "confidence" for the new field?

I'm fine with "[high|low] confidence" for the terminology, but I'm also fine with absolutely anything else that gets the idea across. Like all Arctos vocabulary, the term means precisely what we define it to mean, and using it in any way outside the definition we assign is objectively wrong.

DerekSikes commented 4 years ago

re: "I'm fine with "[high|low] confidence" for the terminology, but I'm also fine with absolutely anything else that gets the idea across. Like all Arctos vocabulary, the term means precisely what we define it to mean, and using it in any way outside the definition we assign is objectively wrong."

Few users, even power users, know all the definitions (or bother to look them up) & humans are error-making machines, so the more intuitive and simple we make the terms the fewer errors will be introduced.

We could for example define "high confidence" as "confidence is low" and then not be surprised when all the data are a mixture of confused errors. Let's not rely on the definitions.

-Derek

dustymc commented 4 years ago

Obviously the terms should be as intuitive as possible, and no doubt the further we drift from that the more they're misused and data quality (eg, ability to talk to other things or answer deep questions) drops off. Avoiding confusing terminology is occasionally a decent reason to avoid all vocabulary and do something symbolic - things like 1-5 star reviews are built on this idea.

This will be applied to walrus (pretty hard to get wrong) and fossil pollen (magic, as far as I can tell) and ... - whatever we do should make some sense for anything that might end up in Arctos. I think I'm hearing that vague is better, and we should avoid any terms with numeric connotations.

"confident" and "less-confident" (and "SWAG"....)??

A sliding scale between 0 (this might not be a bird at all) and 100 (it can't possibly be anything else)??

?????

ccicero commented 4 years ago

--> Are we agreed on "confidence" for the new field?

Fine with me.

---> I'm fine with "[high|low] confidence" for the terminology, but I'm also fine with absolutely anything else that gets the idea across.

High/low confidence is ok with me.

ccicero commented 4 years ago

I just went through the google doc and made some edits to the nature of ID terms, and added a column with comments on what I did and reasons. I like it! Looking through this, I think it makes sense to just have one ID basis per ID; different bases for ID can be applied through ID history. I have not done anything yet with the migration path.

This brings up a related issue (can't find it but I think it already exists) for how to deal with multiple IDs in data entry - both individual form and bulkloader. When a specimen comes back from the field or is prepped as salvage, it's often ID'd using 'gross morphology' (previously 'field') but then another ID (geographic distribution ---> subspecies) is added when it's cataloged. We need a way to be able to enter both of these IDs and their respective 'nature of ID' at the time of cataloging.

I did find this old issue but that's not the one I'm thinking of. I'm pretty sure that I created an issue for it. Create a new issue?

campmlc commented 4 years ago

Agree with high low confidence. After all, IDs change, so "certain" is not appropriate.

dustymc commented 4 years ago

@ccicero thanks! Definitions look great/I'm OK with all of that.

That issue is as good as any - it's how I'll get there. PLEASE prioritize it, we can merge if anyone ever finds the other.....

DerekSikes commented 4 years ago

I'd like to make the case again for 'high' vs 'not-high' confidence and here's why:

high = > ~90% confidence
not high = < ~90% confidence

low = < ~50% confidence

not high and low mean different things. Low leaves a bunch of 'not high' (medium?) confidence unavailable as an option.
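
The distinction can be spelled out in a few lines of Python. The thresholds are the approximate ones given above; treating exactly 90 as "high" and calling the middle band "medium" are assumptions made only for this sketch.

```python
def confidence_category(percent):
    """Bucket a 0-100 confidence using the approximate thresholds above."""
    if not 0 <= percent <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if percent >= 90:
        return "high"
    if percent < 50:
        return "low"
    return "medium"  # 'not high', but also not 'low'


def is_not_high(percent):
    """'not high' covers both 'medium' and 'low'."""
    return confidence_category(percent) != "high"
```

A 70% ID is "not high" without being "low", which is the band a high/low pair would leave with no home.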

-Derek

ccicero commented 4 years ago

@DerekSikes OK, but I don't like 'not high' as a term. What about this:

term = identification_confidence

values:
confident: Identification is made with 90% or higher confidence.
not confident: Identification is made with less than 90% confidence.

???

ccicero commented 4 years ago

or

more confident: Identification is made with 90% or higher confidence.
less confident: Identification is made with less than 90% confidence.

because 89% is still fairly confident, just not quite as good as 90%.

???

dustymc commented 4 years ago

I think I'm digging the slider idea more and more...

If I found that 10% (or ANY percent really) of the things I'd thought were walrus were in fact not, I'd just be flabbergasted. Assigning a 90% (or 99%) confidence to that is a HUGE degradation of the data; it just doesn't remotely represent the situation, which is that there are a limited number of possible taxa that present as 5000 pound carcasses with a distinctive shape, texture, smell, and dentition on the beaches of western Alaska. Identifying walrus from ancient ivory carvings is probably a bit trickier.

On overwintered shrews from just the skull, I might start buying lottery tickets if I found out I got 90% of them right, and I don't think that's a drastically different number for "experts" - the diagnostic features just aren't there. Geneticists could certainly do much better - their "diagnostic features" don't get ground away.

I assume that sort of variability is fairly common across taxa/technique/individual, and so should be available in our evaluation of ourselves.

A slider would facilitate the full range of confidence, and also get us out of this vocabulary trap. It's an intuitive approach to this sort of data, and can be implemented through a familiar UI.
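
A hedged Python sketch of how a slider value might be stored and searched, assuming the 0 to 100 scale floated earlier in the thread and an optional (nullable) field. The function names and the example records are hypothetical.

```python
from typing import Optional


def validate_confidence(value: Optional[int]) -> Optional[int]:
    """Accept None (not recorded) or an integer from 0 to 100."""
    if value is None:
        return None
    if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 100:
        raise ValueError("confidence must be None or an integer from 0 to 100")
    return value


def at_least(records, minimum, include_unrecorded=False):
    """Filter (name, confidence) pairs by a user-chosen minimum confidence."""
    return [
        (name, conf)
        for name, conf in records
        if (conf is None and include_unrecorded)
        or (conf is not None and conf >= minimum)
    ]
```

The cutoff is chosen by the person searching, not baked into the vocabulary, and IDs with no recorded confidence are only returned when explicitly asked for.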

UI would look something like https://arctos.database.museum/demo - that's just the one-line, out-of-the-box implementation, it can be styled in any way.

sharpphyl commented 4 years ago

Our collection may be an outlier, but we find the distinction between "legacy," "student," and "expert" to be of value and in Dusty's table, all of them become "unknown." For us, a legacy ID (40% of our collection) is whatever arrived with the specimen. (I know that's not the exact definition in the table, but it's the closest we had to choose from.) Well over half of our lots arrive with an ID from a collector or dealer. We may have no idea what the collector's confidence level was or what the basis for the ID was, but if we haven't changed it, it is an accepted taxon name. In the ID remarks, we note if the ID has been confirmed and by whom.

A student ID (36% of our collection) is made by any and all of our volunteers as none of us is a qualified expert, but we do research the specimen and have a modest degree of confidence. Only 2% of our specimens have been reviewed by a true expert.

To me, these are important distinctions and should be to a researcher as well. I hate to have the nature of the IDs of >75% of our collection become "unknown" when we do know something about the nature, timing and quality of the ID. Are we trying to replace this information with the two options in the "degree of confidence" field? That still doesn't convey the degree of knowledge of the person making the ID (if known).

Before this is implemented, can we see a visual of how the data entry "Identification" section would look and the options in the controlled fields?

dustymc commented 4 years ago

whatever arrived with the specimen

That was the intent of legacy, and now unknown - "this is what we have."

no idea what the collector's confidence level

I think this must be a NULLable concept for precisely that reason.

In the ID remarks, we note if the ID has been confirmed and by whom.

I'm sure that's common, but it's also of extremely limited utility - it's useful for someone who's already found the specimen and is willing to read the remarks. Adding an identification makes that information much more usable.

75% of our collection become "unknown"

That's just a default. If you generally make identifications by "gross phenotype" (or whatever) then we can use that as the default for your collection, or bits-and-pieces of it, or whatever.

Are we trying to replace this information with.... confidence

Yes, but it offers a lot more precision when that's available.

two options

Agreed, more or less - that's why I'm leaning towards something like a slider.

visual

This...

[Image: Screen Shot 2019-07-19 at 8 57 30 AM]

would use terminology from https://docs.google.com/spreadsheets/d/1JfbcpVYTK73DKRkgJQ0jNrFziS9zcbx7lEq872WrfYs/edit#gid=115578899, and something like

https://arctos.database.museum/demo

[Image: Screen Shot 2019-07-19 at 8 58 16 AM]

or some dropdown of some terms or something would be inserted near it.

dustymc commented 4 years ago

Missing from the last reply: Once we've established some agreeable means of recording identification_confidence, we can talk about "translating" expert, student, etc. to it. This should not be a lossy process.
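
One way to keep such a translation from being lossy, sketched in Python: carry the original term along with whatever the new fields become. The mapping below is a placeholder and not an agreed migration path; only the "unknown" nature values echo what has been described in this thread, the confidence values are invented, and the `remark` field is hypothetical.

```python
# Placeholder rules: old term -> (new nature_of_id, new confidence or None).
PLACEHOLDER_MAPPING = {
    "legacy": ("unknown", None),
    "student": ("unknown", None),
    "expert": ("unknown", "high"),
}


def migrate(old_term, mapping):
    """Translate a legacy nature_of_id term, keeping the original so nothing is lost."""
    if old_term not in mapping:
        raise KeyError(f"no migration rule for {old_term!r}")
    nature, confidence = mapping[old_term]
    return {
        "nature_of_id": nature,
        "confidence": confidence,
        "remark": f"former nature_of_id: {old_term}",
    }
```

Terms with no rule raise an error instead of silently becoming "unknown", so gaps in the mapping surface before any data are changed.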

sharpphyl commented 4 years ago

Should the title of this field go from Nature of ID to ID Methodology?

This does still leave out what we're eliminating from this field - the nature of the person or situation in which the ID is made: student, expert, legacy, temporary, field. Can this be captured here:

we can talk about "translating" expert, student, etc. to it. This should not be a lossy process.

And thanks for the new word for the day: lossy!

Is the Confidence-o-Meter optional?

dustymc commented 4 years ago

Nature of ID to ID Methodology?

I don't think so - "nature" better encompasses the "basic or inherent features" of the identification - eg, someone looked at morphology. Methodology can come from id_sensu, and I'm not sure there's a lighter-weight proxy for that.

the nature of the person

That's in their agent record - eg, review Carla's publications, projects, collections, etc., and you might start to get the idea that she knows something about birds. Adding dates to relationships will eventually let you get a bit deeper in that - eg, maybe Carla's IDs made when she was an undergrad should be weighted differently.

It's also not something natural to Carla, as far as I know - I'm guessing she'd not be quite as "expert" at identifying liverworts or larval jellyfish (but who knows!) - it's just part of the context needed to evaluate a specific identification.

student

This has always been ambiguous, but some sort of lower confidence than expert should get you to one of the things it might have been intended to cover.

expert

As above.

legacy

That's just another word for "unknown" in this context.

temporary

I don't understand why this might ever exist.

field

gross phenotype I think, but I've never quite understood how this was distinct from all of our other vague terms, so...

Confidence-o-Meter

Nice!

optional

yes

I think this must be a NULLable concept for precisely that reason.
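A minimal sketch of what "optional and NULLable" could mean for a slider value, in Python. The function name `parse_confidence` and the 0 to 1 scale are assumptions for illustration only; no scale has been settled in this thread:

```python
from typing import Optional


def parse_confidence(raw: Optional[str]) -> Optional[float]:
    """Turn a slider/form value into a confidence in [0, 1]; blank input stays NULL (None)."""
    if raw is None or raw.strip() == "":
        return None  # nothing recorded is different from zero confidence
    value = float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {value}")
    return value
```

Keeping "not recorded" distinct from "zero confidence" is what lets identifications that arrive with no stated confidence stay honest.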

ccicero commented 4 years ago

I like the idea of a slide-o-meter for confidence level (which would be nullable), rather than arbitrary and more subjective categorical divisions.

I think "Nature of ID" is descriptive - to me that describes what an ID is based on, but we've been confounding it with who it was based on (expert, student) which is more appropriate in the context of agents as Dusty described.

legacy - probably 'gross phenotype' is better than unknown (?) but good question on whether that term makes sense for cultural and paleo collections. Rather than 'gross phenotype,' what about 'gross traits' to make it more broadly applicable? A basket with specific types of plant or animal matter can be described by traits, as can a fossil bone, but phenotype doesn't seem appropriate in those cases.

ccicero commented 4 years ago

'gross features' is better than 'gross traits' - I updated the google doc accordingly, see definition.

dustymc commented 4 years ago

gross features

I like it. @AJLinn

I don't much like 'related kin' ==> 'kin kin' / 'related relatives'.

I'm wondering if we should also push 'genomics' in the direction of something like 'chemical analysis' (not crazy about that term) to encompass things like https://physicsworld.com/a/nuclear-fallout-used-to-spot-fake-art/, or if that (when/if we get those data) should be another category?

@marecaguthrie

campmlc commented 4 years ago

What about "trait-based" instead of "gross traits"? Sometimes traits are gross, but not always :)

We need "kin relationship" for things like captive bred wolves etc. None of us are confirming traits when the source is a captive breeding population.

I don't think we need to go into the details of genetic vs genomic IDs. If we go that route, we'd have to distinguish single locus vs multilocus (and how many loci?), nuclear vs mitochondrial or some combo thereof, with or without added morphological traits, vs whole genome (but which parts of the genome?), etc. How about just "molecular"?
