ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

nature of ID #2170

Closed dustymc closed 4 years ago

dustymc commented 5 years ago

http://arctos.database.museum/info/ctDocumentation.cfm?table=CTNATURE_OF_ID is a mess.

Ideally I think we should say something about the evidence used for the ID, but that doesn't seem possible. Minimally we can not say the same thing a bunch of different ways?

| NATURE_OF_ID | Documentation | Hu? |
| --- | --- | --- |
| ID of kin | An identification based upon the identification of another related individual, often the mother of an embryo. Such a specimen should have at least one individual relationship. | spiffy |
| ID to species group | Within a genus, some groups of closely related species are referred to by the species name of one widespread or well known species within the group. | Hu? This (and much more) can be done with eg, Sorex {Sorex cinereus complex}, I don't think we need multiple ways of doing that. |
| curatorial | An identification determined by qualified personnel assisting with collection management including collection managers, curators, trained students, staff and others who may not be experts in the group in question but have some knowledge of relevant taxonomy. | this looks functionally identical to student |
| erroneous citation | The specimen has been cited in refereed scientific literature by this name but this name is clearly wrong. This situation arises mostly from typographical errors in catalog numbers. | spiffy - or not, but it happens |
| expert | The determiner is a person recognized by other experts working with the taxa in question, or the regional biota. | this looks functionally identical to student, or at least heavily overused. One agent is an expert on 7112 different taxa! Users have the tools to decide who they consider experts and act on that. |
| field | A determination made without access to specialized equipment or references. | "Looks like a moose" may be necessary, but I'm still not sure how it's not yet another version of 'student.' |
| geographic distribution | Specimen is assumed, on the basis of known geographic ranges, to be the species or subspecies expected at the collecting locality. The specimen has not been identified to species or subspecies by comparing it to other subspecies within the genus or species. | "It's probably that species because that species lives there and we know what species lives there because we're museums and telling people where stuff lives is what we do....." still looks circular to me |
| legacy | The identification has been transposed from an earlier version of data that did not include identification metadata. In this case the date of the determination is the date that the data were transposed, and the determiner is unknown. | I think we're stuck with the concept, but maybe "unknown" is a better label |
| molecular data | An identification made by a laboratory analysis comparing the specimen to related taxa by molecular criteria, generally DNA sequences. | yay us! |
| photograph | "Field ID" or perhaps "morphology" is probably always more important than this. | "student" version 5 |
| published referral | The specimen has been specifically determined to be of a particular taxon in a publication that describes or re-describes that taxon, but the specimen has no type status. Such a specimen record should include a citation, and the determiner(s) of record should be among the authors of the publication. (This means nothing.) | I still have no idea what this means |
| revised taxonomy | This designation is appropriate only in the presence of an earlier identification. It implies that the specimen has not been reexamined, and only that a different taxonomic name is being applied. In most cases this results from taxonomic synonymization of names. | we're stuck with this |
| student | Specimen has been identified by a person using appropriate references, knowledge, and/or and tools, but not by an expert. This is a broad use of the term student. | I think maybe everyone hates the label, but the concept seems accurate for the vast majority of our identifications |
| type specimen | This particular specimen has been described in the literature by this name. The specimen record should contain a citation of the appropriate literature, and the determiner(s) of record should be among the authors of the publication. | yay us! |

@ccicero

dustymc commented 4 years ago

can only identify a specimen to genus, and use taxon A, then confidence = high

I think these are two very different things. "I'm sure it's that genus" and "I think it's probably that genus" are both common scenarios.

assigning 'ssp' suggests that we tried to ID to subspecies but couldn't with confidence so we didn't

That's not what I'd get from there! (I thought it just meant "there are subspecies for this species.") Again I think confounding any >1 concepts is going to lead to confusion. We now have a way to explicitly record confidence, and trying to derive it from anything else is just going to confuse most users.

In any case I could implement https://github.com/ArctosDB/arctos/issues/727 in ~a week (just needs prioritized, especially in relation to PG and the other "emergency" issues that have come up in the past few days), or we might find a way to implement https://github.com/ArctosDB/arctos/issues/2178 as part of SABI, which has the potential to allow any number of just about anything to be loaded ~simultaneously with the core record.

acdoll commented 4 years ago

This tripped me up while loading specimens through the bulkloader (both from single entries in Data Entry and from an uploaded .csv). I was using 'expert' and 'field' as nature of ID, but the error reads:

b_bulkload: b_bulkload: : ORA-20001: Legacy nature_of_id terms are disallowed; see https://github.com/ArctosDB/arctos/issues/2170
ORA-06512: at "UAM.TEMP_TR_ID_BIU", line 4
ORA-04088: error during exe {snip...}

Can we remove these outdated nature_of_ID's from the dropdowns? Why are we allowed to select these options if they will just fail validation? Or can we edit the error message to point to the code table instead of this issue (there's a lot to read here)?

Also, there is no 'confidence' field on the Data Entry page.
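To make the error-message request concrete, here is a minimal Python sketch of a check whose message names the code table rather than this issue. It is only an illustration: the real check is an Oracle trigger (`UAM.TEMP_TR_ID_BIU`, per the error text), `check_nature_of_id` is a hypothetical helper, and the legacy set lists only terms that this thread reports as rejected or retired.

```python
# Hypothetical sketch: reject legacy nature_of_id terms with a message that
# points at the code table documentation instead of a long issue thread.

CODE_TABLE_URL = (
    "http://arctos.database.museum/info/ctDocumentation.cfm"
    "?table=CTNATURE_OF_ID"
)

# Assumption: illustrative subset only, taken from terms named in this thread.
LEGACY_TERMS = {"expert", "field", "student", "legacy"}


def check_nature_of_id(value: str) -> str:
    """Return the value unchanged, or raise if it is a legacy term."""
    if value.strip().lower() in LEGACY_TERMS:
        raise ValueError(
            f"Legacy nature_of_id term {value!r} is disallowed; "
            f"see {CODE_TABLE_URL} for the current values"
        )
    return value
```

A bulkloader-style caller could run this on each row before loading and show the resulting message to the user.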

dustymc commented 4 years ago

remove these outdated nature_of_ID's from the dropdowns

Not while they're still used, and something I thought was fully resolved seems to have started over.

point to the code table

Sure - but I think it's already linked everywhere this can be used??

Data Entry

https://github.com/ArctosDB/arctos/issues/2170#issuecomment-537600903

acdoll commented 4 years ago

Thanks (and sorry for the repetition, I was skimming too quickly through this issue).

reminders[bot] commented 4 years ago

:wave: @dustymc, finalize this

ccicero commented 4 years ago

A sp. and A ssp. are the same concepts - you can identify to some level (genus or species but not to species or subspecies, respectively).

Sure, there can be different scenarios where confidence is high or medium or low at any level (confident of genus but not species, or species but not subspecies). That's my point.

Maybe this is (partly?) resolved by getting rid of 'sp' and 'ssp' (and 'aff' and 'cf' ?) and using A {string} for those IDs per issue 1304. If we do that, you'd select A for the genus or species and assign a confidence value, but then put 'sp' or 'ssp' in the string - no need to assign a confidence level to that. However, that still brings up an issue because the way the form is now, you're assigning confidence to the whole ID "A {string}" when really you want to assign confidence to A. So I'm not sure what confidence to assign for "A {string}" when it's 'sp' or 'ssp'

Let's talk about 727 at the next issues meeting. This is high priority for me, especially with the changes, because it's usually a two-step process for identifying specimens to subspecies which is typical for birds and mammals: field/features ID to species, then geographic distribution (plus features in more complicated cases) to subspecies. It would be nice to be able to create both IDs when doing data entry, each with their own confidence.
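The two-step process described above could be modeled roughly as follows. This is a minimal Python sketch, not the Arctos data model: the `Identification` class, the `two_step_id` helper, and the example taxon are assumptions for illustration, and the confidence values are the high/medium/low levels mentioned in this thread.

```python
from dataclasses import dataclass
from typing import List

# Confidence levels as discussed in this thread.
CONFIDENCE_LEVELS = ("high", "medium", "low")


@dataclass(frozen=True)
class Identification:
    """One identification with its own nature and confidence (hypothetical)."""
    scientific_name: str
    nature_of_id: str
    confidence: str

    def __post_init__(self):
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence: {self.confidence!r}")


def two_step_id(species: str, subspecies: str,
                species_conf: str, subspecies_conf: str) -> List[Identification]:
    """Build a species ID by features, then a subspecies ID by distribution."""
    return [
        Identification(species, "features", species_conf),
        Identification(f"{species} {subspecies}",
                       "geographic distribution", subspecies_conf),
    ]


# Illustrative taxon only.
ids = two_step_id("Peromyscus maniculatus", "gambelii", "high", "medium")
```

Each identification carries its own confidence, so confidence in the species does not have to be inferred from the subspecies record or the other way around.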

FINALLY - regarding migration: when you do the final migration, can you also update the values for nature_of_ID that are in bulkloader files so those users don't have to worry about changing those manually!?

dustymc commented 4 years ago

final migration

Unless something drastic happens, I'll probably aim for Friday - Arctos has been really weird for me lately, hopefully that'll help mitigate any meltdowns (or at least give ya'll an excuse to quit early!). Yes I can get the bulkloader too - or I can do that now?

ccicero commented 4 years ago

I would just do the bulkloader at the same time.

Arctos has been very SLOW lately, not sure if that's what you're talking about though.

Thanks.

dustymc commented 4 years ago

OK I'll plan on that.

Yes that's what I'm talking about - neither I nor TACC can fully explain it, which is making me more paranoid than usual....

campmlc commented 4 years ago

Yes, horribly slow today! I was trying to finish a loan to get to Fedex before the 7pm deadline, and could not get barcodes to load to parts.

On Wed, Oct 16, 2019 at 5:09 PM dustymc notifications@github.com wrote:

OK I'll plan on that.

Yes that's what I'm talking about - neither I nor TACC can fully explain it, which is making me more paranoid than usual....

ccicero commented 4 years ago

Back to 'features' - Is there a functional difference between 'coarse features' and 'features' ? I can see keeping 'fine features' as separate (looked under a microscope or whatever), but 'looks like a moose' by its features seems the same as by its coarse features. What do folks think about reducing this to just 'features' and 'fine features' to make the data more consistent in how these are used?

dustymc commented 4 years ago

Here's who's using %features%


-- Count identifications per collection for each nature_of_id ending in 'features'
select
  guid_prefix, nature_of_id, count(*) c
from collection, cataloged_item, identification
where
  nature_of_id like '%features' and
  collection.collection_id = cataloged_item.collection_id and
  cataloged_item.collection_object_id = identification.collection_object_id
group by guid_prefix, nature_of_id
order by guid_prefix, nature_of_id;

GUID_PREFIX      NATURE_OF_ID                  C
---------------- -------------------- ----------
BYU:Herp         coarse features              80
BYU:Herp         features                    742
BYU:Herp         fine features               121
CHAS:Teach       coarse features              62
CHAS:Teach       fine features                 1
DMNS:Bird        features                     16
DMNS:Mamm        coarse features               2
DMNS:Mamm        features                      8
DMNS:Mamm        fine features                 3
KWP:Ento         fine features               179
MSB:Herp         features                      5
MSB:Mamm         coarse features             101
MSB:Mamm         features                    261
MSB:Mamm         fine features                 1
MVZ:Bird         coarse features               9
MVZ:Bird         features                     43
MVZ:Herp         features                      1
MVZ:Herp         fine features                 4
MVZ:Mamm         features                      1
UAM:EH           fine features                 3
UAM:Ento         coarse features              98
UAM:Ento         fine features                72
UAMObs:Ento      fine features                10
UCM:Bird         coarse features              11
UCM:Bird         fine features                 1
UCM:Herp         fine features                 1
UCSC:Bird        features                    554
UCSC:Herp        features                      1
UMNH:Mamm        features                      9
UMNH:Teach       features                     64
UNR:Herp         features                    268
UNR:Mamm         features                      4
UWYMV:Fish       features                     18

33 rows selected.

campmlc commented 4 years ago

I thought we'd given up on the coarse and fine distinction and were just going with features, which is what I've been using for the last couple of days.

On Thu, Oct 17, 2019 at 11:22 AM Carla Cicero notifications@github.com wrote:

Back to 'features' - Is there a functional difference between 'coarse features' and 'features' ? I can see keeping 'fine features' as separate (looked under a microscope or whatever), but 'looks like a moose' by its features seems the same as by its coarse features. What do folks think about reducing this to just 'features' and 'fine features' to make the data more consistent in how these are used?

ccicero commented 4 years ago

Is this for new data or per migration requests?

Mariel - That was discussed and what I favored, but I think others still wanted to have more than just 'features.' Like I said, 'fine features' makes some sense. I also am just using 'features' and still think it's worth asking the question about coarse features. What is the functional difference between that and just 'features'?

Dusty - Please change the 9 MVZ:Bird records to just 'features' - Not sure if those are new records, but if they are, then it emphasizes my point about inconsistency in how the data are entered.

ccicero commented 4 years ago

Dusty - You're still planning on doing the changes tomorrow, right? Chris says that Jim P is anxious to get his records out of the bulkloader so he can curate specimens from his most recent trip. Thanks.

DerekSikes commented 4 years ago

I'm fine with just 2: 'features' and 'fine features', but the definition for 'features' should mention that this is used for cases when it was unknown if the examination was of coarse or fine features, not just an assumption that the features were not fine.

fine, D

On Thu, Oct 17, 2019 at 9:42 AM Carla Cicero notifications@github.com wrote:

Dusty - You're still planning on doing the changes tomorrow, right? Chris says that Jim P is anxious to get his records out of the bulkloader so he can curate specimens from his most recent trip. Thanks.

--

+++++++++++++++++++++++++++++++++++ Derek S. Sikes, Curator of Insects Professor of Entomology University of Alaska Museum 1962 Yukon Drive Fairbanks, AK 99775-6960

dssikes@alaska.edu

phone: 907-474-6278 FAX: 907-474-5469

University of Alaska Museum - search 400,276 digitized arthropod records http://arctos.database.museum/uam_ento_all http://www.uaf.edu/museum/collections/ento/ +++++++++++++++++++++++++++++++++++

Interested in Alaskan Entomology? Join the Alaska Entomological Society and / or sign up for the email listserv "Alaska Entomological Network" at http://www.akentsoc.org/contact_us http://www.akentsoc.org/contact.php

dustymc commented 4 years ago

change the 9 MVZ:Bird

done

changes tomorrow,

If I think I can without melting something important, yes.

out of the bulkloader

I can help with that at any time, or that should be straightforward with SQL option.

ccicero commented 4 years ago

Thanks Derek. I agree re: definition.

Dusty - Can you go ahead and make the nature of ID change to the records under username 'patton' in the bulkloader? Then he can load them. Thanks.

dustymc commented 4 years ago

patton

Done

sharpphyl commented 4 years ago

We're getting error messages when we try to update an identification on an existing record. The messages refer us to this Issue but I don't see a resolution here.

This is our first update attempt.

ID error - input

This is the error message which directs us to this issue.

Error message

The handbook references #515.

We are not trying to use "legacy" as the initial ID, so is the error message a bug?

If we select "revised taxonomy" for the new ID, we sometimes get this error message. We get it whether or not we have completed the "confidence" field.

revised taxonomy

Right now, we seem locked out of updating identifications.

sharpphyl commented 4 years ago

P.S. We can get it to work with some other selections such as "audio-visual" then change it to student or revised taxonomy before we save it.

ccicero commented 4 years ago

Thanks D.

Phyllis - 'student' is a legacy value that's no longer allowed. As soon as Dusty makes the changes (hopefully tomorrow), he'll clean up the list so it only has the new values. Sorry for the inconvenience in the interim.

sharpphyl commented 4 years ago

Thanks, Carla. I can't find in this thread what "student" will be changed to (like "legacy" was changed to "unknown"). Is the new value in the list? Or is it just a confidence level? Sorry I didn't keep up with this thread.

dustymc commented 4 years ago

See http://arctos.database.museum/info/ctDocumentation.cfm?table=CTNATURE_OF_ID for terminology and https://docs.google.com/spreadsheets/d/14IOPiv2vHbZv30N3975Y80wf2-_1__Xy1I7ZfXI1FSM/edit#gid=1160354566 for the "unless someone tells me otherwise..." mapping.

sharpphyl commented 4 years ago

OK, so "student" will become "features" at medium confidence. Thanks.
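As a rough illustration of that kind of mapping, here is a minimal Python sketch against an in-memory SQLite table. The table layout, the `identification_confidence` column name, and the `migrate_legacy_terms` helper are assumptions for illustration. Only the two mappings mentioned in this thread are included; the authoritative map is the spreadsheet linked above.

```python
import sqlite3

# Assumption: only the mappings stated in this thread, as
# old term -> (new term, confidence to set, or None to leave it unchanged).
LEGACY_MAP = {
    "student": ("features", "medium"),
    "legacy": ("unknown", None),
}


def migrate_legacy_terms(conn: sqlite3.Connection) -> int:
    """Rewrite legacy nature_of_id values; return the number of rows changed."""
    cur = conn.cursor()
    changed = 0
    for old, (new, confidence) in LEGACY_MAP.items():
        # COALESCE keeps the existing confidence when no new one is given.
        cur.execute(
            "UPDATE identification "
            "SET nature_of_id = ?, "
            "    identification_confidence = "
            "        COALESCE(?, identification_confidence) "
            "WHERE nature_of_id = ?",
            (new, confidence, old),
        )
        changed += cur.rowcount
    conn.commit()
    return changed
```

The same function could be pointed at a copy of the bulkloader table so that staged records and loaded records are converted in one pass.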

DerekSikes commented 4 years ago

For the terminology:

audio-visual

might be better as

audio-video

which would unambiguously match its definition. Currently I could see someone unfamiliar with the definitions using this as a synonym of features. Changing visual to video would prevent that.

And I think we agreed to eliminate 'coarse features' and just use 'features'

And I will reiterate that 'type specimen' should not become legacy - there are lots of such IDs that lack citations and eliminating this option would toss valuable data.

-Derek

On Thu, Oct 17, 2019 at 11:36 AM dustymc notifications@github.com wrote:

See http://arctos.database.museum/info/ctDocumentation.cfm?table=CTNATURE_OF_ID for terminology and https://docs.google.com/spreadsheets/d/14IOPiv2vHbZv30N3975Y80wf2-_1__Xy1I7ZfXI1FSM/edit#gid=1160354566 for the "unless someone tells me otherwise..." mapping.

dustymc commented 4 years ago

eliminate 'coarse features'

I'd just need a map for those that have used it - https://github.com/ArctosDB/arctos/issues/2170#issuecomment-543281386

type specimen

PLEASE, suggest a better mapping! https://docs.google.com/spreadsheets/d/14IOPiv2vHbZv30N3975Y80wf2-_1__Xy1I7ZfXI1FSM/edit#gid=1160354566 (And remarks will catch anything that gets tossed.)

DerekSikes commented 4 years ago

The thing about 'type specimen' that is special and worth keeping (no need to map if it's not changing... or just map to itself) is that when a taxonomist describes a taxon and designates a type specimen they've often done a hell of a lot more than what is encompassed in id method = 'features'.

They've compared those specimens to numerous others, assessed variation across space / populations / close relatives etc. In some cases, they've used molecular methods to assess genetic uniqueness as well as 'features' and in other cases, rarely, they might have actually used ecological/ behavioral data to assess species status. They might have even done breeding trials.

Regardless of how many different methods they've used, even if they described the new species based only on 'features' there's a huge difference between using a key someone else wrote to key out an unknown using 'features' and a taxonomist describing a new species, and WRITING the key and describing the species.

Additionally, because of the rules of nomenclature, type specimens ALWAYS belong to the name - even if that name has later been deemed to not represent a distinct species the type specimen is tied to that name by the "laws" of nomenclature.

All this cannot be summarized as 'features' and thus I argue that 'type specimen' should not become legacy but should remain in our list of nature of ID options.

-Derek

On Thu, Oct 17, 2019 at 12:18 PM dustymc notifications@github.com wrote:

eliminate 'coarse features'

I'd just need a map for those that have used it - #2170 (comment) https://github.com/ArctosDB/arctos/issues/2170#issuecomment-543281386

type specimen

PLEASE, suggest a better mapping! https://docs.google.com/spreadsheets/d/14IOPiv2vHbZv30N3975Y80wf2-_1__Xy1I7ZfXI1FSM/edit#gid=1160354566 (And remarks will catch anything that gets tossed.)

ccicero commented 4 years ago

I like audio-video

I also agree that 'type specimen' should remain an option in the list.

dustymc commented 4 years ago

In some cases, they've used molecular methods to assess genetic uniqueness as well as 'features' and in other cases, rarely, they might have actually used ecological/ behavioral data to assess species status. They might have even done breeding trials.

That is precisely the information I think we need to capture; if you're a whateverologist, you might want to find types defined by somethingelseology (eg, because you want to see if morphology and COI agree, or .....). Tossing all of that into "type specimen" won't support any of those questions.

A search on typestatus like %type (or something more specific) + nature_of_id=features WILL find those specimens.
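A minimal sketch of that combined search, written in Python against SQLite so it can be run on its own. The `citation` and `identification` tables and their columns follow the names used in the queries in this thread; the `find_types_by_nature` helper is hypothetical, and the real Arctos search interface is not shown here.

```python
import sqlite3


def find_types_by_nature(conn: sqlite3.Connection, nature_of_id: str):
    """Return collection_object_ids that have a citation whose type_status
    ends in 'type' and an identification with the given nature_of_id."""
    rows = conn.execute(
        """
        SELECT DISTINCT i.collection_object_id
        FROM identification i
        JOIN citation c
          ON c.collection_object_id = i.collection_object_id
        WHERE c.type_status LIKE '%type'
          AND i.nature_of_id = ?
        ORDER BY i.collection_object_id
        """,
        (nature_of_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Because type status comes from the citation and method comes from the identification, one query answers both "is it a type?" and "how was it identified?".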

If there's some value in these data, I suggest we follow MCZ's lead and create a "random types with no more information" publication to make the correct links (eg, make them discoverable along with well-documented types), even if we do it with a low-quality publication. (That probably needs a new "unknown type" or similar status??)

I checked a half-dozen of these, one of them looks like it's probably really an undocumented paratype, the rest I'm fairly certain just have random values here.

Here's a summary:


col guid_prefix format a30;
select guid_prefix, count(*) c from collection,cataloged_item,identification where 
collection.collection_id=cataloged_item.collection_id and
cataloged_item.collection_object_id=identification.collection_object_id
and identification.nature_of_id='type specimen'
and not exists (select collection_object_id from citation where citation.collection_object_id=cataloged_item.collection_object_id and type_status like '%type')
group by guid_prefix order by guid_prefix;

GUID_PREFIX                             C
------------------------------ ----------
ALMNH:ES                                3
DGR:Bird                                6
DGR:Mamm                                1
DMNS:Bird                               3
DMNS:Mamm                              17
HWML:Para                            1710
KNWR:Ento                               9
MLZ:Bird                                2
MSB:Bird                               10
MSB:Fish                                8
MSB:Host                                1
MSB:Mamm                             4436
MSB:Para                               44
MVZ:Bird                              130
MVZ:Herp                              518
MVZ:Mamm                             1109
MVZObs:Herp                             1
UAM:Bird                              156
UAM:Ento                                3
UAM:Fish                                2
UAM:Herb                              218
UAM:Herp                                2
UAM:Inv                                 9
UAM:Mamm                             6699
UAMObs:Ento                            53
UAMObs:Mamm                             7
UCM:Fish                               21
UCM:Herp                              442
UCM:Mamm                               26
USNPC:Para                              1
UTEP:ES                                12
UTEP:Ento                              20
UTEP:Herb                              15
UTEP:Herp                             300
UTEP:HerpOS                             1
UTEP:Inv                               20
UTEP:Zoo                               23
UWBM:Herp                               2

and data

create table temp_undoctypes as select guid from flat,identification where flat.collection_object_id=identification.collection_object_id
and identification.nature_of_id='type specimen'
and not exists (select collection_object_id from citation where citation.collection_object_id=flat.collection_object_id and type_status like '%type')
order by guid;

temp_undoctypes.csv.zip

DerekSikes commented 4 years ago

It should not be up to the data entry technicians to assess all the methods that were used by an author who describes a species. All they know is that it's a type specimen and thus it's important to flag it as such EVEN if the citation / publication is unknown.

-Derek

On Thu, Oct 17, 2019 at 1:00 PM dustymc notifications@github.com wrote:

In some cases, they've used molecular methods to assess genetic uniqueness as well as 'features' and in other cases, rarely, they might have actually used ecological/ behavioral data to assess species status. They might have even done breeding trials.

That is precisely the information I think we need to capture; if you're a whateverologist, you might want to find types defined by somethingelseology (eg, because you want to see if morphology and COI agree, or .....). Tossing all of that into "type specimen" won't support any of those questions.

A search on typestatus like %type (or something more specific) + nature_of_id=features WILL find those specimens.

If there's some value in these data, I suggest we follow MCZ's lead and create a "random types with no more information" publication to make the correct links (eg, make them discoverable along with well-documented types), even if we do it with a low-quality publication. (That probably needs a new "unknown type" or similar status??)

I checked a half-dozen of these, one of them looks like it's probably really an undocumented paratype, the rest I'm fairly certain just have random values here.

temp_undoctypes.csv.zip https://github.com/ArctosDB/arctos/files/3741268/temp_undoctypes.csv.zip

dustymc commented 4 years ago

important to flag it

How important? What's there (apparently-incorrect data aside) requires some knowledge of an arbitrary administrative process, at least two queries to find all the types, and displaces methodology from "nature." What I laid out works like everything else, doesn't require your techs to assess anything ('unknown' is always an option), doesn't need a specific publication, and there's a place for method-based 'nature' if it is known.

In any case I can drop this out of the migration if that's what's best.

DerekSikes commented 4 years ago

I'd argue that 'type specimen' is a method. It's a method that employs the rules of nomenclature to assert this specimen is this taxon.

Why would it require 2 queries to find all the types? Because some won't have nature of id = type specimen? I can see that being a problem, but it's easily fixed by setting that field to 'type specimen' for all type specimens. It'd be hard to enforce consistency, though: people could use the citation / holotype option and choose something besides 'type specimen' for the nature of ID.

You say that what you laid out "doesn't need a specific publication" - how then can one specify a specimen is a type without creating a citation from a publication?

Another option, and what I've done in databases I've made and used in the past, is to have a field dedicated to 'type status' and most specimens would be 'none' but all the type specimens would have something in that field like 'holotype' etc.

Is there a Darwin Core field for that?

-D

dustymc commented 4 years ago

argue that 'type specimen' is a method

I'm not sure that's WRONG, but there's also a lot of noise from folks who might want to find the resulting types when someone magicks a species out of DNA or photos or other nontraditional methods. As far as I know, similar data have never been published. Who knows, maybe nobody will use it, but they definitely won't if we structure our data such that the information isn't accessible.

2 queries

"Real" types in Arctos require publications; that's the way most users will find them, and it's what I use for DWC, reporting, etc. (Yes, DWC has a field.)

setting that field to = type specimen for all type specimens

I suppose that's a curatorial call, but it's definitely not something I'd recommend.

how then can one specify a specimen is a type without creating a citation from a publication?

You can't, but it's easy to materialize publications. https://mczbase.mcz.harvard.edu/SpecimenUsage.cfm?action=search&publication_id=35944 exists to allow "normal" access to types without forcing you to actually track down the real publication. (I think they've got it a little too fine-grained, but the idea seems to work well enough - and the fake publications aren't holding thousands of records that REALLY don't look like types, which is nice!)

field dedicated to 'type status'

Yeah, that's the norm. I don't think I've ever gotten much information out of one. I suppose we could spin up an attribute for that, but then users would find it, use it to locate the few percent of alleged types that aren't supported by publications, and leave thinking they've found everything we have.
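To make the "2 queries" point concrete, here is a minimal sketch, not Arctos code. It assumes a toy in-memory SQLite schema whose table and column names are borrowed from the query elsewhere in this thread; the rows, the reduced column set, and the short list of "type" statuses are made up for illustration.

```python
import sqlite3

# Toy schema: names follow the thread's query, data are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table identification (
        identification_id integer primary key,
        collection_object_id integer,
        nature_of_id text
    );
    create table citation (
        identification_id integer,
        type_status text
    );
    insert into identification values
        (1, 101, 'type specimen'),  -- flagged only through nature_of_id
        (2, 102, 'unknown'),        -- type asserted through a citation
        (3, 103, 'unknown');        -- cited, but only as a voucher
    insert into citation values
        (2, 'holotype'),
        (3, 'voucher');
""")

# Query 1: "types" flagged through nature_of_id.
by_noid = {row[0] for row in conn.execute("""
    select collection_object_id
    from identification
    where nature_of_id = 'type specimen'
""")}

# Query 2: types backed by a citation (illustrative subset of type statuses).
by_citation = {row[0] for row in conn.execute("""
    select i.collection_object_id
    from identification i
    join citation c on c.identification_id = i.identification_id
    where c.type_status in ('holotype', 'paratype', 'syntype')
""")}

# Neither query alone finds every record someone has called a type.
all_types = by_noid | by_citation
print(sorted(by_noid), sorted(by_citation), sorted(all_types))
```

If every type assertion lives in a citation, the second query alone is enough, which is the "do things only 1 way" argument.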

DerekSikes commented 4 years ago

OK, I'm convinced.

Best to do things only 1 way.

-Derek

dustymc commented 4 years ago

Sweet!

Lacking better ideas materializing in the very near future, I will

Reasonable?

dustymc commented 4 years ago

I'd like to revise the above.

There are ~16K specimens with a NoID of "type specimen" which do not have a corresponding 'type' type_status, but >10K of them do have a publication (usually citing 'voucher') on the 'type specimen' identification. I think those are clearly just using 'type specimen' (as NoID) in a slightly different way, NOT attempting to use NoID as a placeholder for type specimens. Unless someone has a compelling reason to do otherwise, I will ignore those and only create citations (using the new fake publication) for the ~5K citations which use NoID "type specimen" without a corresponding publication.
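A rough sketch of the split described above, assuming a toy SQLite schema (table and column names are taken from the query later in this thread; the rows and the publication IDs are invented). It is not the migration script, just the shape of the anti-join that separates "has a publication" from "needs a placeholder citation".

```python
import sqlite3

# Toy schema: names follow the thread's query, data are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table identification (
        identification_id integer primary key,
        collection_object_id integer,
        nature_of_id text
    );
    create table citation (
        citation_id integer primary key,
        identification_id integer,
        publication_id integer,
        type_status text
    );
    insert into identification values
        (1, 101, 'type specimen'),
        (2, 102, 'type specimen'),
        (3, 103, 'type specimen'),
        (4, 104, 'unknown');
    insert into citation values
        (10, 1, 500, 'holotype'),
        (11, 2, 501, 'voucher');
""")

# 'type specimen' identifications cited as 'voucher': these already have a
# publication, so under the revised plan they are left alone.
voucher_cited = [row[0] for row in conn.execute("""
    select distinct i.identification_id
    from identification i
    join citation c on c.identification_id = i.identification_id
    where i.nature_of_id = 'type specimen'
      and c.type_status = 'voucher'
    order by i.identification_id
""")]

# 'type specimen' identifications with no citation at all: the only ones
# that would get a citation against the placeholder publication.
uncited = [row[0] for row in conn.execute("""
    select i.identification_id
    from identification i
    left join citation c on c.identification_id = i.identification_id
    where i.nature_of_id = 'type specimen'
      and c.citation_id is null
    order by i.identification_id
""")]

print(voucher_cited, uncited)
```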

ccicero commented 4 years ago

Out of curiosity, are there records that have NoID of 'type specimen' and have a type_status other than 'voucher' - i.e., ones that are actual types where it is a placeholder for type specimens?

Can you send me a csv of the 130 MVZ:Bird records that have NoID of 'type specimen' so I can look at those? Thanks.

dustymc commented 4 years ago

are there records that have NoID of 'type specimen' and have a type_status other than 'voucher'


col type_status format a20;
-- citations on 'type specimen' identifications, excluding 'voucher', counted by type_status
select citation.type_status, count(*)
from flat, identification, citation
where
  flat.collection_object_id = identification.collection_object_id and
  identification.identification_id = citation.identification_id and
  identification.nature_of_id = 'type specimen' and
  citation.type_status != 'voucher'
group by citation.type_status
;

TYPE_STATUS              COUNT(*)
---------------------- ----------
holotype                      929
isosyntype                      2
basis of illustration         108
paratype                     2725
erroneous citation             53
isotype                        50
isolectotype                    4
lectotype                      45
symbiotype                     78
neotype                         3
referral                      197
isoneotype                      1
paratopotype                 1037
syntype                        64
host voucher                  423
paralectotype                  30

placeholder

Those are ACTUAL citations; I'm using the word "placeholder" for things that don't have citations (yet - they'll get them, but against a fake publication) but have some indication that they might be types. (So now the data essentially say "this is a type, here's the publication" and "this is a type, we just say so." This change will provide the possibility to turn the latter into "this is a type, it's based on features, it's a really great candidate for a molecular study!")

csv

It's attached to https://github.com/ArctosDB/arctos/issues/2170#issuecomment-543360029

dustymc commented 4 years ago

All catalog records with a 'type specimen' NoID are now attached to a publication; those that weren't already attached to something are on http://arctos.database.museum/publication/10008933

ccicero commented 4 years ago

Thanks!

DerekSikes commented 4 years ago

We need the new 'confidence' field in the data entry screen.

-Derek

dustymc commented 4 years ago

data entry https://github.com/ArctosDB/arctos/issues/2170#issuecomment-537600903

atrox10 commented 4 years ago

Yes, I'd like to see the MVZ herps like that too.

amgunderson commented 4 years ago

I don't know what went on here, but you have turned 1 simple and understandable field into a mess. I would like to load records identified by students. I have entered 100s of records with the nature of ID "student", but now I assume it should be "coarse features" with a confidence level that is pulled out of my a**? Where is the confidence level field in the bulk loader menu? I need to change "student" to "coarse features", but there is no way to add a confidence level from the bulkload SQL page and it doesn't appear at all in the AJAX page.

dustymc commented 4 years ago

Updates, including the bulkloader, are done and the code table is cleaned up. It will take a couple of days for everything to find its way to all forms. Closing this monstrosity; I opened a couple of new issues for everything, I hope, that still needs to be handled.

confidence level that is pulled out of my a**?

I'd say that's NULL (='we have nothing informative to say, so we'll say nothing'), and at least earlier in this thread I was under the strong impression that that would be the case most of the time. Please prioritize https://github.com/ArctosDB/arctos/issues/2323 if there is an immediate need to assert confidence in data entry/bulkloaders.

campmlc commented 4 years ago

@dustymc
Please edit the Edit Citation/ Create Citation And Identification tool so that the default Nature of ID is NULL. Currently it is set to audio-visual. All dropdowns should always have a null default.