Clean up sex attribute table

Jegelewicz commented 3 years ago

In addition to #1237, we need to deal with the following terms in the sex code table:

SEX_CDE	Documentation
female ?	The examiner believes the specimen to be a female, but is uncertain.
male ?	The examiner believes the specimen to be a male, but is uncertain.
sexes mixed	Lot contains individuals of both sexes.

Suggestions?

dustymc commented 3 years ago

Suggestions?

female ?-->unknown + remark="possibly female"
male ?-->unknown + remark="possibly male"
sexes mixed-->unknown + remark or something, I suppose. ("Both" in reference to 9 values can't make too much sense, can it?!)

Jegelewicz commented 3 years ago

sexes mixed-->unknown + remark or something, I suppose. ("Both" in reference to 9 values can't make too much sense, can it?!)

How about "sexes mixed" in the remark - this would make it possible to find and then add two, five, or however many "sexes" are represented if one so desired...

ewommack commented 3 years ago

female ?-->unknown + remark="possibly female"
male ?-->unknown + remark="possibly male"

That doesn't really represent an unknown, though. It represents a sex determination we aren't completely sure of. We use it when we have things like adult male plumage but cannot find the gonads. That's really different from the cases where I would mark it as unknown, where I have no idea and no hints.

Would we lose data by marking them as unknown and being more conservative?

Maybe it would be better to mark them as:
female ? ---> female + remark = "sex determination has question"
male ? ---> male + remark = "sex determination has question"

dustymc commented 3 years ago

adult male plumage

We are severely under-utilizing method.

we cannot find the gonads

Add a second determination - sex=unknown, method=cannot find the gonads.

If we really want to be "research grade," we have to find a way to shift our entire mindset from "it's a boy" to "AGENT on DATE using METHOD thinks it's a boy." (And then somehow get researchers to follow....)
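The "AGENT on DATE using METHOD" idea maps naturally onto a small record type. Below is a minimal Python sketch, not Arctos code: the `SexDetermination` class and its `describe` method are hypothetical, and the field names only loosely follow the `attributes` columns that appear in the SQL later in this thread (`determined_by_agent_id`, `determined_date`, `determination_method`, `attribute_remark`). The agents and dates in the example are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SexDetermination:
    """One sex determination: who said what, when, and how (hypothetical model)."""

    value: str                          # e.g. "male", "female", "unknown"
    determined_by: str                  # agent name or ID
    determined_date: Optional[date]     # None if the date was not recorded
    method: Optional[str] = None        # free text, e.g. "gonads examined"
    remark: Optional[str] = None

    def describe(self) -> str:
        when = self.determined_date.isoformat() if self.determined_date else "an unknown date"
        how = self.method or "no recorded method"
        return f"{self.determined_by} on {when} using {how} thinks it's {self.value}"


# A specimen can carry several determinations rather than one overwritten value.
# The agents, dates, and methods below are made up for illustration.
history = [
    SexDetermination("male", "collector", date(1998, 6, 2), method="adult male plumage"),
    SexDetermination("unknown", "preparator", date(1998, 6, 3), method="cannot find the gonads"),
]

for d in history:
    print(d.describe())
```

Keeping a list of determinations, rather than one value that gets overwritten, is what lets a later "sex=unknown, method=cannot find the gonads" sit alongside an earlier plumage-based call.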

ewommack commented 3 years ago

Maybe we need to have a rethink on the definition of how we determine sex?

For museum specimens this has always been two things for me:

a historic tag or notes records the sex
we find the gonads while prepping an animal

But what about observation data or data without a vouchered specimen? When banding a bird I record male when the measurements, plumage, or behavior falls under a key for male. But I don't open the bird up to peek at the gonads.

How do we make the two data sets align and make sure you are searching for sex that is recorded to the best of our ability under the current research parameters?

dustymc commented 3 years ago

historic tag or notes

Method! If you happen to know who wrote the tag and when, and are willing to record it, that would be cool and useful; if you can't or won't, then "tag" still seems infinitely better than nothing to me.

observation data

Method! Some way of sorting out "according to this blurry camera trap picture, ...." and "Some known ornithologist dug around in there on DATE (at which time they had 20 years of relevant experience) and ...." seems pretty useful to me.

make the two data sets align

Not our problem (just because I don't think we can do anything useful beyond recording what we know). For most questions, "someone thinks it's a male" is probably a sufficient answer anyway. Someone REALLY looking probably expects to have to dig a bit; we can at least give them everything we know in one place, not buried in "specimen remarks" with 38 other kinds of data. (And we can add their interpretation back as another determination.)

If they knew it was a possibility, maybe they'd even help make media for those tags in some way.

ewommack commented 3 years ago

Not our problem (just because I don't think we can do anything useful beyond recording what we know). For most questions, "someone thinks it's a male" is probably a sufficient answer anyway. Someone REALLY looking probably expects to have to dig a bit; we can at least give them everything we know in one place, not buried in "specimen remarks" with 38 other kinds of data. (And we can add their interpretation back as another determination.)

This is why I think we can't say:
female ? ---> unknown
male ? ----> unknown
Switching them to unknown is losing recorded data. They are not unknown. They are probably one sex or the other.

dustymc commented 3 years ago

Gotcha. I was just suggesting a "safe" (=doesn't make any unfounded assertions) migration path. I'm totally fine with some other approach, either for everything or by collection or WHATEVER. If you think "female ?" should be "female" (plus remarks or something) then I do too; I'll get behind about anything that moves us towards cleaner data!

ewommack commented 3 years ago

@ccicero may have an opinion as well. She led the discussion on cleaning up the GitHub bird data at the workshop.

Jegelewicz commented 3 years ago

Would we lose data by marking them as unknown and being more conservative?

Maybe it would be better to mark them as:
female ? ---> female + remark = "sex determination has question"
male ? ---> male + remark = "sex determination has question"

That is probably a better path, but instead of "sex determination has question" I suggest "determination has low confidence".

Do we need "confidence" for attributes like we have for identifications?

dustymc commented 3 years ago

Do we need "confidence" for attributes like we have for identifications?

I don't think so. You can assess yourself via remarks, everyone else can assess you via method+agent/date. MAYBE there's some small bit of usefulness in there, but it would be a huge change in code and work required - I don't think that balances out.

jldunnum commented 3 years ago

I think it might be best to be conservative here and go with "unknown" and then put the other legacy data in another field. If the user is someone doing searches from GBIF or VertNet in order to just pull all of one sex for some reason they won't see the low confidence and could get specimens which aren't the correct sex. If the user is doing work at the individual specimen level they will be going into the record itself where all the information on confidence or ambiguity of the determination is there. They can then make a judgment on whether or not to use the data.

From: Teresa Mayfield-Meyer @.> Sent: Wednesday, March 17, 2021 9:57 AM To: ArctosDB/arctos @.> Cc: Subscribed @.***> Subject: Re: [ArctosDB/arctos] Clean up sex attribute table (#3516)


Jegelewicz commented 3 years ago

Hmmm, maybe this should be a collection by collection decision? Although I agree with @jldunnum that @ewommack's option could mislead people at the aggregators....

dustymc commented 3 years ago

collection by collection

I have no problem with that - it's not ideal for users, but neither is what we're starting with.

I suspect these data vary from "IDK, maybe...." to "we're not 100% positive...." across time, collections, people, taxa, etc., etc., etc. I doubt there is one true answer; whatever we do at least stops more of that.

I think getting to a place from where we are producing better data should outweigh about anything else.

Remarks and method are 4000 character fields (and could be bigger if needed) - we can be VERY verbose if that somehow facilitates this.

ewommack commented 3 years ago

Although I agree with @jldunnum that @ewommack's option could mislead people at the aggregators....

I feel like I need to hear from someone who might work with the dataset. Is it better to have the data in there that has a medium level of confidence, or better to just throw it out? It is going to be a small part of the data set.

I think I also keep getting tied up in the different levels of confidence I apply to live-trapped animals versus museum specimens. The choice is eventually going to differ by collection no matter what. A banding station's value for male would be the same as our bird collection's male ?, just because of the difference in how they determine the sex.

Jegelewicz commented 3 years ago

A banding station's value for male would be the same as our bird collection's male ?, just because of the difference in how they determine the sex.

And that would be covered if an appropriate method was applied to the attribute.

campmlc commented 3 years ago

I agree with Jon that method and remarks would be difficult to parse for aggregators and also for Arctos, because we can't currently download attribute remarks in a usable format. We are losing information if we go this route. Why not keep "female ?", but start advocating the use of method, date, and determiner more rigorously?

On Thu, Mar 18, 2021, 12:59 PM Teresa Mayfield-Meyer < @.***> wrote:


Jegelewicz commented 3 years ago

Why not keep "female ?", but start advocating the use of method, date, and determiner more rigorously?

Because, as @ccicero pointed out, female ? is not a sex attribute; it is a question.

Also, as long as that term is in the code table, people will use it - documentation be damned!

campmlc commented 3 years ago

"Possible female"? That contains more info than unknown. It flags the record as requiring further scrutiny. "Unknown" does not.

On Fri, Mar 19, 2021, 10:35 AM Teresa Mayfield-Meyer < @.***> wrote:


ccicero commented 3 years ago

Sorry, just jumping in here. I agree that we don't want to lose information, and 'female ?' has more information to me than just 'unknown' - yet from our workshop, sex concepts should be restricted to what is a sex: female, male, atypical, unknown.

There are three options for 'unknown' in Arctos: unknown, recorded as unknown, and not recorded. These have different meanings, at least in how we use them:
recorded as unknown ---> someone tried but couldn't sex the animal
not recorded ---> have data but there's nothing about sex
unknown ---> no data (notes etc.) to indicate anything about sex

These should be combined into a single unknown, but with at least "recorded as unknown" or "not recorded" going into attribute remarks.

gynandromorph, hermaphrodite ---> ATYPICAL, with that value in the remarks field.

female ? and male ? ---> in the workshop mappings, we put these as 'FEMALE' and 'MALE', which I think is better than just unknown, as the latter is less informative, but we need to make the uncertainty known. I like the idea of a confidence score for sex, or we could put the uncertainty in remarks, although I know that wouldn't get mapped to aggregators or downloaded. Still, if someone is looking for females, they may want to look at those specimens and may have better methods (e.g., DNA, size) for confirming the sex. We just need a way of downloading the attribute remarks in a usable way, and also the determination method, which I agree should be used more. What about having a controlled vocab for determination method (e.g., gonads, phenotype, genetic, behavior), with anything else that's free text going in remarks? Re: aggregators, the non-controlled sex values (other than the few concepts which get mapped to SEX) should go in DYNAMIC PROPERTIES.

mixed - in the workshop, we mapped those as ATYPICAL, and the details can again go in remarks; for determination method, we have another controlled value of 'mixed'.

Here is the file with our concept list and mappings for sex from the workshop.

dustymc commented 3 years ago

'female ?' has more information to me than just 'unknown'

It's not usable information though - it's not "research grade." That's clearly demonstrated above, where a bird in one situation (banding, from examining characteristics) would get "female" and a bird in another (lab, gonads can't be found) would receive "female ?" The evidence is the same, the results are different.

This is a proposal to put those data into a usable place (method - not remarks!).

confidence score

Method is a USEFUL (if occasionally complicated) confidence score. "I'm sure this is a female" (because I just got this job banding and it doesn't look like a male!) and "I'm sure this is a female" (because I'm an experienced ornithologist and I'm looking right at the female bits) are very different statements; a confidence score that can't tell them apart is just a more complicated way of staying where we are, not producing research-grade data.

downloading the attribute remarks in a usable way

They are. If they're not for you, tell me what you want (and perhaps provide the resources I'll need, depending on what that is) in another Issue. Even if true, I do not think we should allow things like this to distract us from creating research-grade data.

controlled vocab for determination method,

The goal should not be to do a researcher's work for them, but to provide them data from which they can confidently make their own categorizations as needed, using whatever tools they wish. A controlled vocabulary cannot do that in sufficient detail; those data can be useful only to the levels of sophistication we've baked in, which would be very low.

DYNAMIC PROPERTIES

Let's keep that a separate discussion. DWC should not drive what we do, and I can very easily change how we map things to DWC.

ATYPICAL

I think that should also be a separate conversation - I'm not sure what's atypical for birds is also atypical for other collections in Arctos, and I don't think that conversation should distract us from where most of the subpar data production is happening.

jldunnum commented 3 years ago

I guess I am still very leery about just assigning confidence values (for identifications too), in that they are subjective and there are a lot of very confident idiots in this world. 😉 Maybe if we utilize methods, those could auto-generate a confidence value? For example, gonad examination would generate a "high confidence" value. We might need to have varied methods for different collection types, though.

Bottom line is that research grade data should mean that the data are unambiguous. We need clear parameters so that if a researcher only wants data on females, they can filter and be 100% sure they are getting only females. But using a slightly less rigorous filter can also get those specimens which have a better than 50% chance of being females.
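The strict versus "slightly less rigorous" filter can be sketched as two predicates over the same records. This is a hypothetical Python illustration: records are assumed to be dicts with `sex` and `remark` keys, and the marker phrases are borrowed from wording proposed in this thread ("possibly female", "Attribute originally given as ..."), not from any Arctos vocabulary.

```python
# Phrases treated as "this determination is uncertain" (assumed wording, see lead-in).
UNCERTAIN_MARKERS = ('originally given as "female ?"', "possibly female")


def is_flagged(record):
    """True if the free-text remark contains one of the uncertainty markers."""
    remark = (record.get("remark") or "").lower()
    return any(marker in remark for marker in UNCERTAIN_MARKERS)


def strict_females(records):
    """Exactly 'female', with nothing in the remark flagging doubt."""
    return [r for r in records if r["sex"] == "female" and not is_flagged(r)]


def lenient_females(records):
    """Recorded as female, legacy 'female ?', or 'unknown' flagged as possibly female."""
    return [
        r for r in records
        if r["sex"] in ("female", "female ?")
        or (r["sex"] == "unknown" and is_flagged(r))
    ]


# Made-up example records.
records = [
    {"id": 1, "sex": "female"},
    {"id": 2, "sex": "female", "remark": 'Attribute originally given as "female ?"'},
    {"id": 3, "sex": "unknown", "remark": "possibly female"},
    {"id": 4, "sex": "male"},
    {"id": 5, "sex": "female ?"},
]
print([r["id"] for r in strict_females(records)])   # [1]
print([r["id"] for r in lenient_females(records)])  # [1, 2, 3, 5]
```

The strict filter only works because the doubt sits in a predictable place; if it were buried in general specimen remarks, neither function would find it.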

Jonathan L. Dunnum Ph.D. Senior Collection Manager Division of Mammals, Museum of Southwestern Biology University of New Mexico Albuquerque, NM 87131 (505) 277-9262 Fax (505) 277-1351

MSB Mammals website: http://www.msb.unm.edu/mammals/index.html Facebook: http://www.facebook.com/MSBDivisionofMammals

Shipping Address: Museum of Southwestern Biology Division of Mammals University of New Mexico CERIA Bldg 83, Room 204 Albuquerque, NM 87131

From: dustymc @.> Sent: Friday, March 19, 2021 9:44 AM To: ArctosDB/arctos @.> Cc: Jonathan Dunnum @.>; Mention @.> Subject: Re: [ArctosDB/arctos] Clean up sex attribute table (#3516)


Jegelewicz commented 3 years ago

if a researcher only wants data on females, they can filter and be 100% sure they are getting only females. But using a slightly less rigorous filter can also get those specimens which have a better than 50% chance of being females.

But that is just another kind of confidence meter? I think I am in agreement with @dustymc on this one. If we add appropriate information in method, the person who is using the data can determine for themselves what confidence to apply to each determination. I don't think there is any way, without examining specimens yourself, to be 100% sure that what someone said is a female is actually a female.

dustymc commented 3 years ago

gonad examination would generate a "high confidence" value

I've been through way too many shrews with Dokuchaev to believe that....

research grade data should mean that the data are unambiguous

Fully agreed, but that doesn't have to (and can't) lead to absolute confidence. What we can do is remove the mystery in how we made the determination.

Sex=female, remarks=testes finds 40 records at the moment, which is pretty good but evidence that mistakes and misinterpretations are inevitable. (It's also evidence that we record data in inappropriate fields - "testes" should be in attribute remarks or methods, not in our official junkyard.) "Here's how we got there" in a predictable place is as close to research grade as anyone can realistically expect of us. (That doesn't even require us to change anything about our values, although I agree that we're just adding confusion by having lots of ways of hiding methodology.)
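A search like "sex=female, remarks=testes" is a simple consistency check. A hedged Python sketch of the same idea, assuming records are dicts with `sex` and `remarks` keys; the keyword lists in the hypothetical `CONTRADICTING_TERMS` table are illustrative only, and a hit means "look at this record", not "this record is wrong".

```python
import re

# Illustrative keyword lists only, not an Arctos vocabulary.
CONTRADICTING_TERMS = {
    "female": ("testes", "testis"),
    "male": ("ovary", "ovaries", "oviduct"),
}


def suspicious(records):
    """Yield records whose remarks mention anatomy usually associated with the other sex."""
    for r in records:
        terms = CONTRADICTING_TERMS.get(r.get("sex"), ())
        text = (r.get("remarks") or "").lower()
        # Whole-word match so that e.g. "ovary" does not hit inside another word.
        if any(re.search(rf"\b{re.escape(t)}\b", text) for t in terms):
            yield r


# Made-up example records.
records = [
    {"id": "A", "sex": "female", "remarks": "Testes 3x2 mm"},
    {"id": "B", "sex": "female", "remarks": "ovary granular"},
    {"id": "C", "sex": "male", "remarks": "left ovary?"},
    {"id": "D", "sex": "unknown", "remarks": "testes not found"},
]
print([r["id"] for r in suspicious(records)])  # ['A', 'C']
```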

dustymc commented 3 years ago

add appropriate information in method

... for everything in https://arctos.database.museum/info/ctDocumentation.cfm?table=ctattribute_type, not just sex!

jldunnum commented 3 years ago

OK, good points, Teresa and Dusty. Do you think this will work for the aggregators as well, or will this be another place where our pretty robust data get distilled down and we lose that extra info on that end?

From: dustymc @.> Sent: Friday, March 19, 2021 11:11 AM To: ArctosDB/arctos @.> Cc: Jonathan Dunnum @.>; Mention @.> Subject: Re: [ArctosDB/arctos] Clean up sex attribute table (#3516)


dustymc commented 3 years ago

aggregators

It could - I have dynamicProperties mapped to key-value string data, DWC now says they like JSON, we have JSON that contains all of the information, it's trivial to change the mapping.

Sorting https://github.com/ArctosDB/arctos/issues/2131#issuecomment-800712123 (our keys are pretty cryptic, they don't have to be) out before changing stuff is probably worthwhile.
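To show why the mapping format matters, here is a Python sketch comparing a naive key-value string with JSON for the same attribute. The `key=value; ...` layout and the key names (`sex_method`, `sex_remark`) are assumptions for illustration, not the current Arctos `dynamicProperties` mapping; the values reuse wording from this thread.

```python
import json


def to_key_value(attrs):
    """Naive 'key=value; key=value' rendering (assumed layout)."""
    return "; ".join(f"{k}={v}" for k, v in attrs.items())


def to_json(attrs):
    """JSON keeps structure and escapes quotes, so free text survives a round trip."""
    return json.dumps(attrs)


# Hypothetical keys for one sex determination.
sex_attr = {
    "sex": "male",
    "sex_method": "adult male plumage; cannot find the gonads",
    "sex_remark": 'Attribute originally given as "male ?"',
}

print(to_key_value(sex_attr))
print(to_json(sex_attr))
```

The "; " inside the method text is indistinguishable from a field separator in the string form, while JSON round-trips it unchanged.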

dustymc commented 3 years ago

Suggest taking a babystep: from https://github.com/ArctosDB/arctos/issues/3516#issuecomment-802913954

not recorded--->unknown, method+="There is data in the form of a label or field notes, and there is no mention of sex."
recorded as unknown-->unknown, method+="There are data in the form of a label or field notes, and these indicate that the examiner was unable to determine the sex."

which probably means we also need a more comprehensive definition for https://arctos.database.museum/info/ctDocumentation.cfm?table=ctsex_cde#unknown

How about two babysteps?

sexes mixed: Lot contains individuals of both sexes.

becomes two attributes:

male
female

based on "both" in the definition.

Jegelewicz commented 3 years ago

sexes mixed: Lot contains individuals of both sexes.
becomes two attributes:
male
female
based on "both" in the definition.

YES to this. But we probably should add remark "sexes mixed" for clarity.

Jegelewicz commented 3 years ago

DLM edit: add "and/or method"

not recorded--->unknown, method+="There is data in the form of a label or field notes, and there is no mention of sex."
recorded as unknown-->unknown, method+="There are data in the form of a label or field notes, and these indicate that the examiner was unable to determine the sex."
which probably means we also need a more comprehensive definition for https://arctos.database.museum/info/ctDocumentation.cfm?table=ctsex_cde#unknown

How about

unknown = "Sex is either not determinable or not recorded/there was no attempt to determine. Remarks and/or method should be used to elaborate."

dustymc commented 3 years ago

add remark "sexes mixed" for clarity


-- Back up the rows being replaced; this table is also the source for both inserts below.
create table temp_sexmixed as select * from attributes where attribute_type='sex' and attribute_value='sexes mixed';

-- Add one "male" attribute per former "sexes mixed" row, keeping agent, method, and date.
-- concat_ws skips NULLs, so rows without an existing remark get only the new note.
insert into attributes (
  collection_object_id,
  determined_by_agent_id,
  attribute_type,
  attribute_value,
  attribute_remark,
  determination_method,
  determined_date
) (
  select
    collection_object_id,
    determined_by_agent_id,
    'sex',
    'male',
    concat_ws('; ',attribute_remark,'Formerly "sexes mixed"'),
    determination_method,
    determined_date
  from
    temp_sexmixed
);

-- Same again for "female".
insert into attributes (
  collection_object_id,
  determined_by_agent_id,
  attribute_type,
  attribute_value,
  attribute_remark,
  determination_method,
  determined_date
) (
  select
    collection_object_id,
    determined_by_agent_id,
    'sex',
    'female',
    concat_ws('; ',attribute_remark,'Formerly "sexes mixed"'),
    determination_method,
    determined_date
  from
    temp_sexmixed
);

-- Remove the old values, then retire the term from the code table.
delete from attributes where attribute_type='sex' and attribute_value='sexes mixed';

delete from ctsex_cde where sex_cde='sexes mixed';

Jegelewicz commented 3 years ago

Code table definition for sex = "unknown" has been updated.

dustymc commented 3 years ago


-- Back up the rows about to change.
create table temp_attr_sx_unk_run as select * from attributes where attribute_type='sex' and attribute_value in ('not recorded','recorded as unknown');

-- "not recorded" becomes "unknown"; the old meaning is appended to determination_method.
update
  attributes
set
  attribute_value='unknown',
  determination_method=concat_ws('; ',determination_method,'Formerly "not recorded": There is data in the form of a label or field notes, and there is no mention of sex.')
where
  attribute_type='sex' and
  attribute_value='not recorded'
;

-- Same for "recorded as unknown".
update
  attributes
set
  attribute_value='unknown',
  determination_method=concat_ws('; ',determination_method,'Formerly "recorded as unknown": There are data in the form of a label or field notes, and these indicate that the examiner was unable to determine the sex.')
where
  attribute_type='sex' and
  attribute_value='recorded as unknown'
;

-- Retire both terms from the code table.
delete from ctsex_cde where sex_cde='not recorded';
delete from ctsex_cde where sex_cde='recorded as unknown';

dustymc commented 3 years ago

Easy stuff is gone, female ? and male ? remain.

Is there any consensus whether those should be "unknown" or "male"/"female"? Either case will involve verbose remarks. Should we flip a coin?

Jegelewicz commented 3 years ago

Easy stuff is gone, female ? and male ? remain.

Is there any consensus whether those should be "unknown" or "male"/"female"? Either case will involve verbose remarks. Should we flip a coin?

Now I will inject complexity. Do we actually need "attribute confidence" just as we have with identification?

If no one wants that, I suggest that male ? be changed to male with the remark "sex determination is uncertain, but assumed to be male" and ditto for female ? with appropriate wording. @ccicero

dustymc commented 3 years ago

confidence

I still don't think that can be useful, or do anything that existing data don't already do. "I think I'm pretty good at this!" (confidence) isn't Research Grade. "I can't find {x} but it has {y} so it's probably female" (method) is.

https://github.com/ArctosDB/arctos/issues/3516#issuecomment-802926571

dustymc commented 2 years ago

TODO: Get CSV for the remaining ?-having values, suggest path, set "or else" date.

dustymc commented 2 years ago

EDIT: There are objections, not proceeding, adding to AWG Agenda.

~~I will proceed with the below if there are no objections by 2022-03-04.~~

Proposed migration path (revised based on above remarks):

 attribute_value | new_attribute_value |             remark_appendix              
-----------------+---------------------+------------------------------------------
 male ?          | male                | Attribute originally given as "male ?"
 female ?        | female              | Attribute originally given as "female ?"
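The proposed mapping amounts to a lookup plus a remark append. A minimal Python sketch, assuming each row is a dict keyed like the `attributes` columns in the SQL earlier in this thread; `append_remark` mimics PostgreSQL `concat_ws('; ', ...)`, which skips NULL arguments. The function names are hypothetical, and this illustrates the table above rather than being the script that would actually be run.

```python
# Proposed mapping from the table above: legacy value -> (new value, remark to append).
MIGRATION = {
    "male ?": ("male", 'Attribute originally given as "male ?"'),
    "female ?": ("female", 'Attribute originally given as "female ?"'),
}


def append_remark(existing, addition):
    """Mimic PostgreSQL concat_ws('; ', existing, addition): None parts are skipped."""
    return "; ".join(part for part in (existing, addition) if part is not None)


def migrate(row):
    """Return a migrated copy of a sex attribute row; any other row is copied unchanged."""
    out = dict(row)
    if row.get("attribute_type") != "sex" or row.get("attribute_value") not in MIGRATION:
        return out
    new_value, note = MIGRATION[row["attribute_value"]]
    out["attribute_value"] = new_value
    out["attribute_remark"] = append_remark(row.get("attribute_remark"), note)
    return out


row = {"attribute_type": "sex", "attribute_value": "male ?", "attribute_remark": "skin only"}
print(migrate(row))
```

Returning a copy leaves the original row untouched, which plays the same role as the `temp_*` backup tables in the SQL above.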

Data:

temp_maybe_sex.csv.zip

Summary:


 guid_prefix |  c   
-------------+------
 ALMNH:Paleo |    1
 APSU:Herp   |    2
 ASNHC:Bird  |    2
 BYU:Herp    |   34
 BYUObs:Herp |    2
 CHAS:Bird   |   21
 CHAS:Mamm   |    2
 DMNS:Bird   |  104
 DMNS:Mamm   |  126
 MLZ:Bird    |   60
 MLZ:Fish    |    2
 MLZ:Mamm    |    4
 MSB:Bird    |  304
 MSB:Herp    |   36
 MSB:Mamm    |  356
 MVZ:Bird    | 2656
 MVZ:Herp    |   26
 MVZ:Mamm    |   46
 MVZObs:Bird |    9
 MVZObs:Mamm |    1
 NMMNH:Bird  |    1
 NMU:Bird    |    2
 NMU:Mamm    |    5
 OWU:Bird    |    1
 OWU:Rept    |    5
 UAM:Bird    |  600
 UAM:Mamm    |  360
 UAMObs:Bird |    2
 UCM:Bird    |   84
 UCM:Mamm    |   24
 UCSC:Bird   |    2
 UCSC:Mamm   |    3
 UMNH:Bird   |    1
 UMNH:Mamm   |   15
 UMZM:Bird   |   16
 UMZM:Mamm   |    4
 UNR:Mamm    |    1
 UTEP:Bird   |   92
 UTEP:Herp   |    2
 UTEP:HerpOS |   22
 UTEP:Mamm   |    2
 UWBM:Mamm   |    1
 UWYMV:Bird  |   11
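A per-collection summary like the one above is a group-by count. A Python sketch using only the standard library, assuming the CSV has a `guid_prefix` column (the attachment's actual columns are not shown here); `count_by_prefix` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def count_by_prefix(csv_text):
    """Count rows per guid_prefix, returned as (prefix, count) pairs sorted by prefix."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["guid_prefix"] for row in reader)
    return sorted(counts.items())


sample = "guid_prefix,attribute_value\nMSB:Bird,female ?\nMVZ:Bird,male ?\nMSB:Bird,male ?\n"
for prefix, c in count_by_prefix(sample):
    print(f"{prefix:<12}| {c:>4}")
```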

Users:

@ebraker @byuherpetology @mkoo @ewommack @campmlc @ccicero @kderieg322079 @catherpes @amgunderson @atrox10 @mvzhuang @cjconroy @jtgiermakowski @wellerjes @lin-fred @keg34 @gradyjt @droberts49 @zmsch @jrpletch @aklompma @acdoll @SerinaBrady @jldunnum @jrdemboski

catherpes commented 2 years ago

I don't see what's wrong with having a "sex ?" attribute. Adding a remark to the attribute, rather than just adding a question mark, is an extra step in data entry, and data entry is already an onerous enough task. I believe it also means that each record has to be opened to see the remarks pertaining to sex, rather than seeing them in the results output after the search. Someone could be misled if they didn't open the record. Do what you will, but my strong preference is to leave this as is.

byuherpetology commented 2 years ago

I agree, I don’t think there is a problem with reflecting uncertainty in this field.

On Feb 18, 2022, at 2:41 PM, catherpes @.***> wrote:


lin-fred commented 2 years ago

@catherpes @byuherpetology I think that using "sex ?" is vague. If there is uncertainty, then why even define it as being possibly male or female? Also, take a look at this entire thread, as your concerns may have already been discussed above.

If there is concern about seeing the information on the search page, I think going with unknown for "sex ?" would be best, with a remark about why it's thought to be male/female. This way, there would be no vagueness on the search page, but no data would be lost either.

catherpes commented 2 years ago

It flags uncertainty. If I see 'unknown,' I'm not likely to go looking for any comments regarding the sex, or look at reproductive information. If I see 'female ?' I'm likely to look and see if the preparator added any information as to why it was deemed likely to be the sex they said it was. See https://arctos.database.museum/guid/MSB:Bird:40950. In this case the 'female ?' would cause me to investigate (if that was important to me for whatever I was using the specimen for), and I see in the repro field that what I thought was likely an ovary was observed. So it's not entirely unknown. It's not known well enough, to my satisfaction, to confidently put OV instead of [OV], but it's something, and that could be useful.

lin-fred commented 2 years ago

Thanks for the example! So how does someone decide to put [OV] instead of OV? Is it because it was possibly not an ovary? A different organ? Or a testis?

catherpes commented 2 years ago

OV in brackets or with a question mark are fairly interchangeable. I've always used brackets.

It looked about like an ovary but could have been an adrenal gland. Things were soupy. I thought I saw a testis. Etc.

lin-fred commented 2 years ago

So in this case you could put testis 2x2 with "male?", or even put adrenal gland 2x2 and sex unknown, and it would mean the same thing?

What does "male?" mean? Doesn't it mean that it is possibly male but also possibly female?

And what does "female?" mean? Doesn't it mean that it is possibly female but also possibly male?

So if male? = male or female, female? = female or male, and unknown = female or male...

How are they different?

And as far as investigating further when searching: if, for example, someone is looking for females, they should look at female, female?, male?, and unknown, because all of those could be female, and so the remarks for all of them should be checked! If someone looking for females selects only female and female?, then either that person is not doing their due diligence, or the female?/male? options are biasing people toward thinking the specimens ARE female or male, which is a wrong assumption.

Having male? and female? makes it more complicated, because you have to search more terms that all mean the same thing as unknown, and those terms set a bias without solid evidence behind it.

P.S. Soupy birds are the bane of my existence.

byuherpetology commented 2 years ago

I understand the desire to have research-grade data. In my mind the ? signifies that this is not a research-grade observation. If you only want research-grade data in this field, then I think you should change "male ?" to unknown with "attribute originally given as male ?" in the remarks.

For me, I use "male ?" when a researcher would like to borrow male specimens. At least I know that someone thought this could be a male; it has a higher chance of being male than one marked "unknown". It seems a shame to lose that information entirely, but perhaps it is better included somewhere else.

lin-fred commented 2 years ago

So in this case, why not have the attribute be male, with a remark about it being assumed male? At this point you are using it as a male, and the researcher is using it as a male, so it's male.

lin-fred commented 2 years ago

It seems a shame to lose that information entirely, but perhaps it is better included somewhere else.

Also, can you help me understand how that information would be lost entirely if it's in the sex attribute remarks? It's still linked to the sex attribute data.

byuherpetology commented 2 years ago

The researcher isn't assuming it is male; any researcher doing anything with sex will verify all determinations. "Male ?" just gives me a better shot at getting enough male specimens to the researcher on the first try, while not being represented as a definite male to someone who may only be doing a search and never examines the specimens. It makes clear that this is a guess and not research-grade data.

catherpes commented 2 years ago

But why should anyone have to go through layers of clicking to get to that uncertainty? Somebody thought it was a male but wasn't sure, so put 'male?'. I don't understand why things have to be one of three states. Add a couple more. Or rather, keep them.

This is frustrating. I've exceeded my quota of comments for the quarter.

ArctosDB / arctos

Clean up sex attribute table #3516