Code Table Request - examined/detected/not detected attributes

Jegelewicz commented 1 year ago

Goal Allow users to find catalog records that include:

things that have been examined/tested for something
things that have something detected
things that DO NOT have something detected

Provide a list of something to choose from.

Context Reduce the proliferation of "examined for" and "detected" possibilities and create a structure for this type of data that allows for capture at a coarse level. More detailed information likely belongs elsewhere and should be linked to Arctos OR requires some sort of "module" outside of the current data structure to capture it well.

Table Collection Object Attribute: Types

Proposed Value

examined for
detected
not detected

Proposed Definition

examined for - A systematic examination for the attribute value was made.
detected - The attribute value was detected.
not detected - The attribute value was not detected.

Collection type potentially all

Attribute data type categorical

Attribute value Requesting a new code table: Attribute: Examined For - to begin with the following terms:

Term	Description
ectoparasite	a parasite, such as a flea, that lives on the outside of its host.
endoparasite	a parasite, such as a tapeworm, that lives inside its host.
parasite	an organism that lives in or on an organism of another species (its host) and benefits by deriving nutrients at the other's expense.
Sin Nombre orthohantavirus	Sin Nombre orthohantavirus (SNV) (from Spanish, meaning "without a name") is the prototypical etiologic agent of hantavirus cardiopulmonary syndrome (HCPS).

Attribute units [ For number+units attributes, code table controlling units ]

Available for Public View Yes

Priority [ Please choose a priority-label to the right. ]

Note: There would be a migration path for the following attributes, which would then be removed from the code table

Term	Migration Path
SNV results	Probably unload current attribute and re-load new attributes with better information in date, determiner and method
ectoparasite examination	unload this attribute, if yes, add examined for = ectoparasite, If no, do we need something for no*?
ectoparasites detected	unload this attribute, if yes, add detected = ectoparasite, If no, add not detected = ectoparasite
endoparasite examination	unload this attribute, if yes, add examined for = endoparasite, If no, do we need something for no*?
endoparasites detected	unload this attribute, if yes, add detected = endoparasite, If no, add not detected = endoparasite
examined for parasites	unload this attribute, if yes, add examined for = parasite, If no, do we need something for no*?
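
As a rough illustration of the migration table above, the yes/no mapping could be sketched as follows. The function name and dictionary are hypothetical and not part of Arctos. The "no" case for the examination attributes is left unresolved on purpose, since the thread has not settled it, and SNV results are omitted because they are to be re-loaded by hand.

```python
from typing import Optional, Tuple

# Legacy attribute names are taken from the migration table above.
# The mapping structure and function are illustrative assumptions only.
LEGACY = {
    "ectoparasite examination": ("examination", "ectoparasite"),
    "endoparasite examination": ("examination", "endoparasite"),
    "examined for parasites": ("examination", "parasite"),
    "ectoparasites detected": ("detection", "ectoparasite"),
    "endoparasites detected": ("detection", "endoparasite"),
}


def migrate(legacy_type: str, legacy_value: str) -> Optional[Tuple[str, str]]:
    """Return (new attribute type, value), or None when no target is agreed."""
    kind, term = LEGACY[legacy_type]
    answer = legacy_value.strip().lower()
    if answer not in ("yes", "no"):
        raise ValueError(f"expected yes/no, got {legacy_value!r}")
    if kind == "examination":
        # "no" has no agreed target yet ("do we need something for no*?"),
        # so return None rather than guess.
        return ("examined for", term) if answer == "yes" else None
    # Detection attributes map cleanly in both directions.
    return ("detected", term) if answer == "yes" else ("not detected", term)
```

Returning None for the open case keeps those rows visible for review instead of silently dropping or inventing data.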

See also proposal in https://github.com/ArctosDB/arctos/issues/5087

Jegelewicz commented 1 year ago

Known shortcomings

Perhaps we also need "not examined for"?
Users looking for "viruses" will need to figure out what all the "virus" possibilities are in the code table.
Date, determiner and method can be used to link "examined for" with "detected" or "not detected", but that link may be somewhat tenuous.
A lot of things in the value code table will also be taxa.

What have I forgotten?

Jegelewicz commented 1 year ago

Perhaps if these values are quite coarse, something like https://github.com/ArctosDB/arctos/issues/5101 could be used to describe things in more detail? However, associating that with the above may not be easy.

kderieg322079 commented 1 year ago

Perhaps we also need "not examined for"?

Is this just so that ecto/endoparasite examination = no can be migrated properly? I think this makes sense so we aren't losing any data, but I don't see this term getting much use going forward since there are a lot of things that most specimens will never be examined for.

In general, everything you've outlined will work nicely for UMNH because we have very little parasite data so far and, to my knowledge, no pathogen screening data.

Jegelewicz commented 1 year ago

Is this just so that ecto/endoparasite examination = no can be migrated properly?

Maybe? I am not sure if the "No" answers matter there to anyone - but maybe they do...

campmlc commented 1 year ago

Parasite or pathogen prevalence is calculated as number of infected individuals / total number examined x 100, i.e. the percentage infected of those examined. That's why we have the negative "not examined" to explicitly exclude those. This is especially useful at the expedition or project level, where you can record and easily find which specific individuals were examined in order to download the correct data for assessing parasite load and prevalence. That said, I'm trying to wrap my brain around whether we need a separate attribute for this, or whether we just leave the examined for blank if no data are present. Right now the existing attributes "examined for parasites" (and ectos and endos) are yes/no options, so a good question would be how many "no" values are there in current use.
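
The prevalence arithmetic described in this comment can be written as a minimal sketch. The function name is an assumption and the counts in the usage note are made up for illustration.

```python
def prevalence_percent(infected: int, examined: int) -> float:
    """Percent infected among the individuals that were actually examined."""
    if examined <= 0:
        raise ValueError("prevalence is undefined when nothing was examined")
    if not 0 <= infected <= examined:
        raise ValueError("infected must be between 0 and examined")
    # Individuals that were never examined are excluded from the denominator.
    return infected / examined * 100
```

With made-up counts, prevalence_percent(3, 12) gives 25.0: three infected out of twelve examined, regardless of how many unexamined individuals the expedition collected.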

Jegelewicz commented 1 year ago

That's why we have the negative "not examined" to explicitly exclude those.

I just don't see how anyone outside of MSB would know which group of whatever "hosts" should be included in any given "prevalence" assessment. I guess using collection places and dates would suffice, but what would someone make of a record that falls in their place/time constraint but has NONE of these attributes?

leave the examined for blank if no data are present.

Why add it at all - this seems like it would just create even more confusion?

how many "no" values are there in current use.

ectoparasite exam = no 10,800 records

endoparasite exam = no 15,865 records

dustymc commented 1 year ago

What relevant questions cannot be answered with two attributes?

looked and found
looked and did not find

Jegelewicz commented 1 year ago

Apparently

didn't look so didn't find
didn't look but found

dustymc commented 1 year ago

Alternate proposal which keeps method (and agent and date) coupled:

one attribute: examination

values

examined for and found A
examined for and did not find B
found without examination B

so - not what I said in the meeting, but rather THREE values for every "thing"

This accommodates

one DATE PERSON using METHOD1 looked for and didn't find A
one DATE PERSON using METHOD2 looked for and didn't find A
one DATE PERSON using METHOD3 looked for and did find A
(... etc for every B and C and...)

(through 3 attributes).

The Good: there's no ambiguity, no gulfs to try to cross, it's all explicit assertions

The Bad: value code table is three times as big as one might expect

dustymc commented 1 year ago

@campmlc doesn't think ^^ is usable

Gabor proposal

2 attributes

looked (expected to be used once per record when there's an examination)
found

These would share a 'value' code table, with 'endos' and 'ectos' (and such and subdivisions of)

Pros: easy, intuitive (I think??)

Cons: slightly loose link between looked and found - need good procedures to keep date consistent and etc.

USAGE:

looked and found one thing - two attributes
looked and didn't find - 'look' attribute only
found without looking - 'found' attribute only
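
The usage rules above can be read as a small lookup on which of the two attributes a record carries for a given value. The sketch below is one hedged interpretation; the attribute names 'looked' and 'found' come from the proposal, while the function and data shape are assumptions.

```python
from typing import Iterable, Tuple


def state_for(attributes: Iterable[Tuple[str, str]], value: str) -> str:
    """Interpret a record's (attribute type, value) pairs for one value."""
    types = {a_type for a_type, a_value in attributes if a_value == value}
    looked = "looked" in types
    found = "found" in types
    if looked and found:
        return "looked and found"
    if looked:
        return "looked and didn't find"
    if found:
        return "found without looking"
    # Neither attribute present: the record asserts nothing about this value.
    return "no assertion"
```

Note that "looked and didn't find" is inferred from the absence of a 'found' attribute, which is exactly the loose link listed under Cons.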

Jegelewicz commented 1 year ago

Use

examined for - A systematic examination for the attribute value was made on the determined date using the determination method.
detected - The attribute value was detected on the determined date using the determination method.

Then, need a code table for values. @campmlc to start a GoogleSheet

And we need good documentation on how to use these and when to move on to separate cataloging of what was detected

dustymc commented 1 year ago

NEED: documentation

how to record data for prevalence - Best Practices
what makes a good method
what answers require a cataloged item
use cases: what to do when........

campmlc commented 1 year ago

I propose the following wording for the attributes: "Examined for", "Detected", "Not Detected".

Draft Code Table values: https://docs.google.com/spreadsheets/d/1bboO2LLBd1D26ykbomQvIFU9ocOH9L9V0lBqCSipaBc/edit?usp=sharing

Jegelewicz commented 1 year ago

@campmlc REALLY wants to include

not detected - The attribute value was not detected.

And the one thing we could do with that is a data quality check. If there is no "detected" or "not detected" attribute with the same determination date as "examined for", then something is missing. If we leave off "not detected", this would not be possible. For the group - what do you prefer, the two

examined for
detected

or three term method?

examined for
detected
not detected

@campmlc @gracz-UNL @kderieg322079 @bryansmclean @adhornsby @ehalverson26 @jldunnum
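
The data quality check mentioned in this comment could look roughly like the sketch below: flag every "examined for" attribute that has no "detected" or "not detected" attribute with the same determination date. Matching on the value as well as the date is an added assumption, and the Attribute type is hypothetical rather than the Arctos schema.

```python
from typing import List, NamedTuple


class Attribute(NamedTuple):
    attribute_type: str   # "examined for", "detected" or "not detected"
    value: str            # term from the shared value code table
    determined_date: str  # ISO date string, e.g. "2023-03-01"


def unmatched_examinations(attributes: List[Attribute]) -> List[Attribute]:
    """Return 'examined for' attributes that have no result on the same date."""
    results = {
        (a.value, a.determined_date)
        for a in attributes
        if a.attribute_type in ("detected", "not detected")
    }
    return [
        a
        for a in attributes
        if a.attribute_type == "examined for"
        and (a.value, a.determined_date) not in results
    ]
```

Under the two-term option this check cannot tell a negative result from a missing one, because both leave "examined for" without a partner.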

dustymc commented 1 year ago

not detected - The attribute value was not detected.

Given one record with a negative assertion and another thousand with what was proposed by @gracz-UNL (my interpretation in https://github.com/ArctosDB/arctos/issues/5688#issuecomment-1458578242), I think users will inevitably and incorrectly assume that the one sets the pattern and the thousand aren't of interest.

I don't believe that "we" are in any way capable of maintaining a comprehensive list of the things we didn't find. I see no plausible way in which these data can be managed, and a great and inevitable risk of them being misunderstood. I believe the current data support this hypothesis.

Simply asserting what we know - what Gabor has proposed - seems like something that we can easily manage and that users can easily understand.

Draft Code Table values:

That (eventually) will need formatted to comply with the structure, which is a term and definition. My suggestion would be to structure the values like:

endoparasites
endoparasites: flukes
endoparasites: tapeworms

which should serve as a sort of built-in category and make things sort nicely, but that's certainly not the only possibility.
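
A quick sketch of how the "category: subdivision" naming behaves under a plain alphabetical sort and a prefix lookup. The three endoparasite values are from the suggestion above; "ectoparasites" and both helper functions are added only for illustration.

```python
from typing import List

# "ectoparasites" is an assumed extra value, included to show cross-category sorting.
values = [
    "endoparasites: tapeworms",
    "ectoparasites",
    "endoparasites",
    "endoparasites: flukes",
]


def category(value: str) -> str:
    """The part before the first colon acts as the built-in category."""
    return value.split(":", 1)[0].strip()


def in_category(all_values: List[str], wanted: str) -> List[str]:
    """All values whose category matches, in sorted order."""
    return sorted(v for v in all_values if category(v) == wanted)


# Plain sorting already groups each category with its subdivisions.
ordered = sorted(values)
```

Here ordered places "endoparasites" directly before its two subdivisions, so no separate category column is needed for display or for a coarse search.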

jtgiermakowski commented 1 year ago

Hi @Jegelewicz @dustymc and @campmlc et alia! We have two immediate needs for these types of attributes in Arctos. We have hundreds of specimens that were examined for malaria and in some cases, malaria was found. We also are getting back a loan for some pathogen analyses and only about 1/3 are positive for one of the two diseases. For both instances the three-term method that @Jegelewicz outlines above works great. Please make sure that there's a field for agent determiner and date, as well as a remarks field for methods or links to media (such as a report we get from USFWS). This is great!

dustymc commented 1 year ago

three-term method

Can you elaborate on that? I still can't understand what question "not detected" can answer that "examined" and (the absence of) "detected" cannot.

jtgiermakowski commented 1 year ago

I propose the following wording for the attributes: "Examined for", "Detected", "Not Detected".

Draft Code Table values: https://docs.google.com/spreadsheets/d/1bboO2LLBd1D26ykbomQvIFU9ocOH9L9V0lBqCSipaBc/edit?usp=sharing

@campmlc and @dustymc I added a comment/description for chytridiomycosis but it would probably be best to split this up or have a field to specify it separate from method.

jtgiermakowski commented 1 year ago

three-term method

Can you elaborate on that? I still can't understand what question "not detected" can answer that "examined" and (the absence of) "detected" cannot.

@dustymc , if we don't have that term ("not detected") then we end up with the same problem we currently have for sex determination for specimens. "Undetermined" is not the same when you examined a specimen for sex and can't determine it (as in, the gonads are missing, an external examination is impossible because the business end is missing, etc.) vs. data is missing because it was never recorded, nobody ever bothered, but someone decided to use the field anyways. Disambiguation is key! I just need to be able to capture both positive and negative results from a USFWS report for a bunch of specimens. Maybe it can be solved with "examined for" as one variable, a table of possibilities "endoparasite, etc." and "yes/no", with a determiner, date, method, remarks, etc.

dustymc commented 1 year ago

Disambiguation is key!

On this we are agreed!

To expand on the sex analogy, we don't do there what's been suggested here. If this was that, I'd propose the following states

examined

"we looked (using whatever methodology is listed) and found no results" (so maybe the critter doesn't have any of the traits we test for, or maybe our methodology isn't great)

male

"it's a male, but we just stumbled on this"

examined
male

"we looked and found it to be a male."

So the first and last are good for prevalence and such (they have 'examined' and either result or NULL), and the middle is not (there was no systematic examination).

We DO NOT need the fourth state (third assertion), which would necessarily either be incomplete (and so confusing), or a listing of everything we know how to examine for:

not male
not female
not gynandromorph
not hermaphrodite

.... and then incomplete (and so confusing) when we inevitably add another value.

https://github.com/ArctosDB/arctos/issues/5759 is the UI adjustment necessary to ask questions of the two-attribute model.

campmlc commented 1 year ago

We need the not detected third attribute for at least two essential reasons. 1) To explicitly capture data on those specimens that gave a negative test result for a particular date, determiner, and method. It is possible this same specimen might give a positive result for a different method etc. We have this explicit level of data to record somewhere - we need the attribute to say that two thirds of the specimens examined for a particular virus etc. explicitly had a negative test result according to some test and method. 2) We usually do not have test results at time of cataloging. We can record the "examined for" data at that time. But the "Detected" values, both negative and positive, may not be available for months and will need to be added via an attribute bulkload. We need to be able to distinguish records that just haven't gotten results back yet (e.g. "Detected" attribute not present) from those records that explicitly have negative test results ("Not Detected" attribute present). I've given up on asking for a Not Examined attribute, even though we record these data and that will now have to be shoved into remarks somewhere. But we cannot compromise on requiring a Not Detected attribute. @jldunnum

jtgiermakowski commented 1 year ago

@dustymc I understand the logic for the sex attribute you propose and maybe that's how it should be, but at least we have an 'undetermined' field where we can put remarks on why that is. In this instance, as @campmlc points out, sometimes the results of examinations are not back for weeks or months. It all boils down to how the data can be effectively used and/or downloaded. I don't understand the database cost or the necessity of not storing or hiding a negative test result.

When you take a COVID test, it's either positive, negative or inconclusive. If I call the pharmacy a few hours later, the clerk doesn't say...."well, it just says you took a test and it's not marked in another attribute as positive so you probably don't have COVID, our database doesn't keep track of whether it's negative or inconclusive, that would be confusing to the programmers."

dustymc commented 1 year ago

It is possible this same specimen might give a positive result for a different method

Yes, I think there's a gap around that link, and maybe I suddenly understand it: Perhaps we're just underutilizing method?

detected BLA (using method)
did not detect BLAH (using method)
detected BLAUGH (no method given so can't be used in prevalence)

I don't quite know how to structure that, but it doesn't involve two THINGS that need to be coupled where there's no available coupler.

at time of cataloging

I don't think that changes anything, however the rest works out??

Jegelewicz commented 1 year ago

From meeting today.

Go with three attributes as described in initial comment.

@jldunnum to provide data
@dustymc to set up for review

I will help however needed - just let me know.

bryansmclean commented 1 year ago

@Jegelewicz can we plan to connect next Tues or whenever the test is finally implemented? Maybe we could all view the functionality together?

Jegelewicz commented 1 year ago

Invite sent for meeting on the 28th!

dustymc commented 1 year ago

@Jegelewicz can you open issues for the new attributes and the CT (with at least one value)? Even if we rush things and understand that this is a bit experimental, it'll be useful to have that supporting info.

dustymc commented 1 year ago

@Jegelewicz is there an Issue for the value code table? I know(ish) what the contents will be, but I don't know what to name the table+value-column and I think that needs sorted out before the three attribute-issues can be resolved.

Jegelewicz commented 1 year ago

From today's meeting:

@jldunnum and @bryansmclean plan to provide test data
@dustymc will create the new code table
@Jegelewicz will create the new attributes and associate them with the code table (after the boxes there are checked)
@campmlc @jldunnum @bryansmclean to start issues for new code table terms needed for test data

The plan is to have some stuff to review in two weeks - a meeting invitation has been sent.

Jegelewicz commented 1 year ago

A reminder for @Jegelewicz

https://arctos.database.museum/guid/UTEP:ES:25-566

has a pathology of sorts in condition - digestion

could use pathology and put this in remark or request Pathology: digestion - the object has attributes that suggest it has passed through the digestive tract of another organism.

There are probably more of these - so do a search and fix them all if this happens.

dustymc commented 1 year ago

I'm moving some scattered discussion back here. I still don't understand the need for the third attribute, whatever form the two remaining take, and others - do, I think?? I am NOT trying to argue that I haven't confused myself, that does seem possible, but I'm going to be asked to query this and I need to understand it. (And anyone who's going to be shaping data to it does as well).

I got a consultation from our resident mathematician, she can't find obvious problems in my theory (as presented with all my biases, so.....).

She did note that the third factor would require TWICE as much data from the 'testers' and TWICE as much entry, all with scrupulous consistency.

Around https://github.com/ArctosDB/arctos/issues/6042#issuecomment-1487839100 @campmlc says the third attribute is necessary to calculate prevalence.

With two attributes, that is the ratio of tested positives to total tested:

"detected: thingA" / ("not detected: thingA" + "detected: thingA")

That still needs a bit of "method filtering" to reject the incidentals, but it's a filter, not a join, and it's applied to a lot less data.
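
A minimal sketch of the two-attribute calculation with the "method filtering" just described. It assumes each record carries at most one result per value among the accepted methods, so attribute rows can be counted directly. The Attribute type, method names and function are hypothetical.

```python
from typing import List, NamedTuple, Set


class Attribute(NamedTuple):
    record_id: str
    attribute_type: str  # "detected" or "not detected"
    value: str
    method: str          # empty string when no method was recorded


def prevalence(attributes: List[Attribute], value: str, accepted_methods: Set[str]) -> float:
    """detected / (not detected + detected), after filtering on method."""
    # The filter rejects incidental finds (no method, or a method not accepted).
    kept = [a for a in attributes if a.value == value and a.method in accepted_methods]
    positives = sum(1 for a in kept if a.attribute_type == "detected")
    negatives = sum(1 for a in kept if a.attribute_type == "not detected")
    if positives + negatives == 0:
        raise ValueError("no results for this value under the accepted methods")
    return positives / (positives + negatives)
```

The whole computation is a single pass over result rows. Nothing has to be matched against a second attribute.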

I still can't wrap my head around the three-factor approach, but I believe that would require the same query but use 'tested: thingA' rather than method to exclude incidentals, but then it requires a JOIN or match across multiple factors - agent, date, method was mentioned several times - to correlate the tested and results attributes. (And I suspect any rigorous analysis is going to want to filter on method anyway, which would - I think - accomplish this in a simpler system.)

Prevalence=detected: thingA WITH tested: thingA WHERE methods match / ( (detected: thingA WITH tested: thingA WHERE methods match) + (not detected: thingA WITH tested: thingA WHERE methods match))

and then somehow also make the multifactor methods match between

the pairs that give the sum, and
the sum and the tested positives

??????????????

HELP!!
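
For comparison, here is one hedged reading of the three-attribute calculation, in which a result only counts if an "examined for" attribute on the same record matches it on value, agent, date and method. The match key and all names are assumptions about how the join would be done, not an agreed design.

```python
from typing import List, NamedTuple, Tuple


class Attribute(NamedTuple):
    record_id: str
    attribute_type: str  # "examined for", "detected" or "not detected"
    value: str
    agent: str
    date: str
    method: str


def match_key(a: Attribute) -> Tuple[str, str, str, str, str]:
    # Everything that must agree between the examination and its result.
    return (a.record_id, a.value, a.agent, a.date, a.method)


def prevalence_with_join(attributes: List[Attribute], value: str) -> float:
    """Prevalence counting only results that join to a matching examination."""
    tested = {
        match_key(a)
        for a in attributes
        if a.attribute_type == "examined for" and a.value == value
    }
    results = [a for a in attributes if a.value == value and match_key(a) in tested]
    positives = sum(1 for a in results if a.attribute_type == "detected")
    negatives = sum(1 for a in results if a.attribute_type == "not detected")
    if positives + negatives == 0:
        raise ValueError("no results join to an examination for this value")
    return positives / (positives + negatives)
```

In this reading, a result whose date differs from its examination by a single day silently drops out of both numerator and denominator, which is the consistency burden raised above.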

Jegelewicz commented 1 year ago

I will not ever be using this data, although at some point I may end up entering it. The above makes sense to me though and reduces the number of attributes required.

The only argument for "examined for" that I have heard that makes sense is that a test has been initiated, but no results are available, so we can at least know the exam is in process. But is that reality? Or is reality more like what we have with the set of data from @jldunnum where the testing has occurred and the results are known and we are recording everything at once?

Using that data, for every catalog record we could record "detected" or "not detected", depending upon the results and then prevalence (for that study) could be calculated with the formula @dustymc gave.

"detected: thingA" / ("not detected: thingA" + "detected: thingA")

My objection to all of this is that without extreme consistency in date, determiner, and method, only those with certain knowledge about this particular study could ever re-create that prevalence number (a project might make that better, but not everyone understands Arctos projects and I doubt anyone downloading data would figure it out). I truly believe that this kind of data needs to be stored elsewhere in such a manner that allows it to stand alone as research results and that what we should be doing in Arctos is providing a link from the associated records to the external data. This is the entire concept behind the digital extended specimen. That being said, if no such external source exists (I don't believe this to be true, but let's say it is), then the two attributes do seem to be sufficient.

I've said my piece - the community can decide.

dustymc commented 1 year ago

From @jldunnum in https://github.com/ArctosDB/arctos/issues/6042#issuecomment-1489116320:

I think having the "Examined for" will allow for more comprehensive returns of all records we are interested in on a given search and will require a simpler query. I can search "examined for" Virus: Orthohantavirus from a place or time or taxon and get all records that have been screened. In the "Detected" and "Not detected" model I would need to search for virus: Orthohantavirus in both those fields and any variation in the methods, dates, agents would cause me to miss records. "Examined for" can also return a broader array of detected things because we wouldn't have to specify the exact detected value. I think it's better to get potentially more than I want and have to filter than to potentially miss anything. "Examined for" allows for capture of examinations or screening events that we don't have results for yet.

Again I think we all understand this is not the ideal model and dedicated modules would be better, but until then we have been using this model for quite some time now and I think it's been working for those who use it.

dustymc commented 1 year ago

I can search "examined for" Virus: Orthohantavirus from a place or time or taxon and get all records that have been screened.

Yep.

In the "Detected" and "Not detected" model I would need to search for virus: Orthohantavirus in both those fields

yep

and any variation in the methods, dates, agents would cause me to miss records.

Maybe? I'd just pull both (and maybe we need some sort of "this OR that" attribute search, which might also mitigate the above - but IDK how that'd work) and then try to figure out what's 'examined enough' for my needs from that. And I think I'd do the same if there's a third attribute: would we expect a researcher to just assume that whatever we mean by 'examined' ("skinned and did not die from hanta....") is whatever they mean? I keep coming around to the idea that SOMETHING about methods is somehow wonky but then I get lost....

"Examined for" allows for capture of examinations or screening events that we don't have results for yet.

Yes - but ??? what happens when they don't actually screen for the thing that you might eventually list as [not]detected, (because they lost the sample, moved to a better test, etc., etc.) and ?? I guess I kinda get it because reality and all, but yikes.

we have been using this model for quite some time

Elaborate? I think maybe I'm missing some large piece of the picture??

not the ideal model

In part, I'm trying to understand why you'd think that. Given unlimited resources, what would the data model look like? We don't have unlimited resources but maybe we can get the really critical (and simplest, perhaps) bits right, or at least not mangle anything such that it couldn't be moved to an ideal model without loss.

Anyway - thanks, I understand a bit more than I did a while ago.

campmlc commented 1 year ago

From #6042: We are trying to combine at least two different search types and protocols in this single set of attributes - 1) very specific tests for very specific pathogens using very specific methods; and 2) general searches for things such as helminths that may or may not yield a wide variety of taxa in results. The Detected / (Detected + Not Detected) method is technically feasible for the former, but only if researchers understand how these data were captured and used. It doesn't work for the second option, because everything that could possibly be not detected would have to be explicitly entered in order to capture all the negative values. Using Examined for and Detected or Not Detected works for both options, and the data are captured in the format that any parasitologist would understand.

dustymc commented 1 year ago

See https://github.com/ArctosDB/arctos/issues/6042#issuecomment-1491987768

Someone's waiting to use this, I can't act until https://github.com/ArctosDB/arctos/issues/6036 https://github.com/ArctosDB/arctos/issues/6037 https://github.com/ArctosDB/arctos/issues/6038 - @ArctosDB/arctos-code-table-administrators

campmlc commented 1 year ago

I believe this has been completed, see https://arctos.database.museum/info/ctDocumentation.cfm?table=ctattribute_code_tables. Closing.

ArctosDB / arctos

Code Table Request - examined/detected/not detected attributes #5688