Jegelewicz closed this issue 1 year ago.
Known shortcomings
What have I forgotten?
Perhaps if these values are quite coarse, something like https://github.com/ArctosDB/arctos/issues/5101 could be used to describe things in more detail? However, associating that with the above may not be easy.
- Perhaps we also need "not examined for"?
Is this just so that ecto/endoparasite examination = no can be migrated properly? I think this makes sense so we aren't losing any data, but I don't see this term getting much use going forward since there are a lot of things that most specimens will never be examined for.
In general, everything you've outlined will work nicely for UMNH because we have very little parasite data so far and, to my knowledge, no pathogen screening data.
Is this just so that ecto/endoparasite examination = no can be migrated properly?
Maybe? I am not sure if the "No" answers matter there to anyone - but maybe they do...
Parasite or pathogen prevalence is calculated as (number of infected individuals / total number examined) x 100, i.e., the percentage of examined individuals that are infected. That's why we have the negative "not examined": to explicitly exclude those. This is especially useful at the expedition or project level, where you can record, and easily find, which specific individuals were examined in order to download the correct data for assessing parasite load and prevalence. That said, I'm trying to wrap my brain around whether we need a separate attribute for this, or whether we just leave "examined for" blank if no data are present. Right now the existing attributes "examined for parasites" (and ectos and endos) are yes/no options, so a good question would be how many "no" values there are in current use.
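The prevalence definition above can be sketched in Python. This is a minimal illustration, assuming a hypothetical per-specimen summary (`HostRecord`, with `examined` and `infected` flags) rather than any actual Arctos structure; individuals that were not examined are simply left out of the denominator.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HostRecord:
    # Hypothetical per-specimen summary; not an actual Arctos table.
    catalog_id: str
    examined: bool   # False = "not examined": excluded from the denominator
    infected: bool   # only meaningful when examined is True


def prevalence_percent(records: Iterable[HostRecord]) -> float:
    """(number of infected individuals / total number examined) x 100."""
    examined = [r for r in records if r.examined]
    if not examined:
        raise ValueError("no examined individuals; prevalence is undefined")
    infected = sum(1 for r in examined if r.infected)
    return infected / len(examined) * 100


hosts = [
    HostRecord("A", examined=True, infected=True),
    HostRecord("B", examined=True, infected=False),
    HostRecord("C", examined=True, infected=False),
    HostRecord("D", examined=False, infected=False),  # not examined: ignored
]
print(round(prevalence_percent(hosts), 1))  # 33.3
```

If record D were wrongly counted as examined, the result would drop to 25%, which is the kind of error an explicit "not examined" state is meant to prevent.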
That's why we have the negative "not examined" to explicitly exclude those.
I just don't see how anyone outside of MSB would know which group of whatever "hosts" should be included in any given "prevalence" assessment. I guess using collection places and dates would suffice, but what would someone make of a record that falls in their place/time constraint but has NONE of these attributes?
leave the examined for blank if no data are present.
Why add it at all - this seems like it would just create even more confusion?
how many "no" values are there in current use.
ectoparasite exam = no 10,800 records
endoparasite exam = no 15,865 records
What relevant questions cannot be answered with two attributes?
Apparently
Alternate proposal which keeps method (and agent and date) coupled:
one attribute: examination
values:
- examined for and found A
- examined for and did not find B
- found without examination B
so - not what I said in the meeting, but rather THREE values for every "thing"
This accommodates:
- one DATE PERSON using METHOD1 looked for and didn't find A
- one DATE PERSON using METHOD2 looked for and didn't find A
- one DATE PERSON using METHOD3 looked for and did find A
(... etc. for every B and C and ...)
(through 3 attributes).
The Good: there's no ambiguity, no gulfs to try to cross, it's all explicit assertions
The Bad: value code table is three times as big as one might expect
@campmlc doesn't think ^^ is usable
Gabor proposal
2 attributes
These would share a 'value' code table, with 'endos' and 'ectos' (and similar terms, and subdivisions of them).
Pros: easy, intuitive (I think??)
Cons: slightly loose link between looked and found; need good procedures to keep dates consistent, etc.
USAGE:
- looked and found one thing - two attributes
- looked and didn't find - 'look' attribute only
- found without looking - 'found' attribute only
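A minimal Python sketch of how the USAGE rules above could be read back for one record and one value. The `Attr` row and `classify` function are hypothetical, for illustration only; the point is that "looked and didn't find" is inferred from an "examined for" row with no matching "detected" row.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Attr:
    # Hypothetical flattened attribute row; not the actual Arctos schema.
    catalog_id: str
    attr_type: str  # "examined for" or "detected"
    value: str      # e.g. "endoparasite"


def classify(attrs: Iterable[Attr], catalog_id: str, value: str) -> str:
    """Map the two-attribute model back to the three USAGE cases."""
    types = {a.attr_type for a in attrs
             if a.catalog_id == catalog_id and a.value == value}
    looked = "examined for" in types
    found = "detected" in types
    if looked and found:
        return "looked and found"
    if looked:
        return "looked and didn't find"
    if found:
        return "found without looking"
    return "no assertion"


rows = [
    Attr("A", "examined for", "endoparasite"),
    Attr("A", "detected", "endoparasite"),
    Attr("B", "examined for", "endoparasite"),
    Attr("C", "detected", "endoparasite"),
]
for cid in ("A", "B", "C", "D"):
    print(cid, classify(rows, cid, "endoparasite"))
```

Record D has no rows at all, so nothing can be said about it; only A and B (examined) would belong in a prevalence denominator.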
Use:
- examined for - A systematic examination for the attribute value was made on the determined date using the determination method.
- detected - The attribute value was detected on the determined date using the determination method.
Then we need a code table for values. @campmlc to start a Google Sheet.
And we need good documentation on how to use these and when to move on to separate cataloging of what was detected
NEED: documentation
I propose the following wording for the attributes: "Examined for", "Detected", "Not Detected". Draft Code Table values: https://docs.google.com/spreadsheets/d/1bboO2LLBd1D26ykbomQvIFU9ocOH9L9V0lBqCSipaBc/edit?usp=sharing
@campmlc REALLY wants to include
not detected - The attribute value was not detected.
And the one thing we could do with that is a data quality check: if there is no "detected" or "not detected" attribute with the same determination date as "examined for", then something is missing. If we leave off "not detected", this would not be possible. For the group: which do you prefer, the two-term method
- examined for
- detected
or the three-term method?
- examined for
- detected
- not detected
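The data quality check described above could look something like this Python sketch. It assumes a hypothetical `Attr` row carrying a determination date, and it matches on record and value as well as date, which is an assumption beyond the proposal's "same determination date" wording.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Attr:
    # Hypothetical attribute row; not the actual Arctos schema.
    catalog_id: str
    attr_type: str  # "examined for", "detected", or "not detected"
    value: str
    determined_date: date


def missing_results(attrs: List[Attr]) -> List[Attr]:
    """Return "examined for" rows that have no "detected" or "not detected"
    row for the same record, value, and determination date."""
    results = {(a.catalog_id, a.value, a.determined_date)
               for a in attrs if a.attr_type in ("detected", "not detected")}
    return [a for a in attrs
            if a.attr_type == "examined for"
            and (a.catalog_id, a.value, a.determined_date) not in results]


d = date(2023, 3, 1)  # illustrative date only
rows = [
    Attr("A", "examined for", "thingA", d),
    Attr("A", "detected", "thingA", d),
    Attr("B", "examined for", "thingA", d),
    Attr("B", "not detected", "thingA", d),
    Attr("C", "examined for", "thingA", d),  # no result recorded yet
]
print([a.catalog_id for a in missing_results(rows)])  # ['C']
```

With only "examined for" and "detected", record B would be indistinguishable from record C, which is the case this check relies on "not detected" to separate.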
@campmlc @gracz-UNL @kderieg322079 @bryansmclean @adhornsby @ehalverson26 @jldunnum
not detected - The attribute value was not detected.
Given one record with a negative assertion and another thousand with what was proposed by @gracz-UNL (my interpretation in https://github.com/ArctosDB/arctos/issues/5688#issuecomment-1458578242), I think users will inevitably and incorrectly assume that the one sets the pattern and the thousand aren't of interest.
I don't believe that "we" are in any way capable of maintaining a comprehensive list of the things we didn't find. I see no plausible way in which these data can be managed, and a great and inevitable risk of them being misunderstood. I believe the current data support this hypothesis.
Simply asserting what we know - what Gabor has proposed - seems like something that we can easily manage and that users can easily understand.
Draft Code Table values:
That (eventually) will need to be formatted to comply with the structure, which is a term and definition. My suggestion would be to structure the values like:
which should serve as a sort of built-in category and make things sort nicely, but that's certainly not the only possibility.
Hi @Jegelewicz @dustymc and @campmlc et alia! We have two immediate needs for these types of attributes in Arctos. We have hundreds of specimens that were examined for malaria, and in some cases malaria was found. We are also getting back a loan for some pathogen analyses, and only about 1/3 are positive for one of the two diseases. In both cases the three-term method that @Jegelewicz outlines above works great. Please make sure that there's a field for agent determiner and date, as well as a remarks field for methods or links to media (such as a report we get from USFWS). This is great!
three-term method
Can you elaborate on that? I still can't understand what question "not detected" can answer that "examined" and (the absence of) "detected" cannot.
I propose the wording for the following attributes "Examined for" "Detected" "Not Detected" Draft Code Table values: https://docs.google.com/spreadsheets/d/1bboO2LLBd1D26ykbomQvIFU9ocOH9L9V0lBqCSipaBc/edit?usp=sharing
@campmlc and @dustymc I added a comment/description for chytridiomycosis, but it would probably be best to split this up or to have a field to specify it separately from method.
@dustymc, if we don't have that term ("not detected") then we end up with the same problem we currently have for sex determination for specimens. "Undetermined" when you examined a specimen for sex and couldn't determine it (as in, the gonads are missing, an external examination is impossible because the business end is missing, etc.) is not the same as "undetermined" because the data are missing: never recorded, nobody ever bothered, but someone decided to use the field anyway. Disambiguation is key! I just need to be able to capture both positive and negative results from a USFWS report for a bunch of specimens. Maybe it can be solved with "examined for" as one variable, a table of possibilities ("endoparasite", etc.) and "yes/no", with a determiner, date, method, remarks, etc.
Disambiguation is key!
On this we are agreed!
To expand on the sex analogy: we don't do there what's been suggested here. If this were that, I'd propose the following states:
"we looked (using whatever methodology is listed) and found no results" (so maybe the critter doesn't have any of the traits we test for, or maybe our methodology isn't great)
"it's a male, but we just stumbled on this"
"we looked and found it to be a male."
So the first and last are good for prevalence and such (they have 'examined' and either result or NULL), and the middle is not (there was no systematic examination).
We DO NOT need the fourth state (third assertion), which would necessarily either be incomplete (and so confusing), or a listing of everything we know how to examine for:
.... and then incomplete (and so confusing) when we inevitably add another value.
https://github.com/ArctosDB/arctos/issues/5759 is the UI adjustment necessary to ask questions of the two-attribute model.
We need the "not detected" third attribute for at least two essential reasons. 1) To explicitly capture data on those specimens that gave a negative test result for a particular date, determiner, and method. It is possible this same specimen might give a positive result for a different method, etc. We have this explicit level of data to record somewhere; we need the attribute to say that two thirds of the specimens examined for a particular virus etc. explicitly had a negative test result according to some test and method. 2) We usually do not have test results at the time of cataloging. We can record the "examined for" data at that time, but the results, both negative and positive, may not be available for months and will need to be added via an attribute bulkload. We need to be able to distinguish records whose results simply haven't come back yet (e.g., "Detected" attribute not present) from those records that explicitly have negative test results ("Not Detected" attribute present). I've given up on asking for a Not Examined attribute, even though we record these data and they will now have to be shoved into remarks somewhere. But we cannot compromise on requiring a Not Detected attribute. @jldunnum
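The distinction in point 2 (results pending versus an explicit negative) can be sketched in Python under the three-attribute model. Rows are grouped per record, value, and method, since the same specimen may test differently by method; the `Attr` row, the method names, and the status labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Attr:
    # Hypothetical attribute row; not the actual Arctos schema.
    catalog_id: str
    attr_type: str  # "examined for", "detected", or "not detected"
    value: str
    method: str


def result_status(attrs: Iterable[Attr], catalog_id: str, value: str, method: str) -> str:
    """Classify one record/value/method combination."""
    types = {a.attr_type for a in attrs
             if (a.catalog_id, a.value, a.method) == (catalog_id, value, method)}
    if "detected" in types and "not detected" in types:
        return "conflicting"
    if "detected" in types:
        return "positive"
    if "not detected" in types:
        return "negative"
    if "examined for" in types:
        return "pending"  # examined, but no result has been loaded yet
    return "not examined"


rows = [
    Attr("A", "examined for", "thingA", "METHOD1"),
    Attr("A", "not detected", "thingA", "METHOD1"),
    Attr("B", "examined for", "thingA", "METHOD1"),  # awaiting bulkload
]
print(result_status(rows, "A", "thingA", "METHOD1"))
print(result_status(rows, "B", "thingA", "METHOD1"))
```

Without a "not detected" row, A and B would both look like "examined, nothing detected", and the pending and negative cases could not be told apart.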
@dustymc I understand the logic for the sex attribute you propose, and maybe that's how it should be, but at least we have an 'undetermined' field where we can put remarks on why that is. In this instance, as @campmlc points out, sometimes the results of examinations are not back for weeks or months. It all boils down to how the data can be effectively used and/or downloaded. I don't understand the database cost, or the necessity of not storing or of hiding a negative test result.
When you take a COVID test, it's either positive, negative or inconclusive. If I call the pharmacy a few hours later, the clerk doesn't say...."well, it just says you took a test and it's not marked in another attribute as positive so you probably don't have COVID, our database doesn't keep track of whether it's negative or inconclusive, that would be confusing to the programmers."
It is possible this same specimen might give a positive result for a different method
Yes, I think there's a gap around that link, and maybe I suddenly understand it: perhaps we're just underutilizing method?
I don't quite know how to structure that, but it doesn't involve two THINGS that need to be coupled where there's no available coupler.
at time of cataloging
I don't think that changes anything, however the rest works out??
From meeting today.
Go with three attributes as described in initial comment.
- @jldunnum to provide data
- @dustymc to set up for review
I will help however needed - just let me know.
@Jegelewicz can we plan to connect next Tues or whenever the test is finally implemented? Maybe we could all view the functionality together?
Invite sent for meeting on the 28th!
@Jegelewicz can you open issues for the new attributes and the CT (with at least one value)? Even if we rush things and understand that this is a bit experimental, it'll be useful to have that supporting info.
@Jegelewicz is there an Issue for the value code table? I know(ish) what the contents will be, but I don't know what to name the table+value-column, and I think that needs to be sorted out before the three attribute-issues can be resolved.
From today's meeting:
- @jldunnum and @bryansmclean plan to provide test data
- @dustymc will create the new code table
- @Jegelewicz will create the new attributes and associate them with the code table (after the boxes there are checked)
- @campmlc @jldunnum @bryansmclean to start issues for new code table terms needed for test data
The plan is to have some stuff to review in two weeks - a meeting invitation has been sent.
A reminder for @Jegelewicz
https://arctos.database.museum/guid/UTEP:ES:25-566
has a pathology of sorts in condition - digestion
Could use pathology and put this in the remark, or request "Pathology: digestion" - the object has attributes that suggest it has passed through the digestive tract of another organism.
There are probably more of these - so do a search and fix them all if this happens.
I'm moving some scattered discussion back here. I still don't understand the need for the third attribute (whatever form the remaining two take), and others do, I think?? I am NOT trying to argue that I haven't confused myself; that does seem possible, but I'm going to be asked to query this and I need to understand it. (And anyone who's going to be shaping data to it does as well.)
I got a consultation from our resident mathematician, she can't find obvious problems in my theory (as presented with all my biases, so.....).
She did note that the third factor would require TWICE as much data from the 'testers' and TWICE as much entry, all with scrupulous consistency.
Around https://github.com/ArctosDB/arctos/issues/6042#issuecomment-1487839100 @campmlc says the third attribute is necessary to calculate prevalence.
With two attributes, that is the ratio of tested positives to total tested:
"detected: thingA" / ("not detected: thingA" + "detected: thingA")
That still needs a bit of "method filtering" to reject the incidentals, but it's a filter, not a join, and it's applied to a lot less data.
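The two-attribute ratio above, including the method filter, as a minimal Python sketch. The `Attr` row, the method names, and the `accepted_methods` parameter are hypothetical; it counts attribute rows, so it assumes at most one result row per specimen per value and method.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Attr:
    # Hypothetical attribute row; not the actual Arctos schema.
    catalog_id: str
    attr_type: str  # "detected" or "not detected"
    value: str
    method: str


def prevalence(attrs: Iterable[Attr], value: str, accepted_methods: Set[str]) -> Optional[float]:
    """detected / (not detected + detected), after filtering on method."""
    rows = [a for a in attrs if a.value == value and a.method in accepted_methods]
    detected = sum(1 for a in rows if a.attr_type == "detected")
    not_detected = sum(1 for a in rows if a.attr_type == "not detected")
    total = detected + not_detected
    return detected / total if total else None  # None: nothing tested


rows = [
    Attr("A", "detected", "thingA", "METHOD1"),
    Attr("B", "not detected", "thingA", "METHOD1"),
    Attr("C", "not detected", "thingA", "METHOD1"),
    Attr("D", "detected", "thingA", "incidental"),  # rejected by the filter
]
print(prevalence(rows, "thingA", {"METHOD1"}))
```

The filter is a single pass over one kind of row; no pairing between attributes is needed. Multiply by 100 for a percentage.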
I still can't wrap my head around the three-factor approach, but I believe it would require the same query, using 'tested: thingA' rather than method to exclude incidentals; then it requires a JOIN or match across multiple factors (agent, date, and method were mentioned several times) to correlate the tested and results attributes. (And I suspect any rigorous analysis is going to want to filter on method anyway, which would, I think, accomplish this in a simpler system.)
Prevalence =
  (detected: thingA WITH tested: thingA WHERE methods match)
  / ( (detected: thingA WITH tested: thingA WHERE methods match)
      + (not detected: thingA WITH tested: thingA WHERE methods match) )
and then somehow also make the multifactor methods match between
??????????????
HELP!!
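One possible reading of the three-attribute formula above, as a Python sketch: a result row counts only if an "examined for" row exists for the same record, value, method, date, and agent. That choice of matching fields is an assumption drawn from this discussion, and the `Attr` row is hypothetical. It also illustrates the risk raised here: a result whose date or agent differs from its "examined for" row silently drops out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Attr:
    # Hypothetical attribute row; not the actual Arctos schema.
    catalog_id: str
    attr_type: str  # "examined for", "detected", or "not detected"
    value: str
    method: str
    determined_date: date
    agent: str


def match_key(a: Attr):
    # The fields that must agree for a result to pair with an examination.
    return (a.catalog_id, a.value, a.method, a.determined_date, a.agent)


def prevalence_three(attrs: Iterable[Attr], value: str) -> Optional[float]:
    attrs = list(attrs)
    examined = {match_key(a) for a in attrs
                if a.attr_type == "examined for" and a.value == value}
    # The "join": keep only results that pair with an examination.
    paired = [a for a in attrs
              if a.attr_type in ("detected", "not detected") and match_key(a) in examined]
    detected = sum(1 for a in paired if a.attr_type == "detected")
    return detected / len(paired) if paired else None


d1, d2 = date(2023, 3, 1), date(2023, 3, 2)  # illustrative dates only
rows = [
    Attr("A", "examined for", "thingA", "METHOD1", d1, "agent1"),
    Attr("A", "detected", "thingA", "METHOD1", d1, "agent1"),
    Attr("B", "examined for", "thingA", "METHOD1", d1, "agent1"),
    Attr("B", "not detected", "thingA", "METHOD1", d1, "agent1"),
    Attr("C", "examined for", "thingA", "METHOD1", d1, "agent1"),
    Attr("C", "detected", "thingA", "METHOD1", d2, "agent1"),  # date mismatch: dropped
]
print(prevalence_three(rows, "thingA"))  # 0.5
```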
I will not ever be using this data, although at some point I may end up entering it. The above makes sense to me though and reduces the number of attributes required.
The only argument for "examined for" that I have heard that makes sense is that a test has been initiated, but no results are available, so we can at least know the exam is in process. But is that reality? Or is reality more like what we have with the set of data from @jldunnum where the testing has occurred and the results are known and we are recording everything at once?
Using that data, for every catalog record we could record "detected" or "not detected", depending upon the results and then prevalence (for that study) could be calculated with the formula @dustymc gave.
"detected: thingA" / ("not detected: thingA" + "detected: thingA")
My objection to all of this is that without extreme consistency in date, determiner, and method, only those with certain knowledge about this particular study could ever re-create that prevalence number (a project might make that better, but not everyone understands Arctos projects and I doubt anyone downloading data would figure it out). I truly believe that this kind of data needs to be stored elsewhere in such a manner that allows it to stand alone as research results and that what we should be doing in Arctos is providing a link from the associated records to the external data. This is the entire concept behind the digital extended specimen. That being said, if no such external source exists (I don't believe this to be true, but let's say it is), then the two attributes do seem to be sufficient.
I've said my piece - the community can decide.
From @jldunnum in https://github.com/ArctosDB/arctos/issues/6042#issuecomment-1489116320:
I think having the "Examined for" will allow for more comprehensive returns of all records we are interested in on a given search and will require a simpler query. I can search "examined for" Virus: Orthohantavirus from a place or time or taxon and get all records that have been screened. In the "Detected" and "Not detected" model I would need to search for virus: Orthohantavirus in both those fields, and any variation in the methods, dates, agents would cause me to miss records. "Examined for" can also return a broader array of detected things because we wouldn't have to specify the exact detected value. I think it's better to get potentially more than I want and have to filter than to potentially miss anything. "Examined for" allows for capture of examinations or screening events that we don't have results for yet.
Again, I think we all understand this is not the ideal model and dedicated modules would be better, but until then we have been using this model for quite some time now and I think it's been working for those who use it.
I can search "examined for" Virus: Orthohantavirus from a place or time or taxon and get all records that have been screened.
Yep.
In the "Detected" and "Not detected" model I would need to search for virus: Orthohantavirus in both those fields
yep
and any variation in the methods, dates, agents would cause me to miss records.
Maybe? I'd just pull both (and maybe we need some sort of "this OR that" attribute search, which might also mitigate the above - but IDK how that'd work) and then try to figure out what's 'examined enough' for my needs from that. And I think I'd do the same if there's a third attribute: would we expect a researcher to just assume that whatever we mean by 'examined' ("skinned and did not die from hanta....") is whatever they mean? I keep coming around to the idea that SOMETHING about methods is somehow wonky but then I get lost....
"Examined for" allows for capture of examinations or screening events that we don't have results for yet.
Yes - but ??? what happens when they don't actually screen for the thing that you might eventually list as [not]detected, (because they lost the sample, moved to a better test, etc., etc.) and ?? I guess I kinda get it because reality and all, but yikes.
we have been using this model for quite some time
Elaborate? I think maybe I'm missing some large piece of the picture??
not the ideal model
In part, I'm trying to understand why you'd think that. Given unlimited resources, what would the data model look like? We don't have unlimited resources but maybe we can get the really critical (and simplest, perhaps) bits right, or at least not mangle anything such that it couldn't be moved to an ideal model without loss.
Anyway - thanks, I understand a bit more than I did a while ago.
From #6042: We are trying to combine at least two different search types and protocols in this single set of attributes: 1) very specific tests for very specific pathogens using very specific methods; and 2) general searches for things such as helminths that may or may not yield a wide variety of taxa in results. The Detected/(Detected + Not Detected) method is technically feasible for the former, but only if researchers understand how these data were captured and used. It doesn't work for the second option, because everything that could possibly be not detected would have to be explicitly entered in order to capture all the negative values. Using Examined for and Detected or Not Detected works for both options, and the data are captured in a format that any parasitologist would understand.
See https://github.com/ArctosDB/arctos/issues/6042#issuecomment-1491987768
Someone's waiting to use this, but I can't act until https://github.com/ArctosDB/arctos/issues/6036, https://github.com/ArctosDB/arctos/issues/6037, and https://github.com/ArctosDB/arctos/issues/6038 are resolved. @ArctosDB/arctos-code-table-administrators
I believe this has been completed, see https://arctos.database.museum/info/ctDocumentation.cfm?table=ctattribute_code_tables. Closing.
Goal
Allow users to find catalog records that include:
Provide a list of something to choose from.
Context
Reduce the proliferation of "examined for" and "detected" possibilities and create a structure for this type of data that allows for capture at a coarse level. More detailed information likely belongs elsewhere and should be linked to Arctos OR requires some sort of "module" outside of the current data structure to capture it well.
Table
Collection Object Attribute: Types
Proposed Value
examined for
detected
not detected
Proposed Definition
examined for - A systematic examination for the attribute value was made.
detected - The attribute value was detected.
not detected - The attribute value was not detected.
Collection type
potentially all
Attribute data type
categorical
Attribute value
Requesting a new code table, "Attribute: Examined For", to begin with the following terms:
Available for Public View
Yes
Note: There would be a migration path for the following attributes, which would then be removed from the code table
See also proposal in https://github.com/ArctosDB/arctos/issues/5087