ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Best way to record whether a sample was tested for presence #1461

Closed: KyndallH closed this issue 2 years ago

KyndallH commented 6 years ago

For the environmental samples I'm loading, the attribute "tested for presence" has been added to attributes. As it stands, what each sample was tested for is going in the remarks, and I'm unsure how searchable that information will be.

[Screenshot: 2018-02-28, 12:29:54 PM]

From Mariel: This is the same issue we have with parasite prevalence. In order to calculate prevalence, you need to know two things: was the organism examined for parasites (y/n), and was a parasite found (y/n). A simple yes/no cannot capture both with only a single attribute. Using present/absent or positive/negative allows you to combine both facts into one attribute. I had thought at one point that we would be able to put the "tested for" organism into a drop-down or, even better, select a taxon embedded in higher taxonomy. That would be much preferable. Putting critical info like this in remarks is a sure way to lose data to misspellings, alternate spellings, and so on. If I want to search for a host tested for plague, do I look in remarks for plague or Yersinia pestis or Y. pestis or bacteria or .... all of the above? We already have a model for this in parasite/hosts. We should make this compatible with that model, not reinvent the wheel.

I agree with Mariel; it would be awesome to make it a drop-down menu/code table, though I could see it getting out of control. Not only would you have taxa, but add in chemicals, diseases, parasites. I'm not sure a drop-down of all of it would be feasible.
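To make the prevalence point concrete, here is a minimal Python sketch. It assumes a hypothetical three-state value per record (`present`, `absent`, or no value when the record was never tested); these names are illustrative, not actual Arctos code table values. Only examined records count toward the denominator, which is exactly the information a lone yes/no attribute loses.

```python
from enum import Enum
from typing import Optional


class Detection(Enum):
    """Hypothetical result values; not an actual Arctos code table."""
    PRESENT = "present"
    ABSENT = "absent"


def prevalence(results: list[Optional[Detection]]) -> float:
    """Fraction of examined records in which the target was found.

    None means the record was never tested, so it is excluded from the
    denominator. A bare yes/no attribute cannot tell "no, tested and not
    found" apart from "no, never tested", which is why it is insufficient.
    """
    examined = [r for r in results if r is not None]
    if not examined:
        raise ValueError("no examined records; prevalence is undefined")
    found = sum(1 for r in examined if r is Detection.PRESENT)
    return found / len(examined)


# Five hosts: two positive, two negative, one never examined.
hosts = [Detection.PRESENT, Detection.ABSENT, None, Detection.PRESENT, Detection.ABSENT]
print(prevalence(hosts))  # 0.5
```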

dustymc commented 6 years ago

There's some common ground with parasites, but it's not a perfect overlap - nobody much cares about the "host" puddle, for example (probably??). I think it all fits in the same model (whatever that is!), but I'm not 100% certain at this point.

Uncontrolled text is not usefully searchable, ever.

I don't see much value in a taxon name. (Was that L. sylvatica before it was broken into 900 species, or after 600 of them were merged back into 150, or ..... ?? And see http://arctos.database.museum/name/Echidna - a name can refer to a LOT of organisms.) Names need context - L. sylvatica on DATE as understood by AUTHOR sensu PUBLICATION etc. is mostly unambiguous. Including those data is no problem when the test is positive (eg, when the data are "found a frog, evidence at the end of this link"), and maybe that ambiguity is not very important when you didn't find a frog?

Arctos taxonomy is not limited to biology, so not much problem to add chemicals to the formal structure. Diseases and parasites are just more biology.

https://github.com/ArctosDB/arctos/issues/1410 is a similar discussion.

campmlc commented 6 years ago

Well, parasitologists care about the host puddle quite a bit, actually. That is why we created the host catalog - we are the only parasite collection database in the world that enables searches by higher taxonomy of both host and parasite at the same time. For example, here is one just used in a talk: "Find all fleas (Siphonaptera) that are positive for plague and are parasites of squirrels (Sciuridae)". I could even add "and have a GenBank record" or "and were collected in New Mexico 1900-2000". This is what makes Arctos a research-grade database and not just a collection management tool.

So being able to integrate both the host/parasite angle (squirrels with fleas, schistosomes of birds, etc.) and the virus/bacterium/pathogen/toxin/pollutant/co-occurring species angle (hantavirus, plague, chytridiomycosis, etc.) at a taxonomic level would be utterly amazing. For that, we need the taxon name. Just because a name may be ambiguous does not make it unusable - that is the same issue as with any taxonomy. On the contrary, it makes our database that much more powerful.
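A rough sketch of the kind of higher-taxonomy query described above, in Python. Everything here is hypothetical: the lineages are hand-written stand-ins for whatever classification a record uses, the `EX:` GUIDs are made up, and nothing reflects the actual Arctos schema or search interface. The point is only that once the "tested for" value is a taxon name with a classification, "fleas positive for plague on squirrels" becomes a lineage match on three names instead of a text search through remarks.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical classifications: each name maps to its lineage, root first.
LINEAGE: dict[str, list[str]] = {
    "Oropsylla montana": ["Animalia", "Insecta", "Siphonaptera", "Ceratophyllidae", "Oropsylla montana"],
    "Ctenocephalides felis": ["Animalia", "Insecta", "Siphonaptera", "Pulicidae", "Ctenocephalides felis"],
    "Otospermophilus variegatus": ["Animalia", "Mammalia", "Rodentia", "Sciuridae", "Otospermophilus variegatus"],
    "Felis catus": ["Animalia", "Mammalia", "Carnivora", "Felidae", "Felis catus"],
    "Yersinia pestis": ["Bacteria", "Yersinia", "Yersinia pestis"],
}


@dataclass
class Record:
    guid: str                   # hypothetical identifier
    taxon: str                  # what the cataloged item is identified as
    host: Optional[str] = None  # host taxon, if the item is a parasite
    positive_for: set[str] = field(default_factory=set)  # taxa a test detected


def within(name: str, group: str) -> bool:
    """True if `group` appears anywhere in the lineage of `name`."""
    return group in LINEAGE.get(name, [])


def query(records: list[Record], taxon_group: str, host_group: str, pathogen_group: str) -> list[str]:
    """GUIDs of items in taxon_group, on a host in host_group, positive for pathogen_group."""
    return [
        r.guid
        for r in records
        if within(r.taxon, taxon_group)
        and r.host is not None
        and within(r.host, host_group)
        and any(within(p, pathogen_group) for p in r.positive_for)
    ]


records = [
    Record("EX:1", "Oropsylla montana", "Otospermophilus variegatus", {"Yersinia pestis"}),
    Record("EX:2", "Oropsylla montana", "Otospermophilus variegatus"),  # no positive result recorded
    Record("EX:3", "Ctenocephalides felis", "Felis catus", {"Yersinia pestis"}),
]

# "Fleas (Siphonaptera) positive for plague that parasitize squirrels (Sciuridae)"
print(query(records, "Siphonaptera", "Sciuridae", "Yersinia pestis"))  # ['EX:1']
```

Because matching is done on lineage, the same query works at any rank (for example `"Yersinia"` or `"Bacteria"` for the pathogen), and a name missing from the classification simply never matches.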

dustymc commented 6 years ago

parasitologists care about the host puddle,

Yes, I know, I was presuming the frog DNA folks don't - but the water temp, conductivity, etc. attributes may mean that they in fact do??

integrate

We can, it's just that the interfaces are hard. My favorite query (http://arctos.database.museum/SpecimenResults.cfm?&collection_id=27&related_term_val_1=canis) is easy enough to re-write as "things which NCBI thinks are Dipylidium parasitizing things which ITIS thinks are Vulpes velox" (and drag in IDs from both ends, etc. - if you can or could click links from the specimen record and get there, we can query it on either end.) Those things are only ambiguous if someone's made them that way (eg, by not including a sensu publication - which is of course most everything).

The ambiguity really only becomes difficult to avoid at "looked for cestodes, didn't find any" - when there's not a parasite cataloged, or when they looked for Lithobates DNA in the puddle and came up empty. Then I have to guess what you meant by "cestode" (and maybe "Cestoad" if you're typing). Not ideal, but perhaps tolerable.

dustymc commented 6 years ago

I'm wondering if we can just catalog a frog, enter this stuff (including IDs and all that jazz), and somehow (maybe with something in parts and/or IDs???) announce that it is in fact an anti-frog when the detection is negative. This feels like it's become more complicated than it needs to be - eg, the filter paper will not have a complex ecological context, so I'm not sure it needs its own record.

Entering things because they don't exist may be blasphemy, but with chemical detection it seems important to be precise in what we mean by "frog" (it's entirely "has THAT sequence," no?) and I'm not sure I see another way.

??

EDIT: I made an anti-frog: http://arctos-test.tacc.utexas.edu/guid/MSB:Mamm:257587

[Screenshot: 2018-03-01, 8:16:27 AM]

campmlc commented 6 years ago

Well, that is similar to what we did with hosts and parasites - except the equivalent here would be cataloging the frog puddle and the frog. We have in Arctos collections real parasites with hosts that are observations, and real hosts with parasite observations. So in this similar case we would have a frog observation (or actual DNA?) cataloged without a "host" at all. But we document the presence of a particular taxon (embedded in a searchable classification hierarchy) at a particular place at a particular time as determined by a particular agent, by particular replicable methods, and linkable to publications and GenBank sequences - which is ultimately our goal in hosting these data, correct?

campmlc commented 6 years ago

And as for the negative "puddles", we have also cataloged negative hosts that are observations only, in order to track prevalence.

KyndallH commented 6 years ago

We are not cataloging my DNA extraction and filter paper as a wood frog. No. The idea is that the DNA sample/specimen from a particular location is the specimen. Think of testing the specimen for wood frog DNA as equivalent to testing it for rabies. We are not going to create a new record every time someone tests these samples for bacteria, chemicals, etc. I want a way to record that this specimen/sample was tested for various things, but I do not want more catalog records. If we do that, I'm going to have to create a UAM:bacteria collection. :/

campmlc commented 6 years ago

OK, so this would be the same as a soil sample that is extracted and blasted for hundreds of possible bacteria/fungal/protist DNA signatures?

If the DNA is positive for wood frog, is there DNA sequence data that would need to be associated?

So the specimen would not itself be a biological organism, e.g. like ethnological or geological samples/specimens, but it needs to be associated with biological taxa?

dustymc commented 6 years ago

negative hosts

Example please.

A cataloged item is nothing more than a convenient place from which to hang things like taxonomy and links to GenBank. I'm not sure we need a UAM:Bacteria collection, but we may need a UAM:ThingsThatLeaveDNAInPuddles collection....

If it's got DNA, I'm pretty sure there's a biological organism involved somewhere. (And DNA is better evidence of occurrence than a lot of other stuff that gets cataloged!)

equivalent as testing it for rabies

I'm not sure what ACTUALLY happens with that, but it seems a lot like "tested for tapeworms" to me and I'm not sure why it should be handled differently.

not going to create a new record everytime

Why not? What if I could make doing so as simple as entering a taxon and clicking a button? (One thing we talked about for #1410.)

@mlbowser created http://arctos.database.museum/guid/UAMObs:Ento:235210 which may be similar - Matt, anything to share?

campmlc commented 6 years ago

I think we should create an AWG sub-committee on environmental samples to discuss this, and potentially add to GGBN grant for development.

Example of host observation that was negative for parasites:

https://arctos.database.museum/guid/MSB:Host:823

Host observation positive for a parasite observation - the latter turned out to have a real host voucher specimen in existence:

https://arctos.database.museum/guid/MSB:Host:822

KyndallH commented 6 years ago

UAM:ThingsThatLeaveDNAInPuddles collection = UAM:Env:

OK, so this would be the same as a soil sample that is extracted and blasted for hundreds of possible bacteria/fungal/protist DNA signatures?

Exactly! :)

So the specimen would not itself be a biological organism, e.g. like ethnological or geological samples/specimens, but it needs to be associated with biological taxa?

Exactly - it is DNA from water samples.

dustymc commented 6 years ago

@campmlc https://arctos.database.museum/guid/MSB:Host:823 is "the puddle" in which nothing was found. There's reason to believe it existed, even if you don't have it. I'm not sure how useful it might be for prevalence - Rausch looked, somehow, for something, and didn't find it (and presumably that's all he recorded).

If he had looked for Taenia using {very specific technique}, you could use those data for prevalence, if you had a useful way of saying (and possibly circumscribing) Taenia. The only way I see to do that is by creating a related specimen, one which Rausch claims does NOT exist. I don't really like that; I just don't see another path to the data y'all say are important.

Maybe just a project? "We used {technique} which can detect {giant list of species}. The lack of a relationship indicates we didn't find any of those species."
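One way to read the project idea above, sketched in Python: a project (a hypothetical structure, not an Arctos table) records the technique and the taxa it can detect, and any detectable taxon with no relationship to the sample is read as absent. The assay name and taxon list are illustrative assumptions. The sketch also rejects detections outside the project's stated scope, since those would suggest the detectable list is incomplete and the implied negatives unreliable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Hypothetical project record: a technique and everything it can detect."""
    technique: str
    detectable: frozenset[str]


def implied_results(project: Project, detected: set[str]) -> dict[str, str]:
    """Turn "which taxa were found" into an explicit per-taxon result.

    Any detectable taxon without a relationship to the sample is read as
    "absent"; that inference only holds if the technique really covers it.
    """
    unexpected = detected - project.detectable
    if unexpected:
        raise ValueError(f"detected taxa outside the project's scope: {sorted(unexpected)}")
    return {
        taxon: ("present" if taxon in detected else "absent")
        for taxon in sorted(project.detectable)
    }


edna = Project("hypothetical eDNA assay", frozenset({"Lithobates sylvaticus", "Anaxyrus boreas"}))
print(implied_results(edna, {"Lithobates sylvaticus"}))
# {'Anaxyrus boreas': 'absent', 'Lithobates sylvaticus': 'present'}
```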

?????????

An AWG sub-committee is a great idea. I'll start a Project.

dustymc commented 2 years ago

I think better resolution is going to involve someone(s) adopting https://github.com/ArctosDB/arctos/projects/7#card-7829346; tentatively tabling.