ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Split "Observation" event type with new collecting sources #2075

Closed Jegelewicz closed 2 years ago

Jegelewicz commented 5 years ago

See https://github.com/tdwg/dwc-qa/issues/134#issuecomment-491878996

Suggest we have collecting sources =

machine - evidence of observation captured by machine (camera, sound recording, etc.)
human - evidence of observation captured by a human (field notes)

dustymc commented 5 years ago

My first impression is that this is going to lead to endless debates and inconsistent data.

I would 100% call a person with a camera a 'human observation' - they saw the critter-or-whatever (so have more information than what's captured), the photo (recording, whatever) is just supporting evidence. That's obviously not a universal viewpoint, and I suspect it's just one of many examples where some users would choose one of these values and some would choose the other (which means they're not useful for users).

I suggest our two-part approach:

  1. collecting source remains as-is to provide the big picture (something was observed and not sampled)
  2. Attribute 'establishment means' (which may already need a new name...) as proposed in https://github.com/ArctosDB/arctos/issues/1942 contains the details ('machine' + 'human' or perhaps even a finer-scaled split).

Jegelewicz commented 5 years ago

I suggest our two-part approach:

  1. collecting source remains as-is to provide the big picture (something was observed and not sampled)
  2. Attribute 'establishment means' (which may already need a new name...) as proposed in #1942 contains the details ('machine' + 'human' or perhaps even a finer-scaled split).

I'm good with that. Do we have "establishment means" yet?

dustymc commented 5 years ago

Do we have "establishment means" yet?

No, but I can create it in a few hours - I think we're all good with that in #1942. Given the possibility of this use case, are we still happy with the name?

AJLinn commented 5 years ago

are we still happy with the name?

I for one, have no idea what that means... but as long as you write good documentation, I'll figure it out.

Jegelewicz commented 5 years ago

are we still happy with the name?

I for one, have no idea what that means... but as long as you write good documentation, I'll figure it out.

@AJLinn for us biological collection people, it means "The process by which the biological individual(s) represented in the Occurrence became established at the location."

Is there something similar in your world? We can use another term if it encompasses something you need too!

dustymc commented 5 years ago

it means....

IF my idea to split 'observation' into machine and human using the attribute intended to refine collecting source (but perhaps not quite in this way) isn't too far out, that definition is too restrictive and the name may be as well.

Take a picture of a critter:

I hate to be the one to suggest even more Attributes (https://github.com/ArctosDB/arctos/issues/1623) but maybe my idea is too much overload for this and we need yet another way of refining collecting source?? Maybe this isn't related to collecting source at all? Maybe it really is a fundamental split in collecting source and we just need better human/machine definitions? Does this change if we lose the camera trap image - eg, is this just parts??

Wheeeee!!

Jegelewicz commented 5 years ago

I hate to be the one to suggest even more Attributes (#1623) but maybe my idea is too much overload for this and we need yet another way of refining collecting source??

Who are you kidding - you LOVE to make messes. :-)

Maybe this isn't related to collecting source at all?

Given what John W. said in the TDWG issue, my first instinct was to split the "observation" event type.

Maybe it really is a fundamental split in collecting source and we just need better human/machine definitions?

But maybe we should start with defining Collecting_Source. I don't see a definition at http://handbook.arctosdb.org/documentation/specimen-event.html and this is probably why we are having so much trouble with it. Creating a separate issue.

Does this change if we lose the camera trap image - eg, is this just parts??

Having a "media" part would possibly make an observation "machine", except when the media is only a scan of field notes...so I say this has nothing to do with parts.

Jegelewicz commented 5 years ago

BTW - we should also consider where the DwC term "basis of record" comes from in Arctos. I am not sure that I can answer that question. Most stuff shows up in GBIF as Preserved specimen. What magic happens to get that information out of an Arctos record, which as far as I know doesn't specify that information anywhere? It can possibly be inferred from event type + parts. I can't quickly find an example of an observation from Arctos in GBIF.

dustymc commented 5 years ago

messes

I don't, I swear!

captured by a human (field notes)

A recording of a bird singing is a machine observation - and the bits where the person holding the recorder is talking are basically field notes....

Having a "media" part would possibly make an observation "machine", except

Many of those bird recordings end with BLAM! and a specimen..... I suppose that should really be two Events a second or so apart, but I don't think anyone works at that precision.

This just all seems too arbitrary. How about from the other side: who cares about this stuff? Why do we want to separate human and machine "observations"?

dustymc commented 5 years ago

basis of record

http://arctos.database.museum/info/ctDocumentation.cfm?table=CTCATALOGED_ITEM_TYPE

That can be set on the Attributes tab. I wouldn't want to defend the definitions...

Jegelewicz commented 5 years ago

This just all seems too arbitrary. How about from the other side: who cares about this stuff? Why do we want to separate human and machine "observations"?

Why do we want to separate "captive" from "wild"? It's a similar answer - to give people an idea of the reliability of the data for whatever research they happen to be doing. Data supported with a photograph, video or sound recording may be deemed more reliable than 100 year old cursive writing about the yellow bird someone saw while eating dinner at field camp (or perhaps the other way around depending upon the person doing the cursive writing).

Anyway, that's why I demoted the division from event type to collecting source and totally accepted your demotion of it to an establishment means attribute.

Jegelewicz commented 5 years ago

basis of record

http://arctos.database.museum/info/ctDocumentation.cfm?table=CTCATALOGED_ITEM_TYPE

That can be set on the Attributes tab. I wouldn't want to defend the definitions...

HMMMM - so when/where does this get recorded? It isn't part of bulkloaded data or a single entry record and I think it possibly should be?

dustymc commented 5 years ago

That split does seem useful to me, but it also seems to depend on the material we have in hand.

I can't say I'm particularly looking forward to writing the code, but it looks like that could all be derived from event type + parts + disposition.
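
A rough sketch of what such a derivation might look like, in Python rather than Arctos code. Everything here is an assumption made for illustration: the function name `derive_basis`, the event type value 'collection', the part names, and above all the rules themselves, which are only one of many arbitrary choices. The disposition values "missing" and "destroyed" are taken from elsewhere in this thread.

```python
# Hypothetical rules only; this is not how Arctos derives anything.
UNAVAILABLE = {"missing", "destroyed"}  # dispositions treated as "nothing in hand"


def derive_basis(event_type, parts):
    """Guess a Darwin Core basisOfRecord from an event type and its parts.

    `parts` is an iterable of (part_name, disposition) pairs.
    """
    available = [name for name, disposition in parts if disposition not in UNAVAILABLE]
    physical = [name for name in available if name != "media"]

    # A non-observation event with at least one locatable physical part.
    if event_type != "observation" and physical:
        return "PreservedSpecimen"
    # Media in hand is treated as machine evidence. This ignores the
    # "scan of field notes" exception raised earlier in the thread.
    if "media" in available:
        return "MachineObservation"
    # Everything else, including specimens whose parts cannot be found,
    # is treated as functionally an observation. An "observation" event
    # with a usable physical part also lands here, which is an arbitrary
    # choice to trust the event type.
    return "HumanObservation"


if __name__ == "__main__":
    print(derive_basis("collection", [("skull", "in collection")]))
    print(derive_basis("collection", [("skull", "missing")]))
    print(derive_basis("observation", [("media", "in collection")]))
```

Even this small version has to take a side on the ambiguous cases discussed above, which is the difficulty with deriving the value at all.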

dustymc commented 5 years ago

HMMMM - so when/where does this get recorded? It isn't part of bulkloaded data or a single entry record and I think it possibly should be?

Looks like I'm making a (probably not very defensible) decision from the collection type.

This seems very much like the above to me.

http://arctos.database.museum/guid/MVZObs:Mamm:10 is an "observation" from which you could get DNA

We probably have 'observations' recorded in 'real' collections.

We certainly have specimens in 'real' collections for which we can't find parts (eg, they are functionally observations).

Etc.

It's not much problem (for me - not sure about those of you who have to use it) to add this to the bulkloader, but I'm also not sure it does what I think it's intended to do.

albenson-usgs commented 3 years ago

I didn't read the full thread (apologies!) but recently in the TDWG Machine Observations group we have settled on a definition for MachineObservation vs HumanObservation

Jegelewicz commented 3 years ago

@albenson-usgs so, if it is on a schedule it is human and if automated (motion sensor) it is machine?

tucotuco commented 3 years ago

Please note that none of this is from the normative Darwin Core definitions.

On Thu, Apr 22, 2021 at 7:01 PM Teresa Mayfield-Meyer < @.***> wrote:

@albenson-usgs https://github.com/albenson-usgs so, if it is on a schedule it is human and if automated (motion sensor) it is machine?

albenson-usgs commented 3 years ago

Well what we came up with is if it's on a schedule then it's a machine (a drone flying over set to take pictures every 30 seconds versus a drone that a person is flying and choosing when to take the picture).

@tucotuco that's good to know. As I said on the Material Sample thread I think basisOfRecord is really confusing for the community I interact with at least and I think a community discussion would help. I know there is the Darwin Core Hour that you did on this but confusion remains even for myself.

dustymc commented 3 years ago

if it's on a schedule then it's a machine

So a detection from a satellite photo is one thing, and a detection using the same method in an identical-resolution aerial photo is something else?

basisOfRecord is really confusing for the community

I completely agree; I think we're spending a lot of time and effort (https://github.com/ArctosDB/arctos/issues/2432, https://github.com/ArctosDB/arctos/issues/3421) creating entirely arbitrary data that doesn't DO anything for anyone.

albenson-usgs commented 3 years ago

So a detection from a satellite photo is one thing, and a detection using the same method in an identical-resolution aerial photo is something else?

I think these are both machine because it's not like the person on the plane says "Wait, there is a really cool tree right there, let's get an aerial image of it." I would think you have a designated flight path and you're recording anything in the path. But maybe this isn't worth hashing out since this isn't the way basisOfRecord is supposed to be applied anyway ¯_(ツ)_/¯

dustymc commented 3 years ago

Wait, there is a really cool tree right there, let's get an aerial image of it

That definitely happens.

designated flight path

Even those are more suggestions because of weather etc. But now I'm thinking perhaps you're suggesting intent rather than the camera platform is the factor?

I (obviously!) don't know what we should be doing, but what we are doing is manually (==arbitrarily) setting https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcataloged_item_type, and then sending it to DWC via...

case
    when CATALOGED_ITEM_TYPE='specimen' then 'PreservedSpecimen'
    when CATALOGED_ITEM_TYPE='observation' then 'HumanObservation'
    when CATALOGED_ITEM_TYPE='fossil specimen' then 'FossilSpecimen'
    when CATALOGED_ITEM_TYPE='human observation' then 'HumanObservation'
    when CATALOGED_ITEM_TYPE='living specimen' then 'LivingSpecimen'
    when CATALOGED_ITEM_TYPE='machine observation' then 'MachineObservation'
    when CATALOGED_ITEM_TYPE='preserved specimen' then 'PreservedSpecimen'
    else null
end basisOfRecord,

My preference would still be to banish the concept from wherever it exists.
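
For anyone who wants to try the mapping above outside Arctos, here is a minimal sketch that wraps the same CASE expression in a SELECT and runs it against an in-memory SQLite table using Python's sqlite3 module. The table name `cataloged_item`, its `id` column, the helper `basis_of_record_rows`, and the sample value 'tissue' (included only to exercise the `else null` branch) are assumptions, not the real Arctos schema or vocabulary.

```python
import sqlite3

# The CASE expression from the thread, wrapped in a SELECT so it can run.
# Table and column layout are hypothetical.
QUERY = """
select
    cataloged_item_type,
    case
        when cataloged_item_type = 'specimen' then 'PreservedSpecimen'
        when cataloged_item_type = 'observation' then 'HumanObservation'
        when cataloged_item_type = 'fossil specimen' then 'FossilSpecimen'
        when cataloged_item_type = 'human observation' then 'HumanObservation'
        when cataloged_item_type = 'living specimen' then 'LivingSpecimen'
        when cataloged_item_type = 'machine observation' then 'MachineObservation'
        when cataloged_item_type = 'preserved specimen' then 'PreservedSpecimen'
        else null
    end as basisOfRecord
from cataloged_item
order by id
"""


def basis_of_record_rows(item_types):
    """Load the given types into a scratch table and return (type, basisOfRecord) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table cataloged_item (id integer primary key, cataloged_item_type text)"
    )
    conn.executemany(
        "insert into cataloged_item (cataloged_item_type) values (?)",
        [(t,) for t in item_types],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    samples = ["specimen", "observation", "machine observation", "tissue"]
    for item_type, basis in basis_of_record_rows(samples):
        print(item_type, "->", basis)
```

Note that a bare 'observation' is sent out as HumanObservation, and any value not listed yields no basisOfRecord at all.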

tucotuco commented 3 years ago

See tdwg/dwc-qa#134 (comment)

Suggest we have collecting sources =

machine - evidence of observation captured by machine (camera, sound recording, etc.)
human - evidence of observation captured by a human (field notes)

This is exactly what is recommended for the Darwin Core terms. Basically, if the evidence comes out of a machine, regardless of any intention or automation, it is a MachineObservation. Part of the reason for that is to have an anchor for being able to capture metadata about the machine, something that isn't expected for a HumanObservation.
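
To make the difference between this rule and the schedule-based rule proposed earlier in the thread concrete, here is a small Python sketch. The `Detection` class, its two boolean fields, and the function names are hypothetical and are not Darwin Core or Arctos terms. The example records paraphrase the drone and field-notes cases already mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical record; neither flag is a Darwin Core or Arctos field."""
    description: str
    evidence_from_machine: bool  # did the evidence come out of a device?
    on_schedule: bool            # was capture triggered by a timer rather than a person?


def basis_by_evidence(d: Detection) -> str:
    """Rule stated here: machine-produced evidence means MachineObservation."""
    return "MachineObservation" if d.evidence_from_machine else "HumanObservation"


def basis_by_schedule(d: Detection) -> str:
    """Rule suggested earlier: scheduled capture means MachineObservation."""
    return "MachineObservation" if d.on_schedule else "HumanObservation"


EXAMPLES = [
    Detection("drone photographing every 30 seconds", True, True),
    Detection("drone photo taken when the pilot chooses", True, False),
    Detection("field notes", False, False),
]

if __name__ == "__main__":
    for d in EXAMPLES:
        print(
            f"{d.description}: evidence rule -> {basis_by_evidence(d)}, "
            f"schedule rule -> {basis_by_schedule(d)}"
        )
```

The two rules disagree only on the piloted drone, which is the kind of case the evidence-based rule is meant to settle without asking about intent.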

To respond to @dustymc , this is specifically to avoid endless debate, though we both know that inconsistent data has to be solved by a lot of other medicine.

Attribute 'establishment means' (which may already need a new name...) as proposed in #1942 contains the details ('machine' + 'human' or perhaps even a finer-scaled split).

This is not at all what dwc:establishmentMeans is about, if that was the intention.

Given what John W. said in the TDWG issue, my first instinct was to split the "observation" event type.

observation, machine - evidence of observation captured by machine (camera, sound recording, etc.)
observation, human - evidence of observation captured by a human (field notes) with no supporting machine-captured evidence

This is already the case with MachineObservation and HumanObservation. There is no "Observation" class in Darwin Core.

BTW - we should also consider where the DwC term "basis of record" comes from in Arctos. I am not sure that I can answer that question. Most stuff shows up in GBIF as Preserved specimen. What magic happens to get that information out of an Arctos record, which as far as I know doesn't specify that information anywhere? It can possibly be inferred from event type + parts. I can't quickly find an example of an observation from Arctos in GBIF.

https://www.gbif.org/occurrence/search?basis_of_record=HUMAN_OBSERVATION&advanced=1&network_key=1f2c0cbe-40df-43f6-ba07-e76133e78c31&occurrence_status=present&hosting_organization_key=2053a639-84c3-4be5-b8bc-96b6d88a976c

albenson-usgs commented 3 years ago

Alright so it seems to me that this has evolved over time and relatively recently given this comment and then this one. If the latter one is the consensus of the community then I don't think it's clear to everyone that that is the decision (note that the TDWG Machine Observations Task Group doesn't know this).

Moreover, I still don't understand what a user would do with this information. How would they use it? If I go to GBIF and download data from multiple datasets and it comes back with 500 occurrences "HumanObservation" and 500 occurrences "MachineObservation" what is it that I can do with that?

I don't think any of this would necessarily be an issue except basisOfRecord is required in the IPT and so people who don't understand how to apply it (me!) are forced to do so and I think the results are unusable. I'd be interested to know why GBIF made this a required term. What was the reasoning? What were we hoping to make clearer about the data by making this a required term?

(Apologies to Arctos for babbling in your thread)

Jegelewicz commented 3 years ago

@albenson-usgs we appreciate the interaction!

See also https://github.com/tdwg/dwc/issues/314#issuecomment-832257304

dustymc commented 3 years ago

@tucotuco at some point we talked about a general statement for things like 'invasive' and a way to refine (what exactly do we mean by "invasive"?) as assertions. I was attempting to suggest that we could take a similar approach here:

For example, "wander around with a yagi antenna, scribble stuff on paper, hope it more or less triangulates" and "get coordinates from GPS" describe a very similar situation in different years. Those might end up in the same DWC-slot, but they're very different kinds of data.

And - SHOULD they end up in the same slot? Nobody generally sees any critters, one "came out of a machine" as beeps, then went onto paper (with a little help from a compass - is that a machine? Would the answer be different for the compass in my watch?), then back into a machine, and the other is machines all the way down. I'd probably put them in different slots, I suppose, but it still seems overly arbitrary to me.

what a user would do with this information

I think it depends on the details. I can verify an identification from PreservedSpecimen, I can't from HumanObservation, for example. (Until I learn that a fair number of those PreservedSpecimen aren't actually available to borrow for various reasons, and are therefore functionally identical to HumanObservation...)

And yes, we appreciate the input @albenson-usgs !

albenson-usgs commented 3 years ago

I can verify an identification from PreservedSpecimen, I can't from HumanObservation, for example.

If the intent for basisOfRecord is that there is some piece of evidence to get back to then that would certainly be clarifying as to how to use it but might create confusion in how to apply it. Your example of a preserved specimen not being available to borrow is a good example. Same if you took a picture but the picture is not stored anywhere- basisOfRecord = "MachineObservation" indicating there could be machine data to get back to but in the case of the picture you can't get back to it. Then I'm back to thinking about how useful this is and to whom. If as a user what I can take away from basisOfRecord (and that's if we can get data managers applying it correctly) is that there could be evidence to get back to but not necessarily then I'm not sure how to use that.

tucotuco commented 3 years ago

I think a slightly more accurate description of the intention of basisOfRecord would be the evidence upon which the Occurrence was established. That would be regardless of what physical evidence might remain in existence, for which the term disposition is meant.

albenson-usgs commented 3 years ago

But disposition, according to the definition, seems to be only intended for specimens. I wouldn't think it would be appropriate to use it for an image?

Jegelewicz commented 3 years ago

I wouldn't think it would be appropriate to use it for an image?

Why not? An image can be missing, destroyed, "in collection" (either a physical copy or digital one), on loan, etc.

albenson-usgs commented 3 years ago

The definition is "The current state of a specimen with respect to the collection identified in collectionCode or collectionID." I would not apply that to an image but maybe that's just me.

tucotuco commented 3 years ago

That definition arose from a Specimen-centric proto-Darwin Core. It can be changed to admit disposition of evidence if the community is down with that. That would avoid a proliferation of terms. I can't see it being terribly controversial, but then, I am repeatedly surprised.

On Fri, May 7, 2021 at 10:32 AM Abby Benson @.***> wrote:

The definition is "The current state of a specimen with respect to the collection identified in collectionCode or collectionID." I would not apply that to an image but maybe that's just me.

albenson-usgs commented 3 years ago

It seems to me that an update to the definition for basisOfRecord to what you suggested @tucotuco "the evidence upon which the Occurrence was established." would also be needed as I don't think the current definition "The specific nature of the data record." captures that. I'm also curious why it's a Record-Level term but when publishing via IPT at least it's included in the Occurrence Extension when all the other Record-Level terms are put in the Event Core (sorry- tangent).

mkoo commented 3 years ago

Jumping into this cold with a few thoughts. First, to the original suggestion of splitting observation event type now into sources: I think this would introduce a whole mess of issues and debates that, in the end, most collections will not touch. There's also potential to have seemingly conflicting inferences with basisOfRecord. Also, how does that comport with existing samplingProtocol in field work? collectingsource=contraption if using a mistnet or pitfall trap?

I'm more in favor of better definitions and use cases for basisOfRecord since this is also the first order of filtering of records for many researchers. This field has potential for expanded types, is well used and appears to be able to distinguish collecting sources especially when relevant to the record (e.g., camera trap vs human). That said, there's a lot of ambiguity as well in some of the example controlled vocab on TDWG (e.g., Event vs Occurrence?). Can we shore up basisOfRecord? Perhaps I'm missing a chunk of conversation, but what is missing that we need more splitting of hairs here?

tucotuco commented 3 years ago

This topic is deep and of broad interest, but I would like to engage in an attempted consensus here if that sounds reasonable.

On Fri, May 7, 2021 at 12:27 PM Michelle Koo @.***> wrote:

Jumping into this cold with a few thoughts. First, to the original suggestion of splitting observation event type now into sources: I think this would introduce a whole mess of issues and debates that, in the end, most collections will not touch. There's also potential to have seemingly conflicting inferences with basisOfRecord. Also, how does that comport with existing samplingProtocol in field work? collectingsource=contraption if using a mistnet or pitfall trap?

I'm more in favor of better definitions and use cases for basisOfRecord since this is also the first order of filtering of records for many researchers. This field has potential for expanded types, is well used and appears to be able to distinguish collecting sources especially when relevant to the record (e.g., camera trap vs human). That said, there's a lot of ambiguity as well in some of the example controlled vocab on TDWG (e.g., Event vs Occurrence?). Can we shore up basisOfRecord? Perhaps I'm missing a chunk of conversation, but what is missing that we need more splitting of hairs here?

Jegelewicz commented 3 years ago

I feel like I need time to re-read this whole thing and digest - it may be a while....

Jegelewicz commented 2 years ago

closing as treated with cataloged_item_type