ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Parts for observation events #2118

Closed Jegelewicz closed 3 years ago

Jegelewicz commented 5 years ago

A UMNH student who is working on an observation collection is writing some documentation. She sent me the following:

As far as documentation goes, I am doing some research on what others have done for entering observations, which all seem to be a bit different. Here’s what I’ve found:

UAMObs:Mamm
    **No media** but has a location, collector, original identifier, link to GBIF occurrence, **part:observation**, remark, and accession
MVZObs:Mamm
    **Has media**, location, collector, link to GBIF, attributes, remarks, accession, and usage, but **no parts**
MSBObs:Mamm
    **No media** but has location, collector, identifiers, link to GBIF, **part:observation**, attributes, accession (created an observation accession name: “2009.001.MammObs”), usage
UTEPObs:Herp
    **Has media**, citations, location, maker (instead of collector), identifiers, relationship, **part:media**, attributes, remarks, and accession

Bold was added by me for emphasis in this discussion. As this demonstrates, we are handling observation parts in many different ways and I would like to start a conversation about what is "best practice".

I suggest the following:

  1. When an observation exists with no media, there should be no part.
  2. When an observation exists with media, the part should be "media", even though the media may be attached to the record.
  3. We should get rid of the part name "observation". BUT then we need a way to remove these records from the "part-less" specimen trigger.

I don't know why/how the part name "observation" came to be, but it doesn't make much sense to me. If someone really needs it, then I could be convinced to keep it, but if that is the case, I think we should have NO part-less records unless they are in process and we should recommend that all observations have the part "observation".

campmlc commented 5 years ago

We have another collection of largely observations, the MSB Host catalog. We cannot remove observations as parts because they are used for this collection. These are observations of hosts that may or may not have vouchers across many institutions, recorded from field data catalogs. This collection provides taxonomically searchable host data records that otherwise would be lost.

Jegelewicz commented 5 years ago

You can have "part-less" records that can still be linked to the parasite vouchers. You don't need to have a part.

campmlc commented 5 years ago

The observation part dates to before we had observation as an event type, if I remember correctly - or we had both, and it wasn't clearly indicated when one vs the other should be used. It has been very helpful to be able to use part type to clearly separate those host records that are based on observations from those that have located an actual voucher, and to have both the observational record and the actual record combined. We can now do that with event type, but the events for the observation are frequently divergent from the event data for the specimen, since the specimen was frequently cataloged from a specimen tag with only partial or divergent data compared to the original collection ledger. To convert we would need to create separate specimen events that would then be linked to either the observation part or the actual specimen part . . . something we can do, because of the Mexican wolf model, but which must be done manually, one part at a time, and is very difficult to implement in practice. I'm happy to consider alternatives if they can be automated, but this is what currently exists, and I know that there is not any designated labor or funding to go one by one and make these changes retroactively in the existing host catalog at this point.

dustymc commented 5 years ago

There are some vocabulary issues which should be ironed out.

There's definitely some unnecessary vocabulary hanging around here and there.

When an observation exists with no media, there should be no part.

Change that to "physical material" and I'll agree.

When an observation exists with media, the part should be "media", even though the media may be attached to the record.

No, parts are physical - although I think it gets used this way and the Arctos Police certainly aren't going to come around inspecting your alleged parts, so whatever - if you're happy then I'm happy!

We should get rid of the part name "observation".

agreed

BUT then we need a way to remove these records from the "part-less" specimen trigger.

That thing is just a report - it's easy to ignore or adjust....

observation part dates to before we had observation as an event type,

Yea, that sounds right to me. I think we used to "require" a part in data entry as well.

clearly separate those host records that are based on observations from those that have located an actual voucher

I think this is making a distinction that doesn't exist (and AFAIK it's common in all collections). Some collector said it's a squirrel and tossed the evidence, or the same person said it's a squirrel but we've lost the evidence. Those look functionally identical to me - you can either trust the collector or not. If you know of some sort of evidence then you can add it, if not then no parts seems most correct to me.

Maybe we need some sort of "evidence" categorical search option - find stuff that's supported by findable bio-bits, Media, field notes, lab notes, etc. New Issue if so....

have both the observational record and the actual record combined.

https://github.com/ArctosDB/arctos/issues/1966 is, among other things, a way to bring "specimens" together as "individuals."

make these changes retroactively

I think that's a two-part problem:

1) what should we be doing?
2) how do we do whatever that is to any existing 'legacy' data?

Jegelewicz commented 5 years ago

When an observation exists with no media, there should be no part.

Change that to "physical material" and I'll agree.

Yeah, that's what I meant, in a way - because really, just because you haven't printed the photo, does that mean it isn't a physical object?

We should get rid of the part name "observation".

agreed

Yay!

BUT then we need a way to remove these records from the "part-less" specimen trigger.

That thing is just a report - it's easy to ignore or adjust....

Yay!

observation part dates to before we had observation as an event type,

Yea, that sounds right to me. I think we used to "require" a part in data entry as well.

So NOW we don't need "observation" as a part. Again, Yay!

clearly separate those host records that are based on observations from those that have located an actual voucher

I think this is making a distinction that doesn't exist (and AFAIK it's common in all collections). Some collector said it's a squirrel and tossed the evidence, or the same person said it's a squirrel but we've lost the evidence. Those look functionally identical to me - you can either trust the collector or not. If you know of some sort of evidence then you can add it, if not then no parts seems most correct to me.

Our current structure already does this. If you have a host that is only supported with an observation then there won't be a part (the host was never preserved), or the part will be missing (oops we have lost it), even if the specimen event is "collection".

Maybe we need some sort of "evidence" categorical search option - find stuff that's supported by findable bio-bits, Media, field notes, lab notes, etc. New Issue if so....

This is where "media" as a part makes sense even if it is only bits and bytes (I don't think it matters if it is machine readable or not). But I can see how separating physical media parts from digital media parts might make sense?

We could split "media" into:

physical media: printed photograph, printed or handwritten notes or catalogs

digital media: digital recordings including photographs, video, and sound or digital scans of physical notes or catalogs

have both the observational record and the actual record combined.

#1966 is, among other things, a way to bring "specimens" together as "individuals."

I won't go into my issues with #1966 and how I don't think it is sustainable over the long term, but my solution would be two records: one for the observation (specimen event = observation; with NO part), with same collecting event as the voucher (specimen event type = collection; with part = whole organism or whatever it actually is), linked by the "same individual as" relationship.

what should we be doing?

Observations should be cataloged as event type = observation and should either be allowed only the part "media" or have no part at all (OR have the part "whole organism" with the disposition "missing"?).

Collected specimens should be cataloged as event type = collection and should have at least one physical part (not media), even if it has the disposition of "missing".
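
As a rough illustration of the two rules just above, here is a minimal Python sketch of a consistency check. The `Record` shape and the `check_record` function are hypothetical and are not part of Arctos. The sketch also assumes the still-open "whole organism"/missing option is not adopted, so an observation with any non-media part is flagged.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record shape for illustration only; not the Arctos schema.
@dataclass
class Record:
    event_type: str                       # "observation" or "collection"
    parts: List[str] = field(default_factory=list)

def check_record(rec: Record) -> List[str]:
    """Return the problems found under the proposed rules (empty list = OK)."""
    problems = []
    if "observation" in rec.parts:
        problems.append('part name "observation" is proposed for removal')
    # Parts that count as physical material other than media.
    physical = [p for p in rec.parts if p not in ("media", "observation")]
    if rec.event_type == "observation" and physical:
        problems.append(f"observation event has non-media parts: {physical}")
    if rec.event_type == "collection" and not physical:
        problems.append("collection event has no physical (non-media) part")
    return problems

# A part-less observation passes; a collection with only media does not.
print(check_record(Record("observation")))
print(check_record(Record("collection", ["media"])))
```

A report built this way could stand in for the "part-less specimen" report, since it only complains about part-less records when the event type is collection.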

how do we do whatever that is to any existing 'legacy' data?

I suggest we remove the "observation" parts and make sure events associated with those are labeled as type = observation. Observation parts COULD be replaced with "whole organism"/missing if that's the way that seems best, otherwise, the "part-less specimen" report would need some tweaking.
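
To make that cleanup concrete, here is a minimal sketch run against a toy in-memory SQLite database. The two tables reuse table and column names from the part=observation query posted later in this thread, but everything else is an assumption: the real schema is much larger, and the sketch ignores records that carry both an observation part and a voucher part with divergent events, which would still need manual handling.

```python
import sqlite3

# Toy stand-in tables; the real Arctos tables have many more columns and constraints.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specimen_event (collection_object_id INTEGER, specimen_event_type TEXT);
    CREATE TABLE specimen_part  (derived_from_cat_item INTEGER, part_name TEXT);
    INSERT INTO specimen_event VALUES (1, 'collection'), (2, 'collection'), (3, 'observation');
    INSERT INTO specimen_part  VALUES (1, 'observation'), (2, 'skin'), (3, 'observation');
""")

def retire_observation_parts(conn: sqlite3.Connection) -> None:
    """Relabel events on records that carry an 'observation' part, then drop those parts."""
    with conn:  # single transaction: both statements commit together or roll back together
        conn.execute("""
            UPDATE specimen_event
               SET specimen_event_type = 'observation'
             WHERE collection_object_id IN (
                   SELECT derived_from_cat_item
                     FROM specimen_part
                    WHERE part_name = 'observation')
        """)
        conn.execute("DELETE FROM specimen_part WHERE part_name = 'observation'")

retire_observation_parts(conn)
events = conn.execute(
    "SELECT collection_object_id, specimen_event_type "
    "FROM specimen_event ORDER BY collection_object_id"
).fetchall()
print(events)
```

The event update runs before the delete on purpose: once the parts are gone there is nothing left to identify which events to relabel.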

dustymc commented 5 years ago

physical media: printed photograph, printed or handwritten notes or catalogs

digital media: digital recordings including photographs, video, and sound or digital scans of physical notes or catalogs

That is not the distinction the current data make. Part "media" exists so that you can track physical stuff - a tape, thumb drive, paper, WHATEVER. If it's "in the cloud" then you're entrusting everything to whomever you've entrusted with your data, there's nothing to track, and there's no real reason to complicate your life by attempting to bring virtual things into the part of Arctos that exists only to deal with physical things. If you've kept a local copy on some sort of "media" then the part can help you find it.

two records: one for the observation (specimen event = observation; with NO part), with same collecting event as the voucher (specimen event type = collection; with part = whole organism or whatever it actually is), linked by the "same individual as" relationship.

The origins of duplicating data in MSB:Host (and UAMObs:Ento and probably others) are entirely social - Arctos users want to do stuff that can't be done in other systems. If there's a "host" record cataloged in Arctos, there's very little reason to create an observation record. (The Curator owning the host having cruddy data and being unwilling to play nice would do it, but I don't think that's ever happened and it doesn't seem likely to.) The goal should always be to eliminate one of the records.

part "whole organism"

That is but one use case - tracking device pings, hair, tracks, scat, etc., etc., etc. can be cataloged as "observations."

with the disposition "missing"?

That implies it might be found.

Here's part=observation data.

select 
  guid_prefix,
  SPECIMEN_EVENT_TYPE,
  count(*)
from 
  collection,
  cataloged_item,
  specimen_part,
  specimen_event
where
  collection.collection_id=cataloged_item.collection_id and 
  cataloged_item.collection_object_id=specimen_part.derived_from_cat_item and
  cataloged_item.collection_object_id=specimen_event.collection_object_id and
  part_name='observation'
group by guid_prefix,SPECIMEN_EVENT_TYPE
order by guid_prefix,SPECIMEN_EVENT_TYPE
 ;

GUID_PREFIX    SPECIMEN_EVENT_TYPE    COUNT(*)
------------   -------------------    --------
APSU:Herp      collection                   75
MSB:Herp       collection                    3
MSB:Host       collection                21446
MSB:Mamm       collection                    1
MSB:Para       collection                13512
MSBObs:Mamm    collection                    3
MVZ:Herp       collection                    2
MVZObs:Mamm    collection                   28
UAM:Ento       collection                    1
UAMObs:Bird    collection                  162
UAMObs:Ento    collection                  220
UAMObs:Mamm    collection                  195
UAMObs:Mamm    observation                   4
UCM:Obs        observation                 251
UTEPObs:Herp   observation                   1

15 rows selected.

Jegelewicz commented 5 years ago

If it's "in the cloud" then you're entrusting everything to whomever you've entrusted with your data, there's nothing to track, and there's no real reason to complicate your life by attempting to bring virtual things into the part of Arctos that exists only to deal with physical things. If you've kept a local copy on some sort of "media" then the part can help you find it.

Why does it matter WHERE it is? I think that stuff "in the cloud" might be more secure than stuff on a hard drive in my office that someone knocks over, destroying all the data (personal experience). Tracking the location of digital media "in the cloud" is no different than tracking it on a hard drive.

two records: one for the observation (specimen event = observation; with NO part), with same collecting event as the voucher (specimen event type = collection; with part = whole organism or whatever it actually is), linked by the "same individual as" relationship.

The origins of duplicating data in MSB:Host (and UAMObs:Ento and probably others) are entirely social - Arctos users want to do stuff that can't be done in other systems. If there's a "host" record cataloged in Arctos, there's very little reason to create an observation record. (The Curator owning the host having cruddy data and being unwilling to play nice would do it, but I don't think that's ever happened and it doesn't seem likely to.) The goal should always be to eliminate one of the records.

In the MSB Host scenario, there isn't a record somewhere that we are simply unable to connect with or a lazy curator unwilling to catalog stuff. The collector has passed and only left notes about host organisms often with no information about what happened to them. Creation of the host observation record allows us to answer questions about hosts and parasites that may not be possible with simply a "host" or "parasite" attribute because of the inability to make connections via higher taxa in taxonomy.

part "whole organism"

That is but one use case - tracking device pings, hair, tracks, scat, etc., etc., etc. can be cataloged as "observations."

In those cases, part might be "hair", "feces", "media", etc. And to me the hair and scat sound like "encounter" events: Specimen was encountered and not killed or removed from context; Biological Samples were taken. Biopsies belong here. The device ping and tracks I think would be observations: Specimen was detected and not killed or removed from context; No biological samples were taken. Human sightings, camera traps, and GPS telemetry data are appropriate here.
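
The two definitions quoted here reduce to a pair of yes/no questions. A minimal sketch, using a hypothetical `classify_event` helper; treating anything killed or removed from context as "collection" is an assumption, since that definition is not quoted in this thread.

```python
def classify_event(killed_or_removed: bool, samples_taken: bool) -> str:
    """Map the two questions in the quoted definitions to an event type."""
    if killed_or_removed:
        # Assumption: not defined in this thread, included only to complete the sketch.
        return "collection"
    if samples_taken:
        # Not killed or removed from context, biological samples taken (e.g. a biopsy).
        return "encounter"
    # Detected only: human sighting, camera trap, GPS telemetry.
    return "observation"

print(classify_event(killed_or_removed=False, samples_taken=True))   # hair or scat collected
print(classify_event(killed_or_removed=False, samples_taken=False))  # device ping or tracks
```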

with the disposition "missing"?

That implies it might be found.

If that possibility exists, then missing is appropriate (maybe those hosts are in a freezer somewhere). If we know it was thrown out, then "discarded" would be appropriate: Object has been discarded. Do not use this to mean "used up."

dustymc commented 5 years ago

Why does it matter WHERE it is?

You can put your hands on something==>parts. You can't==>not parts. That's it. We're obviously not quite there, but that's the structure and we should aim for it when we can.

Tracking the location of digital media "in the cloud" is no different than tracking it on a hard drive.

You can throw a barcode on the thumb drive before you toss it in the barcoded drawer in the barcoded junk-bin, then find it later. As far as parts go, that's the only difference - you have something physical that you want to track via Arctos (so need parts) or you don't (so don't). You could barcode the stickynote with your AWS password and call it part=media too - I don't care what sort of "media" you have, this is just a way to track it if you do have something physical that might be classified as "media."

http://arctos.database.museum/info/ctDocumentation.cfm?table=CTSPECIMEN_PART_NAME&field=media

In the MSB Host scenario...

Yes, all agreed and understood. I maintain that the goal should always be to eliminate one of the records, but I understand that reality complicates that.

hair and scat sound like "encounter" events:

Not if you only photographed them.

To get back to the original:

UAMObs:Mamm

was started for stuff I could see/identify, but not land and collect - it was (maybe is) "pure observations."

has a location, collector, original identifier,

all from my GPS - at least that part should be trustworthy, even if that's not expressed very well in the data. If you trust me to identify a walrus, those data might be useful for some things. (Here's another place normalized Agent data is powerful - it's real easy to find the walrus with vouchers I collected/photographed/IDed, you could check them and perhaps get some idea of whether I know what a walrus looks like or not.)

link to GBIF occurrence,

That's automagic - it just happens for anything that makes it to GBIF

part:observation

likely a leftover from when it was "required"

MVZObs:Mamm ... no parts

http://arctos.database.museum/guid/MVZObs:Mamm:10 has parts - other collections might just catalog that ear snip, someone at MVZ chose to catalog it in a separate collection. It's important to be consistent in HOW this stuff is cataloged and not try to infer anything from WHERE it's cataloged.

MSBObs:Mamm ...observation accession

Accessions (everything has them) are entirely arbitrary/administrative, and how they're used varies wildly across collections.

UTEPObs:Herp maker (instead of collector)

Many (most?) collections include some outliers - there's no particular reason not to catalog your old snake-skin boots in a herp collection instead of giving them to the ethnology department.

campmlc commented 5 years ago

In MSB Host, we have observations of parts that include lot count and preservation. We don't know where these are, but they are not "missing", just have not been located yet by our collections. The only source we currently have of this part info is the scanned collection ledger. If I can't have an observation part, then I need a media part.

Jegelewicz commented 5 years ago

We don't know where these are, but they are not "missing", just have not been located yet by our collections

Why the opposition to saying they are missing? Not knowing where they are by definition means they are missing and can therefore not be loaned to anyone.

adjective: missing (of a thing) not able to be found because it is not in its expected place.

Jegelewicz commented 5 years ago

You can throw a barcode on the thumb drive before you toss it in the barcoded drawer in the barcoded junk-bin, then find it later. As far as parts go, that's the only difference - you have something physical that you want to track via Arctos (so need parts) or you don't (so don't).

Well, given the part "observation", this is not true. It seems that parts are being used for things other than objects to track.

The only source we currently have of this part info is the scanned collection ledger. If I can't have an observation part, then I need a media part.

However, since Mariel CAN (if she chooses to) put a barcode on the physical ledger, I would say she has a legit "media" part.

BUT I would still argue that the part should be "whole organism" with lot count whatever and preservation whatever and a disposition of "missing" = not able to be found because it is not in its expected place.

dustymc commented 5 years ago

parts are being used for things other than objects to track.

Yes, that's my "We're obviously not quite there, but that's the structure and we should aim for it when we can." I think whatever caused the 'observation' part to exist is no longer a barrier.

put a barcode on the physical ledger, I would say she has a legit "media" part.

Absolutely.

part should be "whole organism"

That's still weird to me, but I don't know why. It does seem reasonable to somehow say "someone saw a caribou" vs. "someone saw a caribou track."

missing

That still makes me twitchy. For Rausch stuff where he says there's something and you can't find it (even because some other museum has funky data), maybe. For "saw a whole caribou but we don't know what we did with it," it's weird. Maybe useful, but still weird....

Jegelewicz commented 5 years ago

part should be "whole organism"

That's still weird to me, but I don't know why. It does seem reasonable to somehow say "someone saw a caribou" vs. "someone saw a caribou track."

It actually might end up being "skin" or some other usual part, but in the case we are talking about (Rausch), we have parasites FROM the caribou, so someone did more than see it. It could always be part "unknown". If it truly is "I saw a Caribou" that should be a part-less observation event.

missing

That still makes me twitchy. For Rausch stuff where he says there's something and you can't find it (even because some other museum has funky data), maybe. For "saw a whole caribou but we don't know what we did with it," it's weird. Maybe useful, but still weird....

Have you met an actual curator? Losing a caribou doesn't seem that crazy..... :-)

Jegelewicz commented 4 years ago

@ccicero this relates to our code table discussions last week.

Jegelewicz commented 3 years ago

Closing as duplicate/part of #3885