ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Code Table Request - New attribute: individual count #4032

Closed Jegelewicz closed 2 years ago

Jegelewicz commented 3 years ago

Goal Accurately describe the number of individuals that participated in an occurrence per dwc:individualCount in order to pass appropriate information to aggregators.

Context https://github.com/ArctosDB/arctos/issues/3908#issuecomment-949698521

Table https://arctos.database.museum/info/ctDocumentation.cfm?table=ctattribute_type

Value individual count

Definition The number of individuals represented by this catalog record.

Attribute data type number+units

Attribute value integers

Attribute units individuals

Priority [ Please choose a priority-label to the right. ]

Jegelewicz commented 3 years ago

@acdoll @sharpphyl you may want to weigh in.

dustymc commented 3 years ago

No real objections, but the documentation would need to be clear on what this is (some random thing that's never going to get updated?!) and what it can do (within Arctos: nothing that I can see).

acdoll commented 3 years ago

We are definitely in favor of this.

what it can do

Currently, the number of individual organisms in a lot is captured in 'lot count' - this is not passed on to the aggregators (nor should it be; per documentation lot count can describe the number of vertebrae in a box). E.g., https://arctos.database.museum/guid/DMNS:Inv:10020 has two shells in the lot (i.e. 2 individuals). But the GBIF record only reports 1 individual: [image]

dustymc commented 3 years ago

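I agree it's a useful concept, I just don't think this is a suitable place for it.

record has 17 events for some reason (eg really great georeferencing history)
you find another individual hiding in the back of the drawer, so you need to go update all 17 events

and/or

you have 20 lots from the same space/time
you have to manage 20 events because the lots all contain a different number of individuals

I don't think either one of those scenarios is approachable by itself, much less in the combinations that would come to exist in an active collection.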

I think this would be much better as a catalog record attribute, even if that's not fully capable of dealing with the data in some fringe cases. (And it's pretty easy to avoid those situations if this kind of information is important.)

Jegelewicz commented 3 years ago

record has 17 events for some reason (eg really great georeferencing history)
you find another individual hiding in the back of the drawer, so you need to go update all 17 events

this might get used here - #4033

Given that events end up as occurrences, I think this makes sense here. It is either that or as part of "specimen event". It is NOT in any way related to what parts are currently in, or have been in, the collection.

I think this would be much better as a catalog record attribute, even if that's not fully capable of dealing with the data in some fringe cases. (And it's pretty easy to avoid those situations if this kind of information is important.)

No because some records include actual distinct events that may or may not be about the same number of individuals.

you have 20 lots from the same space/time
you have to manage 20 events because the lots all contain a different number of individuals

If you have 20 lots with a different number of individuals from the same event then they all participated in the event and adding one count of individuals that is the sum of all the lot counts to that event should suffice?

Jegelewicz commented 3 years ago

The thing is - no one HAS to use this and if it isn't there, we can just pass "1" as a default to dwc:individualCount. That seems potentially less worse than what we are doing now?

dustymc commented 3 years ago

some records include actual distinct events that may or may not be about the same number of individuals.

I'm not sure that anyone who's dumping stuff into a lot over time is going to much care about this....

pass "1" as a default...less worse

"We don't have that information" is kinda always a defensible position. "... and so we've made something up!", not so much.

Jegelewicz commented 3 years ago

But we are making stuff up now!

sharpphyl commented 3 years ago

Andy has already described the problem for our collection. We do not have multiple collecting events in one record, so I can't speak to that. The difference between one individual and two individuals that Andy pointed out could be meaningful to a researcher. A stronger case can be made for micromollusks, which can occur in large numbers; those numbers could be important for assessing the health of the population, etc. DMNS:Inv:29549 of Caecum bipartitum has 276 shells (in a tiny gel cap).

[image: Screen Shot 2021-10-24 at 2 06 47 PM]

GBIF shows one individual.

[image: Screen Shot 2021-10-24 at 2 03 02 PM]

As long as the data flows to GBIF as "Individual Count" it doesn't matter to me where I put the number of specimens in the Arctos catalog record.

We support Teresa's recommendation to include DWC Individual Count so that the aggregator records reflect the number of individuals found at that collecting event.

sharpphyl commented 2 years ago

@Jegelewicz Just checking when the Code Table Management team will meet to discuss this. I don't want this issue to drift into oblivion.

dustymc commented 2 years ago

If you have 20 lots with a different number of individuals from the same event then they all participated in the event and adding one count of individuals that is the sum of all the lot counts to that event should suffice?

That is not reflective of how the data are structured.

campmlc commented 2 years ago

@dustymc is there a solution you can suggest? We do need this resolved.

dustymc commented 2 years ago

https://github.com/ArctosDB/arctos/issues/4032#issuecomment-949750557

catalog record attribute

mkoo commented 2 years ago

conceptually count doesn't belong in the collecting event - I agree with Dusty that it is an attribute of the cataloged record.

If it's not getting passed on in the DwCA, then that's a mapping issue, not a CT or new thing for collecting event (which is location+date:time)

Jegelewicz commented 2 years ago

If it's not getting passed on in the DwCA, then that's a mapping issue

We don't actually record this in a meaningful way anywhere, "part lot count" is not a usable value since we may have 3 parts from a single individual in a given catalog record.

conceptually count doesn't belong in the collecting event

Probably not - since multiple taxa can share a collection event, but it also does not belong as part of the catalog record either. The individual count expected at the aggregators is "The number of individuals present at the time of the Occurrence." What we are passing as "occurrences" are actually "specimen" events (please see https://github.com/ArctosDB/arctos/issues/4036 because our terminology is all over the place and is also problematic). As discussed recently, using "specimen" events as an occurrence is problematic because we end up reporting two occurrences when there is only one. Here is an example:

https://arctos.database.museum/guid/DMNS:Mamm:12344 is from the same individual/collection event as https://arctos.database.museum/guid/MSB:Mamm:233616

BUT they are passed to the aggregators as separate events/individuals

https://www.gbif.org/occurrence/1145096812 and https://www.gbif.org/occurrence/1145267756

Careful consideration of associated occurrences and organism ID will suss this out, but it is a shame that we pass different organism IDs for each of these records. Even if we cleaned up our act and got them into the same collecting event, we would still be sending conflicting information.

Anyhoo. It is probably true that we have no good way to say how many individuals of a particular taxon took part in any given OCCURRENCE (collecting or observation event). Ideas are welcome because sending 1 when there are 276 is a bit misleading.

dustymc commented 2 years ago

We don't actually record this in a meaningful way anywhere,

Correct - I magic it (poorly, probably) for some special circumstances, and there's some legacy not-quite-data from previous attempts of that hanging around. If we want to pass something meaningful on then we need to record it. (And I can magic - probably still poorly! - the initial values if needed.)

What we are passing as "occurrences" are actually "specimen" events

No, we are splitting catalog records at collecting events in an attempt to magick Occurrences out of the aether. What we are passing as Occurrences does not exist in Arctos; that's just not what gets cataloged.

Jegelewicz commented 2 years ago

What we are passing as Occurrences does not exist in Arctos; that's just not what gets cataloged.

Mostly - but I think some records with observation type events are pretty close.

we are splitting catalog records at collecting events

I think we are splitting them at "specimen" events - thus the seid?

Honestly the quoted statement is true for all physical collections in the data aggregators, but after looking at this, I do think there are some things we could be doing better.

So I guess I can go along with making this a collection object attribute even though it isn't really going to solve the whole problem. See updated request.

dustymc commented 2 years ago

some records ... are pretty close.

Most are.

them at "specimen" events

Same thing from the perspective of a single catalog record.

think there are some things we could be doing better

Always.

isn't really going to solve the whole problem

Nope, there are some ragged edges, but I think it does what the collections who seem to care about this want done without adding too much complexity or being too hard to understand in a decade or so.

The number of individuals represented by this catalog record.

That doesn't seem quite right, or complete, or something, but I'm struggling to come up with anything better. @sharpphyl help??

Nicole-Ridgwell-NMMNHS commented 2 years ago

I fully support adding this as a collection object attribute. Is there some way we can represent count = unknown in a way that GBIF would ingest correctly?

Jegelewicz commented 2 years ago

@dustymc how does "INDIVIDUALCOUNT" get calculated?

INDIVIDUALCOUNT   individualCount,

dustymc commented 2 years ago

https://github.com/ArctosDB/arctos/issues/3901

Jegelewicz commented 2 years ago

OK - so we basically pass 1 for everything except Fish collections?

dustymc commented 2 years ago

No, I'm not sure where you're seeing that? (And this issue exists because whatever we're doing doesn't work so I'm not sure why it matters?)

campmlc commented 2 years ago

MSB Para has lot count, and I use this field; default value of unknown = 1, but that's not great. I would hope we are passing values to GBIF if they differ from 1.

Jegelewicz commented 2 years ago

I'm not sure where you're seeing that?

So all I know is this:

    -- for fish, INDIVIDUALCOUNT is sum of select part's lot count
    update flat set (INDIVIDUALCOUNT)=(
        select
            sum(lot_count)
        from
            specimen_part
            inner join coll_object on specimen_part.collection_object_id=coll_object.collection_object_id
        where
            part_name like '%whole%' and
            coll_obj_disposition not in ('discarded','used up','deaccessioned','missing','transfer of custody') and
            specimen_part.derived_from_cat_item=cid
    ) where collection_cde='Fish' and collection_object_id=cid;

What do we do for everything else?
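
For readers who do not read SQL, the logic of the fish query can be sketched in Python. This is only an illustration of the filter-and-sum, not Arctos code: the function name, the dict keys, and the sample part names are assumptions made up for the example.

```python
# Dispositions that the query above excludes from the count.
EXCLUDED_DISPOSITIONS = {
    "discarded", "used up", "deaccessioned", "missing", "transfer of custody",
}


def fish_individual_count(parts):
    """Sum lot_count over 'whole' parts that are still in the collection.

    `parts` is a hypothetical list of dicts with the keys part_name,
    disposition, and lot_count. Returns None when no part qualifies,
    mirroring SQL's sum() over zero rows.
    """
    counts = [
        p["lot_count"]
        for p in parts
        if "whole" in p["part_name"]                        # part_name like '%whole%'
        and p["disposition"] is not None                    # NULL never passes NOT IN
        and p["disposition"] not in EXCLUDED_DISPOSITIONS
        and p["lot_count"] is not None                      # sum() skips NULLs
    ]
    return sum(counts) if counts else None


# Made-up sample parts for one catalog record.
parts = [
    {"part_name": "whole organism", "disposition": "in collection", "lot_count": 3},
    {"part_name": "whole organism", "disposition": "used up", "lot_count": 2},
    {"part_name": "tissue", "disposition": "in collection", "lot_count": 5},
]
print(fish_individual_count(parts))  # 3
```

Because dispositions are filtered, the result tracks what is currently on the shelf rather than what was originally collected.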

sharpphyl commented 2 years ago

I'm struggling to come up with anything better. @sharpphyl help??

I know our records and individual counts are quite simple in comparison to other collections. For us the number of individuals in the catalog record and the number of individuals in the collecting event are the same. Some records do have multiple parts (shell and opercula) and each part is entered in Parts and the count in Qty. For our purposes, the location of a new attribute for the number of individuals represented in the record could be part of the occurrence or the catalog record as long as it maps to dwc:individualCount in GBIF and other aggregators.

Am I answering the right question, @dustymc ?

dustymc commented 2 years ago

I'm looking for an Arctos definition - "The number of individuals represented by this catalog record." is the current winner yet seems somehow lacking.

For us the number of individuals in the catalog record and the number of individuals in the collecting event are the same.

Nope....

part of the occurrence

For the sake of clarity: There is no such thing in Arctos.

Jegelewicz commented 2 years ago

I think that the number of individuals represented by this catalog record is the only defensible definition if this is a catalog record attribute.

dustymc commented 2 years ago

only defensible definition

But that's not at all in line with the DWC definition ("there were 10,000 individuals present at the time of the Occurrence, we caught three"), nor what the only user of this in Arctos has done ("...and two of those have since been used up, so 1").

https://arctos.database.museum/info/ctDocumentation.cfm?table=ctattribute_type#abundance seems to fit the DWC idea, but DWC seems to want an incompatible datatype.

Jegelewicz commented 2 years ago

But that's not at all in line with the DWC definition ("there were 10,000 individuals present at the time of the Occurrence, we caught three"),

No argument from me there - but we don't really send occurrences do we?

nor what the only user of this in Arctos has done ("...and two of those have since been used up, so 1").

That is part of what this term is meant to correct - so even if a part is removed or "used up" the original number from our interpretation of "occurrence" is still there.

We don't have to send anything if we don't know or aren't sure.
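
A minimal sketch of "send nothing when we don't know", assuming the export is a delimited text file in which an empty cell means the value was not provided. The function name is hypothetical and this is not how Arctos builds its export.

```python
def individual_count_for_export(recorded_count):
    """Return the text to place in the dwc:individualCount column.

    An unrecorded count (None) becomes an empty string instead of a
    made-up default such as "1".
    """
    return "" if recorded_count is None else str(recorded_count)


print(repr(individual_count_for_export(276)))   # '276'
print(repr(individual_count_for_export(None)))  # ''
```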

Nicole-Ridgwell-NMMNHS commented 2 years ago

I like the suggestion in the tdwg thread to use organism quantity and quantity type. https://dwc.tdwg.org/list/#dwc_organismQuantity

Jegelewicz commented 2 years ago

@Nicole-Ridgwell-NMMNHS I have been searching all over for that comment! How would you suggest we implement the organism/type thing?

Jegelewicz commented 2 years ago

Also leaving the link to the comment here - https://github.com/tdwg/dwc/issues/285#issuecomment-965733231

Jegelewicz commented 2 years ago

Also, for some reason the links to the main DwC site never seem to work - so here for reference

https://dwc.tdwg.org/terms/#dwc:organismQuantity

A number or enumeration value for the quantity of organisms.

https://dwc.tdwg.org/terms/#dwc:organismQuantityType

The type of quantification system used for the quantity of organisms.

Examples: 27 (organismQuantity) with individuals (organismQuantityType). 12.5 (organismQuantity) with %biomass (organismQuantityType). r (organismQuantity) with BraunBlanquetScale (organismQuantityType).

Nicole-Ridgwell-NMMNHS commented 2 years ago

If organism quantity is added as a specimen attribute, could we map quantity type to attribute units and create a new units table?

Jegelewicz commented 2 years ago

And attribute remark could help differentiate when the identification on a record is A and B.

Jegelewicz commented 2 years ago

I was looking at changing this issue or creating a new one, but I feel like what we have here is fine, except instead of passing the value in this term to dwc:individualCount, we would pass it to dwc:organismQuantity and we would pass the units value to dwc:organismQuantityType

Does this sound like a good solution?
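
As a rough sketch of that mapping (the function name and the shape of the input are assumptions for illustration, not existing Arctos code), the attribute value would go to dwc:organismQuantity and the attribute units to dwc:organismQuantityType:

```python
def to_dwc_quantity(attribute_value, attribute_units):
    """Map a hypothetical Arctos attribute (value + units) to two DwC terms."""
    return {
        "organismQuantity": str(attribute_value),
        "organismQuantityType": attribute_units,
    }


print(to_dwc_quantity(276, "individuals"))
# {'organismQuantity': '276', 'organismQuantityType': 'individuals'}
```

The same shape would also carry the non-integer DwC examples, such as 12.5 with %biomass.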

Nicole-Ridgwell-NMMNHS commented 2 years ago

If we want to call it individual count, would we default the quantity type/units value to individuals?

Organism quantity seems like a more flexible term - do we need that flexibility or are things like %biomass covered by other attributes?

dustymc commented 2 years ago

new units table

https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcount_units

default

Nothing currently has a default, but "just UI" (ish, probably).

other attributes

https://github.com/ArctosDB/arctos/issues/4032#issuecomment-965381448 - if those aren't the same, the definitions would need to reflect the differences.

sharpphyl commented 2 years ago

instead of passing the value in this term to dwc:individualCount, we would pass it to dwc:organismQuantity and we would pass the units value to dwc:organismQuantityType

I'm not sure what the difference is between individual count and organism quantity, but if that solves the problem for our invert collection and makes sense for everyone else, I'm fine with it. Thanks!

sharpphyl commented 2 years ago

I think that the number of individuals represented by this catalog record is the only defensible definition if this is a catalog record attribute.

Yes, we have no idea how many other individuals were present at the collecting event. We only know how many (shells) were collected for our catalog record.

instead of passing the value in this term to dwc:individualCount, we would pass it to dwc:organismQuantity and we would pass the units value to dwc:organismQuantityType

Is this something that can be done for an individual (probably invertebrate) collection or are we waiting for AWG approval or an update or something else? I'd like to take action so that the number of organisms represented in our catalog records (and recorded under Qty) shows up in GBIF instead of always being 1.

What do we need to do to keep this issue moving forward?

dustymc commented 2 years ago

keep this issue moving forward

Just a focused discussion of how to store the data. Here's a shot, mostly copy-pasta of the edited original, please edit/replace/whatever as necessary.

Goal Record number of individuals cataloged.

Context #3908 (comment)

Table https://arctos.database.museum/info/ctDocumentation.cfm?table=ctattribute_type

Value individual count

Definition The number of individuals represented by the catalog record.

Attribute data type number+units

Attribute value integers

Attribute units ctcount_units

(Then we can talk about DWC mapping, but we should also make sure the data we set up are compatible with the target, so it's a little chicken-n-eggy. I'm still thinking https://dwc.tdwg.org/terms/#dwc:individualCount.)
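
To make the proposed shape concrete, here is a small validation sketch for an integer value plus a unit drawn from ctcount_units. The set of allowed units is a stand-in (the actual contents of ctcount_units are not listed in this thread), rejecting counts below 1 is an assumption, and the function name is hypothetical.

```python
# Hypothetical stand-in for the contents of the ctcount_units code table.
COUNT_UNITS = {"individuals"}


def validate_individual_count(value, units):
    """Return a list of problems; an empty list means the attribute is acceptable."""
    problems = []
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        problems.append("value must be an integer")
    elif value < 1:
        # Assumption: zero or negative counts are not meaningful here.
        problems.append("value must be at least 1")
    if units not in COUNT_UNITS:
        problems.append(f"units must be one of {sorted(COUNT_UNITS)}")
    return problems


print(validate_individual_count(5, "individuals"))    # []
print(validate_individual_count(2.5, "individuals"))  # ['value must be an integer']
```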

sharpphyl commented 2 years ago

@dustymc So you would add ctcount_units as an attribute in the code table attribute_type for Inv and other collections as needed. For example, we would enter 5 as the Attribute value and select "individuals" as the Attribute units.

Would you be able to magic the number currently in the Qty field into this new attribute? It sounds like we would still have the Qty field for the parts (shell, operculum, etc.). We do have some records with a different quantity of two parts which we could enter manually.

As for the DWC mapping, my only concern is that it map to GBIF and iDigBio "Individual Count" which is the field that currently always shows 1.

Sounds good for our collection. @acdoll Any concerns?

sharpphyl commented 2 years ago

@Jegelewicz Will the Code Table Committee be able to take up this topic at their next meeting?

Table https://arctos.database.museum/info/ctDocumentation.cfm?table=ctattribute_type

Value individual count

Definition The number of individuals represented by the catalog record.

Attribute data type number+units

Attribute value integers

Attribute units ctcount_units

Jegelewicz commented 2 years ago

Definition The number of individual organisms represented by the catalog record.

Jegelewicz commented 2 years ago

Above from AWG Issues meeting. Are we ready to add this with the definition above or is there still "lot" confusion? @acdoll @mkoo @sharpphyl @ccicero @campmlc

acdoll commented 2 years ago

Sounds good to me.

Jegelewicz commented 2 years ago

Code Table Committee says add!

Jegelewicz commented 2 years ago

Need to map to individualCount for aggregators.

Jegelewicz commented 2 years ago

Added to code table.