SpeciesFileGroup / taxonworks

Workbench for biodiversity informatics.
http://taxonworks.org

Improve semantics on CollectingEvent identifiers with respect to DwC import #3800

Open mjy opened 7 months ago

mjy commented 7 months ago

Thanks to @debpaul's testing we have some things that we should work to improve on. The basic issues are included in the image below.

@LordFlashmeow, @LocoDelAssembly please chime in after digesting this. In particular we need to revisit how AntWeb is getting data in and out, as they have imported identifiers most extensively.

I do anticipate a couple things:

In general we also need to interpret eventID. To me it should reflect the digital record, i.e. eventID is roughly one-to-one with a single TW CE.

[Image: 20240126_114521]

Thoughts?

LocoDelAssembly commented 7 months ago

Please clarify whether fieldIdentifier is actually fieldNumber (https://dwc.tdwg.org/list/#dwc_fieldNumber, currently mapped as verbatim_trip_identifier).

eventID is mapped as a TripCode identifier, and the namespace is assigned either from the non-standard TW:Namespace:eventID field or from the auto-generated per-import-dataset one (there was a bug with the latter: https://github.com/SpeciesFileGroup/taxonworks/commit/73b187deb60c1c4568904da90bfe7e40fe4a975b).

At present it is eventID that serves as the way to detect that two events are the same, scoped either to a specific TW:Namespace:eventID or to an import-dataset-specific namespace. When a match is found, the existing CE is used and the data are merged into it, rather than a new CE being created.
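
To make the matching rule above concrete, here is a minimal Python sketch of a namespace-scoped find-or-create. It is not the TaxonWorks importer code: `CollectingEvent`, `EventIndex` and `find_or_create` are hypothetical names, and the merge policy (only fill fields that are not already set) is an assumption, since the actual merge behaviour is not spelled out here.

```python
from dataclasses import dataclass, field


@dataclass
class CollectingEvent:
    # Hypothetical stand-in for a TW CollectingEvent, not the real model.
    data: dict = field(default_factory=dict)


class EventIndex:
    """Maps (namespace, eventID) pairs to collecting events."""

    def __init__(self):
        self._by_identifier = {}

    def find_or_create(self, namespace, event_id, row_data):
        # The same eventID in a different namespace is a different event.
        key = (namespace, event_id)
        event = self._by_identifier.get(key)
        if event is None:
            event = CollectingEvent()
            self._by_identifier[key] = event
        # Assumed merge policy: keep existing values, add missing ones.
        for name, value in row_data.items():
            event.data.setdefault(name, value)
        return event


index = EventIndex()
a = index.find_or_create("dataset_1", "EV-1", {"verbatim_locality": "Site A"})
b = index.find_or_create("dataset_1", "EV-1", {"verbatim_date": "2024-01-26"})
c = index.find_or_create("dataset_2", "EV-1", {"verbatim_locality": "Site B"})
assert a is b        # same namespace and eventID: the existing CE is reused
assert a is not c    # same eventID, other namespace: a separate CE
```

With a per-dataset namespace, two imports that reuse the same eventID never collapse into one CE, which is the trade-off a global namespace would change.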

Is the proposal a new namespace, probably global (although that would break stuff), to identify CEs?

mjy commented 7 months ago

Sorry for the label confusion; yes, fieldNumber.

mjy commented 7 months ago

Making a note that TW:Namespace:eventID matches against Namespace#short_name
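
As a rough illustration of that lookup, the Python sketch below resolves the namespace for a row by short name and falls back to the per-dataset namespace when the column is empty. `resolve_namespace` and its arguments are hypothetical, and raising on an unknown short name is an assumption rather than documented importer behaviour.

```python
def resolve_namespace(row, namespaces_by_short_name, dataset_namespace):
    """Pick the namespace used to scope a row's eventID."""
    short_name = (row.get("TW:Namespace:eventID") or "").strip()
    if not short_name:
        # No explicit namespace on the row: use the import dataset's own.
        return dataset_namespace
    try:
        # Match against the namespace short name (Namespace#short_name).
        return namespaces_by_short_name[short_name]
    except KeyError:
        # Assumed behaviour: fail loudly instead of guessing a namespace.
        raise ValueError(f"no namespace with short_name {short_name!r}") from None


namespaces = {"AntWeb": "ns-antweb"}
row = {"eventID": "EV-1", "TW:Namespace:eventID": "AntWeb"}
assert resolve_namespace(row, namespaces, "ns-dataset") == "ns-antweb"
assert resolve_namespace({"eventID": "EV-2"}, namespaces, "ns-dataset") == "ns-dataset"
```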

mjy commented 7 months ago

Duplicate of https://github.com/SpeciesFileGroup/taxonworks/issues/2852.

mjy commented 3 months ago

Add constraints per type:

Ensure we handle DwC import for

Ensure DwcOccurrenceHooks are fired

Ensure DwcOccurrence Index is written

mjy commented 2 weeks ago

Behaviours

CollectingEvents

CollectionObjects