AAFC-BICoE / dina-planning

AAFC-DINA planning repository


Preparation Process I: Create a list of fields to define the direction and relationship between a Collecting Event and a Catalogued Object #166

Open dshorthouse opened 3 years ago

dshorthouse commented 3 years ago

GIVEN I have accessed DINA as a user

WHEN I attach a Catalogued Object to a Collecting Event

THEN I need a way to (optionally) describe the preparation process that gave rise to the Catalogued Object from a parent sample, which itself may be (temporarily) unknown or unknowable.

Aside: We may (as a shortcut) attach a Catalogued Object directly to a Collecting Event, but there will be occasion to detach that Object from this direct relationship and recreate it through a newly created parent Physical Entity when it is later discovered that many Catalogued Objects were derived from a single Physical Entity that itself is more appropriately & directly linked to a Collecting Event (eg in Botany this may be derived from field notes & the assessment of duplicates that were communicated to multiple institutions, all of which are derived from a singular & shared Physical Entity). The reasons for doing this are to prevent the unnecessary duplication of local Collecting Events, better define the chain of provenance among Catalogued Objects & their internal relationships to a Collecting Event, and to create a future placeholder for more functional connections across institutions where duplicates or derivatives are housed. Nonetheless, we need a model for the contents of a Preparation Process that gives rise to the join between a Collecting Event and a Physical Entity, a Catalogued Object or otherwise. This ticket is to define the fields and the joins to other data objects as components of the Process that is (optionally) embedded in the link between a Collecting Event and a Physical Entity.

link to Agent(s)
expression of action(s) the Agent executed
Timestamp Start / Timestamp End
Protocol employed
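As a starting point for discussion, the four items above could be sketched as a small record type. This is only an illustration under stated assumptions: the class name `PreparationProcess`, all field names, and the example identifiers are hypothetical and are not taken from the DINA schema. Every field is optional because the process itself is optional and the parent sample may be unknown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class PreparationProcess:
    """Hypothetical record embedded in the link between a Collecting Event
    and a Physical Entity. Names are illustrative, not DINA's."""

    collecting_event_id: Optional[str] = None
    physical_entity_id: Optional[str] = None  # parent may be unknown
    agent_ids: list[str] = field(default_factory=list)  # link to Agent(s)
    actions: list[str] = field(default_factory=list)  # what the Agent(s) did
    started_at: Optional[datetime] = None  # Timestamp Start
    ended_at: Optional[datetime] = None  # Timestamp End
    protocol: Optional[str] = None  # Protocol employed

    def __post_init__(self) -> None:
        # Reject a process that ends before it starts.
        if self.started_at and self.ended_at and self.ended_at < self.started_at:
            raise ValueError("ended_at must not precede started_at")


# Made-up example values for illustration only.
process = PreparationProcess(
    collecting_event_id="CE-1",
    physical_entity_id="PE-1",
    agent_ids=["agent-42"],
    actions=["dried", "mounted on sheet"],
    started_at=datetime(2021, 6, 1, 9, 0),
    ended_at=datetime(2021, 6, 1, 10, 30),
    protocol="herbarium-mounting-v1",
)
```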

heathercole commented 3 years ago

just throwing some general ideas in here to get thoughts started; am making a generalization that a collection manager is NOT the one doing the collecting (that would be a researcher/technician). Since we haven't really nailed down what a 'cataloged object' is, am just including processes that may create entities that need identifiers that need to be tracked.

Researcher/Technician

biological specimen collected in the field (perhaps by hand, or with a trap, or as a sample, eg. a soil sample)
- in the field, it would be provided with some sort of identifier, which (probably?) needs to be retained in some way for reference
- data related to this 'collection event' must be recorded and associated to the identifier
specimen must be processed/preserved in some way (put in appropriate container/dried/(killed)/pinned). The process may be simple (mount on sheet) or involve a more complex protocol that should be documented
perhaps a sample is taken from the specimen for DNA analysis
perhaps the sample is divided; a researcher keeps some, and gives the rest to a collection

Collection Manager

specimen is archived/accessioned/cataloged with a Biological Collection
- needs a unique identifier
- may need to be further processed (eg. plant material is received dried, but not mounted on sheet; insect is received in ethanol but needs to be pinned); this may involve creating multiple 'entities' (eg. pine needles are mounted on sheet, and pine cone put in a box, both from the same collection event/specimen)
- needs to be associated to its collection data and other info (eg. #165 )
- may need to be associated to additional documentation; collecting permits, CITES, etc.
- is imaged/digitized (needs to be associated to relevant media)
- specimen needs to be replicated/rejuvenated (CPVC, GINCO, CCFC) (internal process)
- specimen may need to be sampled (some material removed), that sample may be destroyed, but also may become its own 'specimen' (depending on process), sample may be sent to a client
- specimen may be annotated and re-imaged
- specimen may be sent on loan
- specimen (replicate) may be given away

rintoult commented 3 years ago

In the SeqDB scheme the collection event is coupled only with the Specimen. There is no opportunity for a one-to-many relationship; the data is just replicated across the specimens/catalogued objects, with no real way to find all those associated with a single collection event. So I look forward to being able to reference collection events in the future.
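To illustrate what referencing a collection event would make possible, here is a minimal sketch of the one-to-many lookup that the replicated-data approach lacks. The identifiers and the `links` list are made up for the example and do not reflect SeqDB or DINA data.

```python
from collections import defaultdict

# Hypothetical (catalogued_object_id, collecting_event_id) pairs.
links = [("CO-1", "CE-1"), ("CO-2", "CE-1"), ("CO-3", "CE-2")]

# Index catalogued objects by the single collecting event they reference.
objects_by_event = defaultdict(list)
for object_id, event_id in links:
    objects_by_event[event_id].append(object_id)

# All objects associated with one collecting event.
print(objects_by_event["CE-1"])
```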

I think as mentioned above the link to Agents will be another great improvement.

SeqDB field names are laid out in this way "name of field in background/code" = "Displayed Field Name" | Definition of field

Here are the fields currently linked to collecting events - I have asterisked ** the fields that are/could be linked either to specimens, agent or "preparation process".

-- collectionInfo properties --

collectionInfo.continent = Continent | Continent from which specimen was collected
collectionInfo.country = Country | Country from which specimen was collected
collectionInfo.province = Province / State | Province or State from which specimen was collected
collectionInfo.region = Region | Region from which specimen was collected, i.e.: County, District, body of water
collectionInfo.city = City | City from which specimen was collected
collectionInfo.site = Exact Site | Description of exact site where specimen was collected
collectionInfo.latitude = Latitude | Co-ordinates for collection site
collectionInfo.longitude = Longitude | Co-ordinates for collection site
collectionInfo.date = Collection Date (yyyy-mm-dd) | Date of collection.
collectionInfo.notes = Notes | Notes for the Collection Information
collectionInfo.collector = Collectors | Persons involved in collecting the specimen
collectionInfo.sector = Sector |
collectionInfo.elevation = Elevation |
collectionInfo.gpsSource = GPS Source |
collectionInfo.week = Project Week | Numbers assigned to weeks during the sampling season
collectionInfo.collectorType = Collector Type | Type of collector (JB/YE/BK)
collectionInfo.samplerInstallationDate = Sampler Installation Date (yyyy-mm-dd) | Date that weekly sampling apparatus is installed in the collector****
collectionInfo.samplerCollectionDate = Sampler Collection Date (yyyy-mm-dd) | Date that weekly sampling apparatus is removed from the collector
collectionInfo.filterSize = Filter Size (um) | Size of pores in filter units used to collect JB samples (either 0.45 um or 8.0 um)
collectionInfo.rainFall = Rain Fall (mm) | Weekly rainfall data
collectionInfo.rainVolumeCollected = Rain Volume Collected (mL) | The volume of water collected using the Yankee sampler, or from plugged JB filter units****

rintoult commented 3 years ago

The other thing I don't remember if we have discussed is Permits. We have a spot on the collecting event that lets us attach Collecting Permits and Permissions. This document might also need to be directly connected to each catalogued object generated under the permissions from that permit.

banchinic commented 3 years ago

We use SeqDB but don't have much use for most of the fields that Tara highlighted. Not sure if this is the right issue for this comment, but since some fields are similar to some highlighted above, here are the fields missing in DINA that we need in our collection and that I think should be associated to the Collecting Event and not only to the Physical Entities.

Elevation
Habitat
Habitat notes field (special observations)
Climate
Air temperature
Soil properties: many fields! (I have provided this list already but can provide again if necessary)
Host (in our case we don’t want to create a separate entity for the host, just want to record what it was)
Host Identifier (person who identified the host)
Collection Protocol
Permit & Permissions (link or PDF)

michellelocke commented 3 years ago

One of the major preparation processes in the CNC is dissection. Dissections are part of the same catalogued object but may be stored with the specimen (on a pin, or in a vial) or elsewhere (a slide in a slide rack), and are given a unique Dissection ID.

Dissection ID: a unique number consisting of a letter-number combination and would function similar to the Specimen ID. This could be auto-generated as the next available number in a series or any appropriate string could be entered. It should not be mandatory to use a unique ID as dissections can be stored in a vial on the same pin as the specimen. It might be useful to use this to track what the dissection is and who did it though.
Other Dissection ID: for other possible numbers/codes related to the dissection
Dissection part: list of body parts to select from
Mounting medium: list of mounting media to select from
Dissection Preparation: list of processes used to prepare specimen (ex: KOH, lactic acid)
Dissections Stain: list of stains used, multiple could be selected (ex: Eosin Y, Chlorazol). Maybe this can be combined with Dissection Processes and multiple items in the list could be selected.
Dissection Storage: describes where dissection is stored (often a slide in a slide rack)
Dissected by: possibly a link to agent
Dissection date (YYYY-MM-DD): same format as other date fields
Mounted by: possibly a link to agent
Mount date (YYYY-MM-DD): same format as other date fields
Dissections Notes: text field for notes.

michellelocke commented 3 years ago

We currently use a field called Preparation to describe the final disposition of the specimen (or how the specimen is stored). Examples are: pinned, pinned: point, etoh: 95%, Slide: PVA, photograph. This would be a pick list. We do not capture any data on who prepared it for the collection.

We would need a field that captures data on a middle process to obtain the specimen. For the CNC, Method captures the collection method (sweeping, Malaise trap, pitfall trap, etc.; this is related to the collection event) and Preparation captures its end point. But a process like using a Berlese funnel to extract soil arthropods from a soil sample doesn't fit either of these. In this case Method might be soil sample, soil core, etc., Preparation might be slide: PVA or etoh: 95%, but we need another field to capture the process that happens back at the lab to get that specimen from the soil. Maybe something like Specimen Extraction Method (note: different from a DNA extraction method).
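The three-stage split described above (Method, a middle extraction step, Preparation) could look something like the following. This is a rough sketch, not a schema proposal: the class name `SpecimenHandling` and its attribute names are hypothetical, and the values are the examples mentioned in this comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpecimenHandling:
    """Hypothetical grouping of the three fields discussed above."""

    method: str  # collection method, tied to the collection event
    extraction_method: Optional[str]  # middle lab step; None when not needed
    preparation: str  # final disposition, chosen from a pick list


# Soil arthropod: the Berlese funnel step fits neither Method nor Preparation.
soil_arthropod = SpecimenHandling(
    method="soil sample",
    extraction_method="Berlese funnel",
    preparation="slide: PVA",
)

# Swept specimen: no middle step, so the extraction field stays empty.
swept_specimen = SpecimenHandling(
    method="sweeping",
    extraction_method=None,
    preparation="pinned",
)
```

Keeping the middle step optional means existing records with only Method and Preparation would not need to change.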

michellelocke commented 3 years ago

The other type of Preparation Process used by the CNC is DNA extraction. Most of our group does not use SeqDB but uses BOLD to manage their data, or other software. I would recommend talking to scientists like Sophie Cardinal, Jeff Skevington and Bryan Brunet to find out what they would require in order to use DINA to store sequences or track processes. It may be easiest to have fields that link to BOLD or GenBank records, and not bother to bring the data over. Or have an easy way to upload data from their other platforms to the CNC. I'm really not sure if they would want to use the database to manage the sequencing process (as CCFC and the mycologists do) or to just have a final home for the sequence itself. Please consult CNC scientists for this info as they are the ones generating sequence data from CNC specimens. This data is valuable to CMs but is not generated by us.

shannonasencio commented 3 years ago

David summed up the field requirements and appropriate linkages well. As Michelle said, the various preparation types for final disposition should be available to choose from a picklist. All that being said, we don't typically record who, for example, mounted a specimen on a sheet and when they did that, but I think there is value in having the capacity to do so (esp. for things like slide preparations).

dshorthouse commented 3 years ago

Looks like the spirit of this ticket got sidetracked from the intent. I did not word it well. This was meant to be a ticket to help define the Process(es) by which a Catalogued Object(s) comes to exist from Physical Entity/ies while maintaining their link to Collecting Event. And so, it's really about the who did it, what did they do to it, and when they did it. What's been muddled a bit on our development approach is that a Collecting Event is actually a preparation process in and of itself with the addition of "where did they collect it". It's rather nonsensical to have a Collecting Event – an action, a process – as a stand-alone entry in a database without requiring that it describe the relationship between at least two objects: the thing in my hand & the thing from which it was derived. "The thing from which it was derived" in whole organism samples is more theoretical than tangible (it's the wild population!) but is nonetheless an entity of sorts & you've merely plucked a "sample" from it. It's equally nonsensical for a collections management system to model a wild population but nonetheless, THAT's the progenitor of the whole chain of events.

So...

In addition to:

link to Agent(s)
expression of action(s) the Agent executed
Timestamp Start / Timestamp End
Protocol employed

What I was fishing for were ways we might wish to follow chains of relationships in defined directions & not merely look at the joins as "A is linked to B":

childOf (& the opposite, parentOf)
derivedFrom (eg dissections)
replicateOf
sameAs
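One way to picture following chains in a defined direction is the sketch below. It is an assumption-laden illustration only: the `Relation` enum, the `INVERSE` table, the `ancestors` helper, and the identifiers are all hypothetical. Only the childOf/parentOf inverse pair comes from the list above; treating sameAs as symmetric is an assumption, and no inverse is assumed for derivedFrom or replicateOf.

```python
from enum import Enum


class Relation(Enum):
    CHILD_OF = "childOf"
    PARENT_OF = "parentOf"
    DERIVED_FROM = "derivedFrom"
    REPLICATE_OF = "replicateOf"
    SAME_AS = "sameAs"


# Inverse lookup. sameAs being its own inverse is an assumption.
INVERSE = {
    Relation.CHILD_OF: Relation.PARENT_OF,
    Relation.PARENT_OF: Relation.CHILD_OF,
    Relation.SAME_AS: Relation.SAME_AS,
}

# Directed edges as (source, relation, target); identifiers are made up.
edges = [
    ("CO-1", Relation.DERIVED_FROM, "PE-1"),
    ("PE-1", Relation.CHILD_OF, "PE-0"),
]


def ancestors(entity_id, edges):
    """Follow childOf and derivedFrom edges upward from entity_id."""
    upward = {Relation.CHILD_OF, Relation.DERIVED_FROM}
    parents = {}
    for source, relation, target in edges:
        if relation in upward:
            parents.setdefault(source, []).append(target)

    chain, stack, seen = [], [entity_id], set()
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:  # guard against cycles
                seen.add(parent)
                chain.append(parent)
                stack.append(parent)
    return chain


print(ancestors("CO-1", edges))
```

Storing the direction on each edge is what lets a query walk from a Catalogued Object back toward its Collecting Event rather than just seeing that "A is linked to B".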