AAFC-BICoE / dina-planning

AAFC-DINA planning repository

Material Sample: Create core set of fields for a catalogued specimen #165

Closed dshorthouse closed 3 years ago

dshorthouse commented 3 years ago

GIVEN I have accessed DINA as a user

WHEN I access a physical entity (limited to a catalogued object in this instance)

THEN I expect to have access to a core set of fields that are specific to the object under view

** Note that the intent of this ticket is not to capture linkages to other modules such as Collecting Event, Preservation Processes, or Determinations. Rather, the intent here is to record expected fields that are found nowhere else except as physical entities with particular emphasis on a catalogued object.

Possible terms from Darwin Core (recognizing that a physical entity is not a subclass of the more epistemological concept of an Occurrence but is more aligned with that of a MaterialSample)

Material Sample

Occurrence

cgendreau commented 3 years ago

occurrenceID will be the uuid assigned by the server. Since materialSampleID is equivalent for us the same value will be used for both.

recordNumber is already captured at the collecting-event level.

dshorthouse commented 3 years ago

sex and lifeStage are nuisances because they play against individualCount. There are plenty of instances where the catalogued object may contain many individuals of numerous sexes and life stages (eg single vial of spiders with 5 adult males, 3 immature males, 2 adult females, 1 gravid female, 1 female with affixed egg sac, etc.) from a single collecting event. These two terms might be better accommodated with managed attributes, perhaps then populating dynamicProperties when published to GBIF.
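As a rough illustration of the managed-attribute idea above, here is a minimal sketch of how a mixed vial could be tallied and then serialized to a JSON string for `dynamicProperties`. The `vial_contents` tally, the `composition` key, and both function names are hypothetical and are not part of DINA or Darwin Core.

```python
import json
from collections import Counter

# Hypothetical tally for one catalogued vial, keyed by (sex, lifeStage).
# The labels are illustrative only, not a controlled vocabulary.
vial_contents = Counter({
    ("male", "adult"): 5,
    ("male", "immature"): 3,
    ("female", "adult"): 2,
    ("female", "gravid"): 1,
    ("female", "adult with egg sac"): 1,
})


def individual_count(contents):
    """Total number of individuals across every sex/lifeStage group."""
    return sum(contents.values())


def to_dynamic_properties(contents):
    """Serialize the tally to a JSON string (assumed layout, for illustration)."""
    groups = [
        {"sex": sex, "lifeStage": stage, "count": count}
        for (sex, stage), count in sorted(contents.items())
    ]
    return json.dumps({"composition": groups}, sort_keys=True)


print(individual_count(vial_contents))
print(to_dynamic_properties(vial_contents))
```

Keeping the tally as structured data means `individualCount` can be derived from it rather than entered separately, so the two cannot drift apart.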

dshorthouse commented 3 years ago

> occurrenceID will be the uuid assigned by the server. Since materialSampleID is equivalent for us the same value will be used for both.

I'm not sure that this is quite true to the intent of Darwin Core because an Occurrence is a conceptual blend of an organism and an event whereas a MaterialSample does not require an event. But, because this distinction is not operationally relevant to us at the moment, we can run with this approach for simplicity's sake for now. That said, our notion of a Physical Entity linked to a collecting event that might then spawn into multiple catalogued objects would necessitate a single occurrenceID for the parent and multiple, different materialSampleIDs for the children.
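To make the parent/child identifier scenario concrete, here is a minimal sketch. The `MaterialSample` class, its field names, and the `new_parent` and `split` helpers are hypothetical and only illustrate the idea of one shared occurrenceID with distinct materialSampleIDs; they do not describe the DINA API.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MaterialSample:
    # Hypothetical record shape, for illustration only.
    material_sample_id: str
    occurrence_id: str
    catalog_number: Optional[str] = None


def new_parent() -> MaterialSample:
    """Simple case: one server-assigned uuid is used for both identifiers."""
    uid = str(uuid.uuid4())
    return MaterialSample(material_sample_id=uid, occurrence_id=uid)


def split(parent: MaterialSample, catalog_numbers: List[str]) -> List[MaterialSample]:
    """Children keep the parent's occurrenceID but each gets its own materialSampleID."""
    return [
        MaterialSample(
            material_sample_id=str(uuid.uuid4()),
            occurrence_id=parent.occurrence_id,
            catalog_number=number,
        )
        for number in catalog_numbers
    ]


parent = new_parent()
children = split(parent, ["EXAMPLE-0001", "EXAMPLE-0002"])
for child in children:
    print(child.catalog_number, child.material_sample_id, child.occurrence_id)
```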

cgendreau commented 3 years ago

> That said, our notion of a Physical Entity linked to a collecting event that might then spawn into multiple catalogued objects would necessitate a single occurrenceID for the parent and multiple, different materialSampleIDs for the children.

Ah yes, that would be possible, but our physical entity id is actually close to materialSampleID, right?

dshorthouse commented 3 years ago

> Ah yes, that would be possible, but our physical entity id is actually close to materialSampleID, right?

It is indeed. The problem is that if we follow DwC to the letter, the Occurrence class is the one with the catalogNumber and otherCatalogNumbers properties. Here's a good example where it's wise not to follow DwC to the letter. We need to say, "Not our problem" in this instance and do what's relevant to our needs. In our case, catalogNumber and otherCatalogNumbers have nothing to do with the Occurrence but everything to do with the MaterialSample. Unfortunately, GBIF will be using occurrenceID to assert that a PhysicalSpecimen is the same digital object from one republication event to the next, and does nothing with materialSampleID to make such assertions.

jmacklin commented 3 years ago

> We need to say, "Not our problem" in this instance and do what's relevant to our needs. In our case, catalogNumber and otherCatalogNumbers have nothing to do with the Occurrence but everything to do with the MaterialSample.

I completely agree. We will figure out how best to map to DwC just like everyone else has to... or better, we will suggest changes to it that enable a more transparent exchange. Some of this is already in flight through the current efforts to map MIxS to DwC with influence from GGBN. Some of these issues are being raised as samples are "core" to them. I have been following but not commenting to this point...

ssbilkhu commented 3 years ago

From the SeqDB perspective, we believe "Substrate Type"

michellelocke commented 3 years ago

Would (what the CNC calls) Preparation go here? This is how the initial specimen is prepared and stored. Some examples of this are: pinned, pinned: point, slide: PVA, ethyl alcohol (95%); envelope, capsule. A preparation process really hasn't been done to the Cataloged Object but this is what the initial Cataloged Object is. But maybe you still see this as part of the Preparation Processes information? I'm still not clear on this.

michellelocke commented 3 years ago

Specimen ID, other ID, sex, life stage and # specimen per record are all fields that CNC uses that look like they will fit with the DwC fields mentioned above. Presumably # specimen per record would equal individualCount. To me, organismQuantity and organismQuantityType would be better suited for things that aren't specimens, so perhaps vials, soil samples, etc. Or do you do away with individualCount so that organismQuantity/Type covers it all, as in their example with the Type being individuals? I could see this being a more encompassing field for our varied collections.

michellelocke commented 3 years ago

> sex and lifeStage are nuisances because they play against individualCount. There are plenty of instances where the catalogued object may contain many individuals of numerous sexes and life stages (eg single vial of spiders with 5 adult males, 3 immature males, 2 adult females, 1 gravid female, 1 female with affixed egg sac, etc.) from a single collecting event. These two terms might be better accommodated with managed attributes, perhaps then populating dynamicProperties when published to GBIF.

To deal with some of these we use male+female in sex to show that it is mixed (or just male, female or intersex). For life stage we have categories like adult, adult+puparium, adult+egg sac, juveniles+egg sac, etc. to account for a mix of life stages. Not the cleanest way to do it for sure. I could definitely see this being improved upon from what we currently use.

Food for thought: eBird has a chart for dealing with life stages vs sex. This is a nice way to do it but I could see it getting complicated if we get into the nitty gritty of accounting for the different names of life stages within different groups of insects (ex: larva, nymph, juvenile, immature, subadult or pupa, puparium, cocoon, chrysalis).

[Image: ebird_capture]

michellelocke commented 3 years ago

We have this odd little field called "CNC specimens not recorded". This is to account for other specimens of a species with the exact same collecting event data but have not been digitized. So if this field says "27" that would mean there are 27 more specimens in the collection that have the exact same locality, same date, same collector, etc. and at the time they were not digitized. This was to save time. As a matter of practice, I do not believe we are routinely using this field, but it was used when we started digitizing to save time (just digitizing one specimen of each species per collecting event). This seems like a field to go here.

michellelocke commented 3 years ago

One last comment: what about the physical location of the specimen? The CNC has a Storage field, but we don't regularly use it. I know this has been discussed as being very important for some of the collections, especially for those that need to report where specimens/samples are. I feel like the physical storage location of the Physical Entity/Catalogued Object would belong here.

dshorthouse commented 3 years ago

> Specimen ID, other ID, sex, life stage and # specimen per record are all fields that CNC uses that look like they will fit with the DwC fields mentioned above. Presumably # specimen per record would equal individualCount. To me, organismQuantity and organismQuantityType would be better suited for things that aren't specimens, so perhaps vials, soil samples, etc. Or do you do away with individualCount so that organismQuantity/Type covers it all, as in their example with the Type being individuals? I could see this being a more encompassing field for our varied collections.

Good questions. The terms organismQuantity and organismQuantityType are DarwinCore-speak, meant to accommodate a more philosophical assemblage of a taxonomically-homogeneous entity. That makes sense when we're talking about observations (eg a named wolf pack, a pod of killer whales, a herd of caribou, a colony of bacteria), but is overkill for a catalogued specimen. But...could these be more useful than individualCount for the broader concept of physical entities (i.e. field samples)? Or is individualCount just as useful and informative?
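One way to weigh the two options is to see how they convert into each other. The sketch below assumes plain dictionaries keyed by the Darwin Core term names; the helper names and the non-specimen example record are hypothetical.

```python
from typing import Optional


def as_organism_quantity(individual_count: int) -> dict:
    """Express an individualCount as an organismQuantity / organismQuantityType pair."""
    return {
        "organismQuantity": str(individual_count),
        "organismQuantityType": "individuals",
    }


def as_individual_count(record: dict) -> Optional[int]:
    """Recover a count only when the quantity type is 'individuals'; otherwise None."""
    if record.get("organismQuantityType") == "individuals":
        return int(record["organismQuantity"])
    return None


pinned_series = as_organism_quantity(12)
# Hypothetical non-specimen entity whose quantity is not a head count.
field_sample = {"organismQuantity": "30", "organismQuantityType": "% biomass"}

print(as_individual_count(pinned_series))
print(as_individual_count(field_sample))
```

Under these assumptions the pair can always represent an individualCount, while the reverse only works for the "individuals" type, which is one argument for treating the pair as the more encompassing option.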

dshorthouse commented 3 years ago

> One last comment: what about the physical location of the specimen? The CNC has a Storage field, but we don't regularly use it. I know this has been discussed as being very important for some of the collections, especially for those that need to report where specimens/samples are. I feel like the physical storage location of the Physical Entity/Catalogued Object would belong here.

That'll be one of the next 2 major topics after we tackle physical entities and preparation processes. I'd consider storage location as something we'd link to rather than as core fields here because there are additional, independent properties about storage that are vital for conservation (eg humidity, temperature, lighting).

heathercole commented 3 years ago

identifiers: primary; legacy; additional
Tags/Flags: eg. #90
Project: a field to identify records resulting from particular projects/funding/protocols (eg. BioMob - need to report on how many records in the database were generated from that project)
Specimen description: (text) (eg. Abundant. Flowers yellow.)
Habitat: (text) Muddy shore of stream | In pine forest with _Genus species_
Related Specimens: eg. on same sheet as ##### | part of ###### (should be linked to related records)
is cultivated: (checkbox) this could potentially go with identifications, but isn't really about the ID. DAO maintains a LOT of cultivated material and it is very important to differentiate between 'natural' and 'cultivated' specimens. In Specify, this is just a check box, used to indicate "yes; is cultivated"
Phenology: pick-list to identify life-stage of specimen (eg. flowering, fruit, vegetative, floricane)
Cataloger comments: (text) a field for notes from cataloger (would not be exported)
[Additional] Remarks: (text) there is often additional information on a label that is relevant to capture, but not worth each type of info having its own field.
Sampled: with increasing requests for destructive sampling, managers must be able to track whether sampling has occurred and how many times (this may relate to 'process'). Some specific specimens are highly requested. (would perhaps not be exported)
Management Notes: (text) (eg. this specimen could not be found) (would not be exported)

OTHER at the specimen "record" level, it would be optimal to:

michellelocke commented 3 years ago

I would like to argue that Habitat is part of the Collecting Event and needs to be added there. This is the initial habitat that the specimen was collected in and describes the location. It would pertain to everything collected at that place and time (a collecting event). This is not part of the Physical Entity. Perhaps a "habitat" that the specimen is cultured in would fit here, but this is not the same as the initial description of where it was collected from.

heathercole commented 3 years ago

It may be relevant to split "site description" and "habitat". If a botanist is wandering around a "site", different specimens may be collected from very different "habitats" that relate to the biology of the particular specimen, and not every specimen collected at the larger "site" is relevant to a "description of location".

Site description: pine forest with a stream running through it, mixed canopy.
Habitat for specimen 1: in mud next to the stream, open canopy.
Habitat for specimen 2: growing on a decaying log in the shade.

(or insect found on a particular species of plant); isn't really "host" or "substrate", but may be relevant info

michellelocke commented 3 years ago

We have a similar thing called Habitat and Microhabitat but they are both related to the collecting event. If I collected a sample of mites the Habitat might be deciduous maple-beech forest but the Microhabitat might be soil under rotten log. Both would be related to that single collecting event. If we collected another sample in that same forest the Habitat would be the same but the Microhabitat might be under bark. This would be a different collecting event than the previous one even though the rest of their collecting event info is the same. Habitat doesn't have anything to do with the Physical Entity/Catalogued Object the way I view it.
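A minimal sketch of this view, assuming habitat and microhabitat are plain text fields on the collecting event. The `CollectingEvent` class, its field names, and the placeholder locality, date, and collector values are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CollectingEvent:
    # Hypothetical event shape; field names are illustrative only.
    locality: str
    event_date: str
    collector: str
    habitat: Optional[str] = None
    microhabitat: Optional[str] = None


first = CollectingEvent(
    locality="Example locality",
    event_date="2021-06-01",
    collector="Example collector",
    habitat="deciduous maple-beech forest",
    microhabitat="soil under rotten log",
)

# Same place, date, collector and habitat; only the microhabitat differs.
second = replace(first, microhabitat="under bark")

print(first == second)                  # False: two distinct collecting events
print(first.habitat == second.habitat)  # True: the habitat is shared
```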

michellelocke commented 3 years ago

@heathercole mentioned a few different types of what I would call Notes fields (Cataloger comments, [Additional] Remarks, Management Notes). The CNC currently has one Notes field where we stick everything that doesn't fit elsewhere, whether it be other label info, remarks about the specimen, remarks about location or about the georeference. I'm not opposed to splitting Notes fields (we've already created some in the Collecting Event) but we just need very clear names for those fields and descriptions that tell the user exactly what goes where.

banchinic commented 3 years ago

> We have a similar thing called Habitat and Microhabitat but they are both related to the collecting event. If I collected a sample of mites the Habitat might be deciduous maple-beech forest but the Microhabitat might be soil under rotten log. Both would be related to that single collecting event. If we collected another sample in that same forest the Habitat would be the same but the Microhabitat might be under bark. This would be a different collecting event than the previous one even though the rest of their collecting event info is the same. Habitat doesn't have anything to do with the Physical Entity/Catalogued Object the way I view it.

I agree with Michelle that habitat should be related to the Collecting Event and not the physical entity itself. Especially in the case of a living collection it would be confusing to associate the habitat from the site with the physical entity (I don't grow my fungi in the same habitat it was collected from).

rintoult commented 3 years ago

For the CCFC, here are the field names from SeqDB that would be related directly to the Specimen = Physical Entity = Catalogued Object. These are laid out in this way: "name of field in background/code" = "Displayed Field Name" | Definition of field

There are two sets of data in SeqDB which I think are associated solely with the Catalogued Objects that we might traditionally call Specimens. These are grouped into "specimen properties" and "fungal info". There are also 4 other categories of data that are currently only associated with the Specimen Record: #-- taxonomy --, #-- host properties --, #-- collectionInfo properties --, #-- identification properties --

This is not the complete list of fields from the 2 categories which I think should be represented. I have reviewed and removed those fields that might be associated with another module or data capture place. I did include fields that we have never used just in case they might work for someone else. Also, these names and descriptions are in no way perfect but definitely give us a starting point and potential scope.

| -- specimen properties -- | |
| -- | -- |
| specimen.processId = Process ID | This field is used for the BOLD Process ID |
| specimen.number = Specimen Identifier | Unique number representing a specimen in a collection |
| specimen.subId = Sub ID | This field is used for extra characters after the specimen number to capture identity another entry in the database |
| specimen.otherIds = Other IDs | Other identification numbers that were used in the past, usually represent other collection's IDs |
| specimen.description = Description | Description of state of specimen, ie herbarium, preserved in alcohol |
| specimen.dateReceived = Date Received (yyyy-mm-dd) | Date specimen received to collection |
| specimen.cellType = Cell type | Type of cells represented in specimen. |
| specimen.tissue = Tissue type | Type of tissue represented in specimen |
| specimen.notes = Notes | Notes for the Specimen |
| specimen.extraInfo = Extra Info | Other details |
| specimen.voucher = Voucher Type | |

| -- fungalInfo properties -- | |
| -- | -- |
| fungalInfo.DAOMNumber = DAOM Number | DAOM number of specimen applied at herbarium or CCFC |
| fungalInfo.DAOMGroup = DAOM Group | Grouping in DAOM herbarium determines where a herbarium specimen is stored. |
| fungalInfo.CCFCNumber = CCFC Number | DAOM number of specimen applied at herbarium or CCFC |
| fungalInfo.savedAs = Saved As | How specimen is being stored, Culture, Herbarium Specimen or Both |
| fungalInfo.toCCFC = Sent to CCFC (yyyy-mm-dd) | Date the specimen was submitted to Culture Collection |
| fungalInfo.toDAOM = Sent to DAOM (yyyy-mm-dd) | Date the specimen was submitted to the herbarium |
| fungalInfo.isolatedBy = Isolated By | Person who isolated a culture of the specimen |
| fungalInfo.isolationDate = Isolation Date (yyyy-mm-dd) | Date the culture was isolated |
| fungalInfo.culture = Cultured | True or False ranking, True is a living culture, False is no culture available |
| fungalInfo.notes = Notes | Notes for the Fungal Information |
| fungalInfo.receivedFrom = Received From | Person or collection from whom the specimen was received |

Beyond these fields there is one other I am thinking about today: history of material. When material is stored in multiple collections around the world, we can wonder about "ownership". In the past, material has sometimes been deposited to the CCFC when permission was not granted by the true "owner", and the only record we have of the movement of material might be something like "Scientist- CBS- NRRL- DAOMC-CCFC" or "Scientist- DAOMC- CBS-ATCC". I would like to be able to track that in the future somehow, and also capture the data we already have on the history of specimens.
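As a starting point for capturing existing history strings, a legacy chain could be split into an ordered list of holders and then into individual transfers. This is only a sketch: the function names are hypothetical, and it assumes the hyphen is always the separator, which would break for any holder name that itself contains a hyphen.

```python
from typing import List, Tuple


def parse_history(chain: str) -> List[str]:
    """Split a legacy 'A- B- C' history string into an ordered list of holders."""
    return [part.strip() for part in chain.split("-") if part.strip()]


def transfers(holders: List[str]) -> List[Tuple[str, str]]:
    """Pair each holder with the next one to list individual movements of material."""
    return list(zip(holders, holders[1:]))


holders = parse_history("Scientist- CBS- NRRL- DAOMC-CCFC")
for source, destination in transfers(holders):
    print(f"{source} -> {destination}")
```

Storing the chain as ordered transfers rather than free text would leave room to attach a date or permission document to each movement later.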

If we do decide that host will be included at this level I can provide the host fields.

Tara

banchinic commented 3 years ago

Our collection has no use for the proposed fields: individualCount, organismQuantity, organismQuantityType, sex, lifeStage or dynamicProperties.

We have a lot of fields for Physical Entities since we are a living collection. Those are the ones I can think of right now but I might be missing some. It is SO important for managing this collection to have the observation fields!

Specimen Replicate fields / Physical Entities fields

1. Specimen description fields:
identifiers (name/ version)
DAOM number (number)
Other IDs (IDs used by other institutions for example)
Tags/Flags
Parent Specimen or Related Specimens? (name/ version + link to it)
Date specimen was created
Alive or Preserved Specimen (check boxes? I need to easily filter between the 2)
State (text) Pot Culture, Petri plate, Vial, Soil in plastic bag, etc…
Medium (text) soil, name of growth medium, etc…
Host (text) I just need to record what the host is, NOT create separate physical entities for host
[Additional] Remarks: (text)

2. Observation fields:
Date of observation
Observed by (name of person)
Host (Genus and Species of host)
State of host (Alive or Dead)
Pure culture (checkbox Y/N)
Spores abundance (many spores, a few spores, no spores)
Microscopic slide # (number)
Material sample description (text)
Taxonomy notes / Microscopic observations
link to the new material sample that was created from this one
DNA extraction # (number) and link to the DNA/ Sequencing module where protocol, reagents, sequences, etc are stored?

3. Storage information

rintoult commented 3 years ago

Here are the fields for Mixed Specimens from SeqDB as described above:

| -- Mixed Specimen properties -- | |
| -- | -- |
| mixedSpecimen.mixedSpecimenNumber = Mixed Specimen Number | ID number or name which relates to the mixed specimen |
| mixedSpecimen.CFIAPermitNumber = CFIA Permit Number | ID number of CFIA permit |
| mixedSpecimen.substrateType = Substrate Type | Description of source of mixed specimen ie soil, water, air |
| mixedSpecimen.associatedPlants = Associated Plants | Description of crop or planting in the substrate |
| mixedSpecimen.fungiIsolated = Fungi Isolated | Fungal strains that were isolated by culture. |
| mixedSpecimen.notes = Notes | Notes for the Mixed Specimen |
| mixedSpecimen.sampleDestroyed = Sample Destroyed (yyyy-mm-dd) | The date on which the mixed specimen was destroyed. |

I have not worked often with Mixed Specimens and am no expert but again these may be a conversation starter.

rintoult commented 3 years ago

I mentioned this on issue 166 as well - permits and ownership documents will need to be directly accessible through the catalogued objects as well. Either by clear linkages or a duplicated relationship to a Collection Permit from a Collection Event. Or directly like an import permit.

banchinic commented 3 years ago

> Here are the fields for Mixed Specimens from SeqDB as described above:
>
> | -- Mixed Specimen properties -- | |
> | -- | -- |
> | mixedSpecimen.mixedSpecimenNumber = Mixed Specimen Number | ID number or name which relates to the mixed specimen |
> | mixedSpecimen.CFIAPermitNumber = CFIA Permit Number | ID number of CFIA permit |
> | mixedSpecimen.substrateType = Substrate Type | Description of source of mixed specimen ie soil, water, air |
> | mixedSpecimen.associatedPlants = Associated Plants | Description of crop or planting in the substrate |
> | mixedSpecimen.fungiIsolated = Fungi Isolated | Fungal strains that were isolated by culture. |
> | mixedSpecimen.notes = Notes | Notes for the Mixed Specimen |
> | mixedSpecimen.sampleDestroyed = Sample Destroyed (yyyy-mm-dd) | The date on which the mixed specimen was destroyed. |
>
> I have not worked often with Mixed Specimens and am no expert but again these may be a conversation starter.

I use MixedSpecimen in SeqDB a lot. However, I think in the way I use it, it could be replaced entirely by the Collecting Event? Anyway that's how I saw it in my head. In SeqDB we record all the collecting info. in Mixed Specimen since it's our only way to link multiple specimens together.

rintoult commented 3 years ago

Hey Claudia, I am not sure collecting event could completely replace it - since the collecting event is only data and doesn't provide a catalogued object that can have a location. So for your soil bags I think you might still want a Mixed Specimen type object. Maybe? Tara

banchinic commented 3 years ago

> Hey Claudia, I am not sure collecting event could completely replace it - since the collecting event is only data and doesn't provide a catalogued object that can have a location. So for your soil bags I think you might still want a Mixed Specimen type object. Maybe? Tara

Hi Tara,

Our soil bags are simply our preserved Specimen Replicates so as long as we have a storage information in the Physical Entities or Catalogued Objects I'm fine (I'm starting to be confused between the 2 terms).

However, you are right that we do need storage information higher up, for where we keep our original soil samples. It could simply be storage information in, or linked to, the collecting event. I'm assuming other collections might also have use for this? We have not talked storage at all yet so I'm not sure. We could also create an "Original" Physical Entity for the soil samples and link all the other derived Physical Entities to it.
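The "Original" Physical Entity option could look something like the sketch below, where each derived entity points at its parent and the original soil sample is found by walking those links. The entity identifiers and both helper functions are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical entity ids mapped to their parent id (None marks an "Original").
parents: Dict[str, Optional[str]] = {
    "soil-original": None,
    "replicate-1": "soil-original",
    "replicate-2": "soil-original",
    "subculture-1a": "replicate-1",
}


def original_of(entity_id: str, links: Dict[str, Optional[str]]) -> str:
    """Walk parent links upward until an entity with no parent is reached."""
    current = entity_id
    while links[current] is not None:
        current = links[current]
    return current


def derived_from(entity_id: str, links: Dict[str, Optional[str]]) -> List[str]:
    """List the entities directly derived from the given one."""
    return sorted(child for child, parent in links.items() if parent == entity_id)


print(original_of("subculture-1a", parents))
print(derived_from("soil-original", parents))
```

With links like these, storage information would only need to be recorded once on the original soil sample and could be reached from any derived entity.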

shannonasencio commented 3 years ago

A lot of the herbaria's requirements have been covered here already, so thanks all! Here are some others I can think of:

rintoult commented 3 years ago

I realised overnight that we actually don't have physical entities for what we call specimens in our current workflow - for us the specimen record is data only. It is a connection point for all of the derived entities, but we have nothing in real space that is a specimen, just copies of it, DNA derived from it, sequences typifying it. It has the collecting information and agents catalogued here, but is not a physical entity. So my need for fields still stands, but in the way we have handled data in the past this would be neither a physical entity nor a catalogued object. le sigh.

cgendreau commented 3 years ago

This ticket went in all directions so I will close it but from what I can see most of the fields that are not already present can be handled by managed attributes on CollectingEvent or MaterialSample with some exceptions: