Closed dhobern closed 12 months ago
> The most obvious may be to derive sample events from the observations. This is not always scientifically sensible, but processing may deliver e.g. detection count, estimated individual count, percentage of marked individuals in detections, even biomass estimates.
Do I understand correctly that these are analysis data derived/calculated from the camera trap data? And if so, that the camera trap data should contain all the elements to make that calculation possible?
I wonder then:
Your opinion on those questions can help to steer what we should recommend in the guide.
Note: image segments can be expressed in Camtrap DP (bboxX, bboxY, bboxWidth, bboxHeight), we should mention that (see #31).
I almost wasted your time with much more extensive rambling comments, but decided to keep things brief. I wanted to avoid suggesting stuff that may make your document unhelpful to the target audience, but here is how I see things these days (partly based on work I'm doing to support plant phenomics data pipelines). In particular, I've learned a lot from how some of the EU projects use the ISA Tools framework (https://isa-tools.org/index.html). Their use is more informative than the documentation or the poorly supported tools.
We can think of a camera trap program as an Investigation with each site being a Study for which we collect Assays. The Investigation is little more than a container for Studies.
A Study uses sensors (in our case cameras) to perform an Assay on the environment or feature of interest (a plant, a portion of habitat in front of a camera, etc.).
So our primary Assay is to populate a dataframe representing image acquisition events for the scene under study. For a camera trap, this Assay collects a set of images from the scene. We can document (in metadata) the methods used and the feature of interest (pretty much a location and orientation combined with the threshold trigger metadata).
Now we have a set of images. Our second Assay may be to select objects of interest from the image, or it may directly be to tag the image with a species identification. In both cases, we are applying an Assay to a digital object (it is our feature of interest) and deriving new features of interest as new digital objects. This continues as we add processing steps. So we might have a pipeline like this:
This goes far beyond what you should cover in this document, but fully FAIR data would document each of these Assays/steps and allow a user or machine to understand the source and character of the Sample Events or the Population Estimate.
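To make the Assay-chain idea concrete, here is a minimal sketch (all class and method names are mine, invented for illustration, not part of ISA Tools or Camtrap DP) of how each processing step could record its provenance:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: each Assay output is a digital object that
# records which earlier objects it was derived from and by what method.
@dataclass
class DigitalObject:
    id: str
    kind: str                                          # e.g. "image", "bbox", "identification"
    method: str = ""                                   # Assay/method that produced it
    derived_from: list = field(default_factory=list)   # provenance links to source objects

# Assay 1: image acquisition populates the primary set of images
img = DigitalObject("img-001", "image", method="camera-trigger")

# Assay 2: object detection selects a region of interest from the image
box = DigitalObject("box-001", "bbox", method="object-detection",
                    derived_from=[img.id])

# Assay 3: species identification applied to the region of interest
obs = DigitalObject("obs-001", "identification", method="expert-review",
                    derived_from=[box.id])

# Any downstream product (sample event, population estimate) can be
# traced back through derived_from links to the original images.
print(obs.derived_from)  # ['box-001']
```

This is only a provenance skeleton; a real pipeline would also capture method parameters and versions per Assay.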
I hope such complexity could be recorded in the new data model. But I think it makes sense for us in advance to be considering how we can ensure we have all the information our future selves would love to have in relation to each study. Machine observations collected using standard protocols could be so valuable, but their interoperability may be limited by sloppy choices now.
Thanks for the note on bbox.
I'm still reading some of the more detailed sections of the document and will review the Camtrap DP aspects to ensure it all makes sense with my use cases in mind.
Here are a number of camera-based monitoring methods I have heard mentioned (in the last few weeks) that we did not consider when writing this guide:
I think we should update the guide (title, introduction) to mention which ones are in and out of scope. Personally, I would keep the scope to fixed-location camera traps, since those can likely be supported by Camtrap DP by adding a few metadata fields to the deployments (e.g. depth, ping @kbubnicki). Supporting moving methods is more complicated, since a deployment in Camtrap DP assumes a fixed location and would get quite a lot more complicated if it didn't.
@dhobern this issue discusses two elements:
Certain camera types that were not mentioned and are considered out of scope (e.g. moving cameras). This was not mentioned in the "What this guide is not about" section; addressed in 56b908c
I'll close this issue, but we can open it again if you think something should be addressed still.
I think there is a fifth type of camera trap data that may be worth separating out from Observations.
In the context of this section, "Observations" seems to equate to detections of a species that can then be turned into species occurrence records. This is generally the first biologically useful output from processing the images.
However, depending on the characteristics of the survey and the goals of the researchers, it may also be possible to produce useful derived data that are more amenable to statistical analysis. The most obvious may be to derive sample events from the observations. This is not always scientifically sensible, but processing may deliver e.g. detection counts, estimated individual counts, the percentage of marked individuals among detections, even biomass estimates.
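As a rough illustration of the kind of derivation meant here (record and field names are hypothetical, not Camtrap DP terms), sample-event style metrics could be aggregated from observations like so:

```python
from collections import defaultdict

# Illustrative observation records; field names are invented for this sketch
observations = [
    {"deployment": "dep1", "species": "Vulpes vulpes", "count": 2, "marked": 1},
    {"deployment": "dep1", "species": "Vulpes vulpes", "count": 1, "marked": 0},
    {"deployment": "dep2", "species": "Meles meles",   "count": 3, "marked": 0},
]

# Aggregate per deployment into sample-event metrics:
# detection count, estimated individual count, number of marked individuals
events = defaultdict(lambda: {"detections": 0, "individuals": 0, "marked": 0})
for o in observations:
    e = events[o["deployment"]]
    e["detections"] += 1
    e["individuals"] += o["count"]
    e["marked"] += o["marked"]

for dep, e in sorted(events.items()):
    pct_marked = e["marked"] / e["individuals"] if e["individuals"] else 0.0
    print(dep, e["detections"], e["individuals"], round(pct_marked, 2))
```

The point is that these aggregates are lossy: sharing them *alongside* the primary observations, with the derivation documented, keeps both levels usable.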
I would recommend that best practice be to share both the primary Observations and any derived SampleEvent or other data, and to document in metadata how these are related. I know that section 4.3.1 discusses some complications, but this does not need to be an either/or matter. DwC-A is a bottleneck, but considering the structure of the data may assist with longer-term FAIRness.
Depending on the process, it may also be sensible to consider modeling the image segments that relate to the detected organisms. This may be particularly useful if one of the goals is to develop a training dataset for future ML automation of identifications.
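For example, if (as I understand it) Camtrap DP stores bounding boxes as fractions of image width/height with a top-left origin, extracting a segment for a training set reduces to a simple coordinate conversion. A sketch, hedged on that assumption:

```python
def bbox_to_pixels(bbox, img_w, img_h):
    """Convert a relative bounding box (assumed Camtrap DP style: bboxX/bboxY
    give the top-left corner as fractions of image width/height) to pixel
    coordinates (left, top, right, bottom) suitable for cropping."""
    left = round(bbox["bboxX"] * img_w)
    top = round(bbox["bboxY"] * img_h)
    right = round((bbox["bboxX"] + bbox["bboxWidth"]) * img_w)
    bottom = round((bbox["bboxY"] + bbox["bboxHeight"]) * img_h)
    return left, top, right, bottom

# Example: a box covering the centre quarter of a 1920x1080 image
box = {"bboxX": 0.25, "bboxY": 0.25, "bboxWidth": 0.5, "bboxHeight": 0.5}
print(bbox_to_pixels(box, 1920, 1080))  # (480, 270, 1440, 810)
```

The resulting pixel rectangle could be passed directly to an image library's crop function when assembling a training dataset.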