gbif / doc-camera-trap-guide

This guide provides recommendations for camera trap data management and publication to GBIF.
https://doi.org/10.35035/doc-0qzp-2x37

3.1 What are camera trap data #30

Closed dhobern closed 12 months ago

dhobern commented 1 year ago

I think there is a fifth type of camera trap data that may be worth separating out from Observations.

In the context of this section, "Observations" seems to equate to detections of a species that can then be turned into species occurrence records. This is generally the first biologically useful output from processing the images.

However, depending on the characteristics of the survey and the goals of the researchers, it may also be possible to produce useful derived data that may be more amenable to statistical analysis. The most obvious may be to derive sample events from the observations. This is not always scientifically sensible, but processing may deliver e.g. detection count, estimated individual count, percentage of marked individuals in detections, even biomass estimates.
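
For example, a minimal sketch (made-up column names, not a prescribed schema) of deriving daily sample events with detection counts and estimated individual counts from occurrence-level records:

```python
# Sketch: derive per-deployment, per-day sample events from occurrence records.
# Column names (deploymentID, timestamp, scientificName, count) are illustrative.
import pandas as pd

occurrences = pd.DataFrame({
    "deploymentID": ["dep1", "dep1", "dep1", "dep2"],
    "timestamp": pd.to_datetime([
        "2023-05-01 08:10", "2023-05-01 21:40",
        "2023-05-02 03:15", "2023-05-01 12:00",
    ]),
    "scientificName": ["Vulpes vulpes", "Vulpes vulpes", "Meles meles", "Capreolus capreolus"],
    "count": [1, 2, 1, 3],
})

sample_events = (
    occurrences
    .groupby(["deploymentID", pd.Grouper(key="timestamp", freq="D"), "scientificName"])
    .agg(detections=("count", "size"), estimatedIndividuals=("count", "sum"))
    .reset_index()
)
print(sample_events)
```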

I would recommend that best practice would be to share both the primary Observations and any derived SampleEvent or other data, and to document in metadata how these are related. I know that section 4.3.1 discusses some complications, but this does not need to be an either/or matter. DwC-A is a bottleneck, but considering the structure of the data may assist with longer-term FAIRness.

Depending on the process, it may also be sensible to consider modeling the image segments that relate to the detected organisms. This may be particularly useful if one of the goals is to develop a training dataset for future ML automation of identifications.

peterdesmet commented 1 year ago

> The most obvious may be to derive sample events from the observations. This is not always scientifically sensible, but processing may deliver e.g. detection count, estimated individual count, percentage of marked individuals in detections, even biomass estimates.

Do I understand correctly that these are analysis data derived/calculated from the camera trap data? And if so, that the camera trap data should contain all the elements to make that calculation possible?

I wonder then:

  1. If Camtrap DP currently contains all the necessary fields to make such a calculation possible? It aims to.
  2. If the calculated data should be published to GBIF? Or if the data published to GBIF should be the underpinning data that makes that calculation possible (i.e. more agnostic of the analysis)?
  3. If publishing such data to GBIF is a good idea (e.g. in addition to the primary data), how it could be expressed as Darwin Core or the new GBIF model?
  4. Would it be a good idea if GBIF calculates and displays some derived data automatically?

Your opinion on those questions can help to steer what we should recommend in the guide.

Note: image segments can be expressed in Camtrap DP (bboxX, bboxY, bboxWidth, bboxHeight), we should mention that (see #31).
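
For illustration, a made-up observation record using those fields (values are invented; coordinates assumed relative to the media dimensions):

```python
# Illustrative (not authoritative) observation record with a bounding box,
# using the Camtrap DP bbox fields mentioned above. Values are made up;
# coordinates are assumed to be expressed relative to the media width/height.
observation = {
    "observationID": "obs-0001",      # hypothetical identifier
    "mediaID": "media-0042",          # hypothetical link to the source image
    "scientificName": "Vulpes vulpes",
    "bboxX": 0.41,                    # left edge of the segment
    "bboxY": 0.27,                    # top edge of the segment
    "bboxWidth": 0.18,
    "bboxHeight": 0.22,
}
```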

dhobern commented 1 year ago

I almost wasted your time with much more extensive rambling comments, but decided to keep things brief. I wanted to avoid suggesting stuff that may make your document unhelpful to the target audience, but here is how I see things these days (partly based on work I'm doing to support plant phenomics data pipelines). In particular, I've learned a lot from how some of the EU projects use the ISA Tools framework (https://isa-tools.org/index.html). Their use is more informative than the documentation or the poorly supported tools.

We can think of a camera trap program as an Investigation with each site being a Study for which we collect Assays. The Investigation is little more than a container for Studies.

A study uses sensors (in our case cameras) to perform an Assay on the environment or feature of interest (a plant, a portion of habitat in front of a camera, etc.).

So our primary Assay is to populate a dataframe representing image acquisition events for the scene under study. For a camera trap, this Assay collects a set of images from the scene. We can document (in metadata) the methods used and the feature of interest (pretty much a location and orientation combined with the threshold trigger metadata).
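
As a rough sketch (class and field names are mine, illustrative only, not ISA's), that hierarchy maps onto a camera trap program roughly like this:

```python
# Rough sketch of the Investigation / Study / Assay hierarchy applied to a
# camera trap program. Class and field names are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Assay:
    protocol: str             # e.g. "motion-triggered image capture"
    feature_of_interest: str  # e.g. "scene in front of the camera at site A"
    outputs: List[str] = field(default_factory=list)  # identifiers of produced digital objects

@dataclass
class Study:
    site: str
    assays: List[Assay] = field(default_factory=list)

@dataclass
class Investigation:  # little more than a container for Studies
    title: str
    studies: List[Study] = field(default_factory=list)
```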

Now we have a set of images. Our second Assay may be to select objects of interest from the image, or it may directly be to tag the image with a species identification. In both cases, we are applying an Assay to a digital object - it is our feature of interest - and deriving new features of interest as new digital objects. This continues as we add processing steps. So we might have a pipeline like this (sketched in code after the list):

  1. Scene -> image capture -> Images
  2. Images -> CV segmentation -> Image Segments
  3. Image Segments -> human expert OR ML model -> Occurrence Records (possibly with very narrow time intervals)
  4. Occurrence Records -> aggregation by time period -> Sample Events
  5. Occurrence Records -> CV recognition of individual zebra etc. -> Recapture Events
  6. Sample Events + Recapture Events -> population model -> Population Estimate
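
A sketch of what recording each step as an assay with explicit inputs and outputs could look like, so the provenance of derived products stays traceable (names are illustrative, not a proposed standard):

```python
# Each pipeline step recorded as an assay with explicit inputs and outputs.
pipeline = [
    {"step": 1, "assay": "image capture",              "inputs": ["scene"],              "outputs": ["images"]},
    {"step": 2, "assay": "CV segmentation",            "inputs": ["images"],             "outputs": ["image segments"]},
    {"step": 3, "assay": "identification (expert/ML)", "inputs": ["image segments"],     "outputs": ["occurrence records"]},
    {"step": 4, "assay": "aggregation by time period", "inputs": ["occurrence records"], "outputs": ["sample events"]},
    {"step": 5, "assay": "individual recognition",     "inputs": ["occurrence records"], "outputs": ["recapture events"]},
    {"step": 6, "assay": "population model",           "inputs": ["sample events", "recapture events"],
                                                       "outputs": ["population estimate"]},
]

def provenance(product, steps=pipeline):
    """Return all upstream objects a derived product ultimately depends on."""
    upstream = set()
    for step in steps:
        if product in step["outputs"]:
            for source in step["inputs"]:
                upstream.add(source)
                upstream |= provenance(source, steps)
    return upstream

print(provenance("population estimate"))
# upstream objects: scene, images, image segments, occurrence records,
# sample events, recapture events
```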

This goes far beyond what you should cover in this document, but fully FAIR data would document each of these Assays/steps and allow a user or machine to understand the source and character of the Sample Events or the Population Estimate.

I hope such complexity could be recorded in the new data model. But I think it makes sense for us to consider in advance how we can ensure we have all the information our future selves would love to have for each study. Machine observations collected using standard protocols could be so valuable, but their interoperability may be limited by sloppy choices now.

Thanks for the note on bbox.

dhobern commented 1 year ago

I'm still reading some of the more detailed sections of the document and will review the Camtrap DP aspects to ensure it all makes sense with my use cases in mind.

peterdesmet commented 1 year ago

Here are a number of camera-based monitoring methods I heard mentioned (in the last few weeks) that we did not consider when writing this guide:

  1. Fixed-location insect cameras (@dhobern)
  2. Fixed-location underwater cameras (Dmitry Schigel)
  3. Moving drone (top-down) vegetation imaging (Dmitry Schigel)
  4. Moving underwater robot (@albenson-usgs)
  5. Moving "Street View" like imaging of roadside/river vegetation (@timadriaens)

I think we should update the guide (title, introduction) to mention which ones are in and out of scope. Personally I would keep the scope to fixed-location camera traps, since those can likely be supported by Camtrap DP by adding a few metadata fields to the deployments (e.g. depth, ping @kbubnicki). Supporting moving methods is more complicated, since a deployment in Camtrap DP assumes a fixed location and would get quite a lot more complicated if it didn't.
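
For illustration only, an underwater deployment might need little more than an extra depth field (the field is hypothetical, not part of Camtrap DP; the other field names only approximate its deployments table):

```python
# Hypothetical deployment record for a fixed-location underwater camera.
# "cameraDepth" is NOT a Camtrap DP field; it illustrates the kind of
# metadata addition that could keep such deployments within scope.
deployment = {
    "deploymentID": "dep-uw-001",
    "latitude": 51.02,
    "longitude": 3.71,
    "cameraDepth": 4.5,  # metres below the surface; hypothetical field
    "deploymentStart": "2023-06-01T00:00:00Z",
    "deploymentEnd": "2023-06-30T23:59:59Z",
}
```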

peterdesmet commented 12 months ago

@dhobern this issue discusses three elements:

  1. How information other than species observations can be derived from camera traps. We haven't specifically added that to this guide, but these are applications we want to support with Camtrap DP. If Camtrap DP doesn't support them (while it should), I suggest you open an issue at https://github.com/tdwg/camtrap-dp/issues
  2. Certain camera trap types that were not mentioned but should have been (like underwater and insect cameras). This has been addressed as part of #28
  3. Certain camera types that were not mentioned and are considered out of scope (e.g. moving cameras). This is now mentioned in the "What this guide is not about" section: 56b908c

I'll close this issue, but we can reopen it if you think something still needs to be addressed.