Open metazool opened 4 months ago
Hi Jo, @metazool
As I mentioned in my email, the following guides and documents may be of interest to your work:
The TDWG guide - An older document listing established metadata standards and, based on those standards, terminology important for the monitoring of insects.
Camtrap DP - A richer metadata standard tailored to monitoring methods that capture images. It can describe images captured in the lab under flow cytometry, but certain fields will not be relevant to flow cytometry methods.
A guide to camera trap surveying - If you are considering metadata standards for the purpose of publication to GBIF, this has some nice discussions of the event-occurrence structure and the limits of the Darwin Core star schema structure, in section 4.3.
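To make the event-occurrence structure concrete, here is a minimal sketch in Python of an event core with an occurrence extension joined on `eventID`. The column names are standard Darwin Core terms, but the identifiers, dates and taxa are invented for illustration and are not taken from the guide or from any real dataset. In a star schema each extension row points only at the core record, which is the constraint this layout shows.

```python
from collections import defaultdict

# Event core: one row per sampling event (illustrative values only).
events = [
    {"eventID": "evt-001", "eventDate": "2024-05-01", "samplingProtocol": "flow imaging"},
    {"eventID": "evt-002", "eventDate": "2024-05-08", "samplingProtocol": "flow imaging"},
]

# Occurrence extension: any number of rows per event, linked by eventID.
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "evt-001",
     "scientificName": "Daphnia", "basisOfRecord": "MachineObservation"},
    {"occurrenceID": "occ-2", "eventID": "evt-001",
     "scientificName": "Asterionella", "basisOfRecord": "MachineObservation"},
]


def group_by_event(events, occurrences):
    """Attach each occurrence row to its parent event via eventID."""
    by_event = defaultdict(list)
    for occ in occurrences:
        by_event[occ["eventID"]].append(occ)
    return {
        ev["eventID"]: {"event": ev, "occurrences": by_event.get(ev["eventID"], [])}
        for ev in events
    }


star = group_by_event(events, occurrences)
for event_id, record in star.items():
    print(event_id, [o["scientificName"] for o in record["occurrences"]])
```

An event with no identified organisms (here `evt-002`) still appears in the core, which is one reason the event-based layout suits repeated sampling.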
The current catalogue record rewrites the CSV index of tagged images to add an absolute path to the location in S3, but for experimenting with unsupervised approaches we want an index of the whole set.
We don't have more to go on than filename and perhaps image dimensions, but this is one to revisit later, with the opportunity to get some spatio-temporal metadata out of the FlowCam instrument.
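As a rough sketch of what a whole-set index could look like, the Python below turns a list of object keys into CSV rows with a filename and an absolute S3 URI. The bucket name, key layout, column names and accepted file suffixes are all assumptions made for illustration. In practice the keys would come from listing the bucket, and image dimensions are left out here because reading them means opening each file.

```python
import csv
import io
from pathlib import PurePosixPath

# Assumed image suffixes; adjust to whatever the instrument actually writes.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def build_index(keys, bucket):
    """Return one row per image key, with its filename and absolute S3 URI."""
    rows = []
    for key in sorted(keys):
        path = PurePosixPath(key)
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image objects such as notes or manifests
        rows.append({"filename": path.name, "s3_uri": f"s3://{bucket}/{key}"})
    return rows


def to_csv(rows):
    """Serialise index rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["filename", "s3_uri"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical keys standing in for a real bucket listing.
keys = ["untagged/img_0001.tif", "untagged/img_0002.tif", "untagged/readme.txt"]
index = build_index(keys, bucket="example-bucket")
print(to_csv(index))
```

Keeping the index as plain rows means extra columns (dimensions, or spatio-temporal fields from the instrument) can be added later without changing how it is consumed.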
Edit: adding these notes which were originally an issue on the internal project, for consultation with local experts
For pipeline workflows it makes sense to have a standards-oriented interface to the image collections; there are different modern options and a
Marine sampling standards background
There's lots of work on plankton sample data sharing in the marine world; one ideal of the project originators is to establish an equivalent service for freshwater ecosystems. There's a well worked-out body of standards, but a lot of it has a kind of pre-internet feel to it, with semi-manual workflows for data linkage and cleaning, e.g. https://manual.obis.org/name_matching.html#taxon-matching-workflow
https://hal.science/hal-03958791/document
"Establishing Plankton Imagery Dataflows Towards International Biodiversity Data Aggregators"
"We developed recommendations for plankton imagery data management, which can promote the ability to make these datasets as FAIR". This is a very high-level description of workflow without automation / implementation specifics.
The aggregation goes through https://ipt.vliz.be/eurobis/, which is oriented to marine ecosystems.
https://github.com/EMODnet/EMODnetBiocheck - this R-based tool is used for quality control: "It helps users to Quality Control their (marine) biological datasets ... the analysis reaches its full potential using an IPT resource with OBIS-ENV data format". That format is a heavyweight-looking data model - https://manual.obis.org/formatting.html
References
"Best practices and recommendations for plankton imaging data management: Ensuring effective data flow towards European data infrastructures."
Audiovisual Core, formerly Audubon Core, "metadata for biodiversity multimedia resources and collections."
"Discovery and Publishing of Primary Biodiversity Data associated with Multimedia Resources: The Audubon Core Strategies and Approaches."
Biodiversity Information Standards / GBIF
GBIF API, Occurrences
GBIF Freshwater Data Publishing Guide - assumes a lot of metadata is already available; how can this be done more incrementally?