NERC-CEH / plankton_ml

A project for image processing and analysis pipelines for plankton sampling
GNU General Public License v3.0

Minimal metadata for unlabelled images #4

Status: open. metazool opened this issue 1 month ago

metazool commented 1 month ago

The current catalogue record rewrites the CSV index of tagged images to add an absolute path to each image's location in S3, but for experimenting with unsupervised approaches we want an index of the whole set.

We don't have more to go on than the filename and perhaps the image dimensions, but this is one to revisit later, with the opportunity to get some spatio-temporal metadata out of the FlowCam instrument.
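A minimal sketch of such a whole-set index, assuming the images are PNG files in a local directory tree whose layout mirrors the object keys in S3. The function names, the `bucket`/`prefix` parameters and the CSV column names are hypothetical, not the project's existing catalogue code. Dimensions are read straight from the PNG header, so no imaging library is needed; for TIFF or other formats, opening the file with Pillow and reading `Image.size` would be the usual route.

```python
import csv
import struct
from pathlib import Path

# Every PNG starts with this 8-byte signature, followed by the IHDR chunk.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(path: Path) -> tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk without decoding pixels."""
    with path.open("rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    # Width and height are big-endian unsigned 32-bit ints at bytes 16-24.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def build_index(root: Path, bucket: str, prefix: str = "") -> list[dict]:
    """Index every PNG under root with its S3 URI and dimensions.

    Hypothetical layout: the local tree under root is assumed to match the
    object keys under bucket/prefix.
    """
    rows = []
    for path in sorted(root.rglob("*.png")):
        parts = (prefix.strip("/"), path.relative_to(root).as_posix())
        key = "/".join(p for p in parts if p)  # drop an empty prefix
        width, height = png_dimensions(path)
        rows.append({
            "filename": path.name,
            "s3_uri": f"s3://{bucket}/{key}",
            "width": width,
            "height": height,
        })
    return rows


def write_index(rows: list[dict], out_csv: Path) -> None:
    """Write index rows to a CSV with a fixed column order."""
    with out_csv.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["filename", "s3_uri", "width", "height"])
        writer.writeheader()
        writer.writerows(rows)
```

Spatio-temporal fields from the FlowCam could later be added as extra columns without changing the shape of the index.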

Edit: adding these notes, which were originally an issue on the internal project, for consultation with local experts.

For pipeline workflows it makes sense to have a standards-oriented interface to the image collections; there are different modern options and a

Marine sampling standards background

There's a lot of work on plankton sample data sharing in the marine world; one ideal of the project originators is to establish an equivalent service for freshwater ecosystems. There's a well-worked-out body of standards, but much of it has a pre-internet feel, with semi-manual workflows for data linkage and cleaning, e.g. https://manual.obis.org/name_matching.html#taxon-matching-workflow

https://hal.science/hal-03958791/document

"Establishing Plankton Imagery Dataflows Towards International Biodiversity Data Aggregators"

"We developed recommendations for plankton imagery data management, which can promote the ability to make these datasets as FAIR". This is a very high-level description of the workflow, without automation or implementation specifics.

Aggregation goes through the EurOBIS IPT (https://ipt.vliz.be/eurobis/), which is oriented to marine ecosystems.

https://github.com/EMODnet/EMODnetBiocheck - this R-based tool is used for quality control: "It helps users to Quality Control their (marine) biological datasets ... the analysis reaches its full potential using an IPT resource with OBIS-ENV data format". That format is a heavyweight-looking data model: https://manual.obis.org/formatting.html
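To give a feel for what this kind of quality control involves, here is a minimal Python sketch of a field-completeness and coordinate-range check over Darwin Core-style records. `REQUIRED_TERMS` is a hypothetical selection of real Darwin Core terms, not the authoritative OBIS required set (that lives in the formatting manual linked above), and this is not a port of EMODnetBiocheck, which runs many more checks.

```python
# Hypothetical selection of Darwin Core terms to require; check the OBIS
# formatting manual for the authoritative list.
REQUIRED_TERMS = ("occurrenceID", "eventDate", "decimalLatitude",
                  "decimalLongitude", "scientificName", "basisOfRecord")


def _blank(value) -> bool:
    """True if a field is absent or contains only whitespace."""
    return value is None or str(value).strip() == ""


def record_issues(record: dict) -> list[str]:
    """List completeness and coordinate problems for one record."""
    issues = [f"missing {t}" for t in REQUIRED_TERMS if _blank(record.get(t))]
    for term, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        value = record.get(term)
        if _blank(value):
            continue  # already reported as missing above
        try:
            number = float(value)
        except (TypeError, ValueError):
            issues.append(f"{term} is not a number")
            continue
        if not -limit <= number <= limit:
            issues.append(f"{term} out of range")
    return issues


def qc_report(records: list[dict]) -> dict[str, list[str]]:
    """Map each problem record (by occurrenceID, else row position) to its issues."""
    report = {}
    for i, rec in enumerate(records):
        issues = record_issues(rec)
        if issues:
            report[rec.get("occurrenceID") or f"row {i}"] = issues
    return report
```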


DylanCarbone commented 1 month ago

Hi Jo, @metazool

As I mentioned in my email, the following guides and documents may be of interest to your work:

- The TDWG guide - An older document listing established metadata standards and, based on those standards, the terminology important for monitoring insects.
- Camtrap DP - A richer metadata standard tailored to monitoring methods that capture images. It can describe images captured in the lab by flow cytometry, though certain fields will not be relevant to flow cytometry methods.
- A guide to camera trap surveying - If you are considering metadata standards for publication to GBIF, section 4.3 has some nice discussion of the event-occurrence structure and the limits of the Darwin Core star schema.
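To make the event-occurrence structure concrete, here is a minimal Python sketch that turns per-image classifier output into a Darwin Core event core plus an occurrence extension linked by `eventID`, i.e. the star schema shape: one core table with extension rows pointing back to it. The input fields (`sample_id`, `sample_time`, `lat`, `lon`, `label`) are hypothetical; the output keys are real Darwin Core terms. Using a predicted label directly as `scientificName` skips the taxon matching step the OBIS manual describes, so this illustrates the shape rather than producing publishable output.

```python
from collections import Counter


def to_event_core(images: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split per-image rows into Darwin Core event and occurrence tables.

    Each input row is assumed (hypothetically) to carry sample_id,
    sample_time, lat, lon and a predicted label.
    """
    events = {}
    counts = Counter()
    for img in images:
        sid = img["sample_id"]
        # The first image seen for a sample defines that sample's event row.
        events.setdefault(sid, {
            "eventID": sid,
            "eventDate": img["sample_time"],
            "decimalLatitude": img["lat"],
            "decimalLongitude": img["lon"],
        })
        counts[(sid, img["label"])] += 1
    # One occurrence per (sample, label) pair, counting the matching images.
    occurrences = [
        {
            "occurrenceID": f"{sid}:{label}",
            "eventID": sid,  # foreign key back to the event core
            "scientificName": label,
            "individualCount": n,
            "occurrenceStatus": "present",
            "basisOfRecord": "MachineObservation",
        }
        for (sid, label), n in sorted(counts.items())
    ]
    return list(events.values()), occurrences
```

Aggregating to one occurrence per taxon per sample with an `individualCount` is one choice; keeping one occurrence per image is the alternative if each image needs its own `associatedMedia` link.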