Open nfshoobs opened 1 year ago
Additional info to address questions from pitch:
One major potential of this research is that the resulting dataset would be a large annotated set of expertly identified species images, which can then be used to train a model that identifies new images automatically. This would be incredibly useful for conservation and management, as the ability to identify mussel species is a rare skill and there is high demand for ID expertise from state and federal wildlife agencies.
I'm bringing an image dataset (about 8,000 images) that contains two different angle views of about half a million specimens of North American freshwater bivalve shells. Freshwater bivalves are the most endangered animals on the planet, and many of these species have suffered serious population-level declines and range contractions over the course of the last century. The OSU Museum of Biological Diversity Mollusk Division houses the largest freshwater bivalve collection in the world, and furthermore contains about a quarter of all known museum specimens of endangered, threatened, and extinct species. We have specimens not only from the majority of watersheds in North America, but in many cases from the same sites collected at multiple different time periods. This makes the OSUM Mollusk Division's collection a very powerful resource for asking questions about continental-scale changes in phenotype correlated with anthropogenic disturbance (dams, pollution) and climate change.
The dataset consists of images of whole drawers of specimens from two angles: top down, and 45º. The drawers contain individual boxes of specimens called "lots". One lot is the set of all the specimens of a species collected at a single place and time. Every lot in the collection has a unique numeric catalogue number, which is printed in the top right corner of a cardstock label in the box. All images were taken using the same lighting setup and contain a Calibrite ColorChecker Nano and a QP Card QP101 Calibration Card with a mm scale bar.
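Because every image includes a mm scale bar, pixel measurements can in principle be converted to physical units. Below is a minimal sketch of that conversion in Python. It is not part of the dataset or any existing pipeline: the class name `ScaleCalibration`, the endpoint coordinates, and the 50 mm bar length are all hypothetical placeholders, and it assumes the scale bar endpoints have already been located in the image (by hand or by a detector). It is likely only meaningful for the top-down view, and only for objects lying roughly in the same plane as the card, since the 45º view introduces perspective foreshortening.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleCalibration:
    """Pixel-to-millimetre conversion derived from a scale bar of known length."""

    mm_per_pixel: float

    @classmethod
    def from_scale_bar(cls, p0, p1, known_length_mm):
        """Build a calibration from two (x, y) pixel endpoints of the scale bar."""
        pixel_length = math.dist(p0, p1)  # Euclidean distance in pixels
        if pixel_length == 0:
            raise ValueError("scale bar endpoints must be distinct")
        return cls(mm_per_pixel=known_length_mm / pixel_length)

    def to_mm(self, pixels):
        """Convert a length measured in pixels to millimetres."""
        return pixels * self.mm_per_pixel


# Hypothetical values: a bar segment assumed to be 50 mm long, spanning
# 500 px horizontally, and a shell measured at 820 px along its long axis.
calibration = ScaleCalibration.from_scale_bar((100, 200), (600, 200), known_length_mm=50.0)
shell_length_mm = calibration.to_mm(820)
print(f"{calibration.mm_per_pixel:.3f} mm/px, shell length ~ {shell_length_mm:.1f} mm")
```

Since all images share the same lighting setup and calibration cards, a calibration like this could be computed once per image and reused for every lot in that drawer.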
(A sample from the dataset can be downloaded here)
My goal is to get help to use CV / ML methods to:
I would definitely be interested in testing some hypotheses about the distribution of different morphological traits and color patterns using this dataset. It would be the largest dataset of its kind in existence for mollusks. Please reach out if you're interested in collaborating on some or all of this! -Nate
Originally posted by @nfshoobs in https://github.com/Imageomics/Image-Datapalooza-2023/issues/3#issuecomment-1676450117