Imageomics / Image-Datapalooza-2023

Repository for the Image Datapalooza 2023 event held at OSU in August 2023.
Creative Commons Zero v1.0 Universal

What data are you bringing? #3

Open DiamondKMG opened 11 months ago

DiamondKMG commented 11 months ago

If you have a data set that you are planning to focus on at Image Datapalooza, could you drop a response here explaining a little about the data set and what questions you'd like to be able to answer?

jennamk14 commented 11 months ago

Hi! I have access to a video dataset containing annotated behaviors of zebras and giraffes. I was part of the team that collected this dataset at the Mpala Research Center in Kenya last January. A few questions I would like to answer with this dataset:

  1. Can we individually identify zebras and giraffes from videos? Wild Me uses computer vision techniques to identify individual zebras and giraffes from photos. Can we build a pipeline to do the same thing from videos?
  2. Can we combine the annotated video frames with the flight telemetry data to determine the optimal angle, speed, etc. for improving the quality of the video data? Flight telemetry data includes timestamps, the drone's altitude, velocity along the x, y, and z axes, latitude and longitude, etc. Several videos we collected in Kenya were not usable for behavior annotation due to poor quality, so we would like to analyze our telemetry data to see whether there are flight techniques we can adopt to improve the quality of future video data.

nickynicolson commented 11 months ago

I am currently supervising a couple of student projects that are building datasets from our Kew herbarium specimen digitisation project. Ideas for their areas of focus (for example, building datasets that display specific traits, and integrating specimen images with other data such as illustrations drawn from the specimen and scientific descriptions of the traits it displays) are here: https://github.com/orgs/KewBridge/discussions We also have some background info on the project here: https://github.com/KewBridge

aperrault commented 11 months ago

I am quite interested in the second question and perhaps the broader question of how to optimize flight trajectories for identification.
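
One way to start on the telemetry question is to attach the nearest telemetry sample to each annotated frame and then compare quality across flight conditions. The sketch below is only an illustration under stated assumptions: the field names `t` (seconds), `altitude`, and `usable` are hypothetical and not taken from the actual dataset, and "usable" stands in for whatever quality label the annotators recorded.

```python
from bisect import bisect_left
from statistics import mean


def join_frames(frames, telemetry):
    """Attach the telemetry row nearest in time to each annotated frame.

    Both inputs are lists of dicts with a hypothetical "t" field (seconds).
    """
    telemetry = sorted(telemetry, key=lambda row: row["t"])
    times = [row["t"] for row in telemetry]
    joined = []
    for frame in frames:
        i = bisect_left(times, frame["t"])
        # Only the rows just before and just after the frame can be nearest.
        candidates = telemetry[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda row: abs(row["t"] - frame["t"]))
        joined.append({**nearest, **frame})  # frame fields win on key clashes
    return joined


def usable_rate_by_altitude(joined, bin_size=10.0):
    """Fraction of usable frames per altitude bin (bin keyed by its lower edge)."""
    bins = {}
    for row in joined:
        lower_edge = (row["altitude"] // bin_size) * bin_size
        bins.setdefault(lower_edge, []).append(1.0 if row["usable"] else 0.0)
    return {edge: mean(flags) for edge, flags in sorted(bins.items())}
```

The same binning could be repeated for speed or camera angle if those fields are present in the telemetry logs.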

Callithrix-omics commented 11 months ago

I have a large set of photographs of small Brazilian primate species (marmosets, genus Callithrix) which hybridize anthropogenically. I also have genomic data on these hybrids to confirm their actual ancestral species. The photographs show faces as well as various parts of the body, and include reference ancestral species as well as hybrids from various ancestral species combinations. Properly identifying the likely ancestral species of hybrids from phenotypic and genotypic information has important conservation, ecological, and population management implications. The anthropogenic hybrids are considered exotic and invasive where they occur in the wild.

I am interested in learning about and using computer vision techniques to build AI models that identify the likely ancestral species of a given hybrid and, in the long term, in developing an easily accessible app incorporating such models. The app would be used by clinical and biological staff at animal triage centers in Brazil, where apprehended or captured anthropogenic hybrids are managed and need to be identified properly.

balhoff commented 11 months ago

I'd like to work with the Plazi taxonomic treatments dataset, which includes many images with associated anatomical descriptions. However, each image typically contains several subpanels, and likewise the text combines the descriptions for all the subpanels. I'm hoping to separate these into correctly grouped images and descriptions, and further to link the text to taxonomic names and anatomy ontology concepts.
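
For the text side of this, a first pass could split a combined caption on its panel labels. This is a rough sketch that assumes labels are written as a single capital letter in parentheses, such as `(A)`; real captions vary a lot, so the pattern and the function name `split_caption` are illustrative only.

```python
import re

# Assumption: panel labels look like "(A)", "(B)", ... in the caption text.
PANEL_RE = re.compile(r"\(([A-Z])\)\s*")


def split_caption(caption):
    """Split a combined caption into (preamble, {panel_label: description})."""
    # With one capture group, re.split returns
    # [preamble, label1, text1, label2, text2, ...].
    parts = PANEL_RE.split(caption)
    preamble = parts[0].strip()
    panels = {
        label: text.strip(" ;,.")
        for label, text in zip(parts[1::2], parts[2::2])
    }
    return preamble, panels
```

Each resulting description could then be passed to whatever step links text to taxonomic names and anatomy ontology terms.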

douglasmbura commented 11 months ago

Hi! I am particularly interested in how we can incorporate knowledge from local Indigenous communities to create more contextually relevant image analysis techniques. Indigenous communities possess deep knowledge about their local ecosystems, including wildlife behavior, migration patterns, and habitat preferences. Any insights?

DiamondKMG commented 11 months ago

@douglasmbura I would be very interested in helping with or learning more about this. Do you have a data set or topic in mind? I don't have data for this context, but I would love to learn more about how to do this for future projects and to contribute to other projects.

douglasmbura commented 11 months ago

@DiamondKMG Thanks a lot for your interest. I work with the GEO Indigenous Alliance and I'm involved in a planned project that seeks to enhance the Samburu community's resilience to disaster risks associated with famine and drought and to reduce human-wildlife conflicts. The Samburu are a pastoralist community living in the semi-arid northern part of Kenya. Traditionally the community has relied on Indigenous knowledge of moving stars, the moon, and in situ observations of the behaviour of animals and birds to predict the occurrence of potential hazards. The idea, therefore, is to incorporate these Indigenous insights to build a culturally acceptable solution. There's a plan to install camera traps in strategic locations to capture certain animal behaviours. The project is called "Lopa", which means the "Moon" Project. I would be glad to share a lot more with you and would truly appreciate your help.

DiamondKMG commented 11 months ago

That sounds so cool! I wonder if there's a way to use an ontology-like system to help with the data collection and training process. Looking forward to learning more about the project when we are all in person next week! How do you respectfully collect and organize the Indigenous knowledge, if you don't mind sharing?

douglasmbura commented 11 months ago

@DiamondKMG Sure, there's a proposed way to collect the data in a structured way. I will share a lot more with you during the workshop. I would also like to learn about the best approaches.

dirtmaxim commented 11 months ago

I want to add some ideas regarding the drone videos. The dataset contains annotations for the behavior of zebras and giraffes. We have already tested the dataset against several known action recognition models, but it would still be interesting to see how much we can improve the classification results compared to our baseline. The dataset has labels for 3 species: Grevy's zebras, plains zebras, and giraffes. A few questions we can answer during Datapalooza:

  1. Can we identify species by behavior alone?
  2. Is it possible to determine the sex of animals by behavior?
  3. Are there distinct behavioral traits that can help differentiate between zebra species (Grevy's zebras vs. plains zebras) or different age groups of giraffes?
  4. How do the altitude and speed of the drone affect the quality and accuracy of behavior annotations?
  5. Can we analyze how behavior changes in relation to geographical features, vegetation density, or other environmental factors visible in the drone footage?

hhsieh commented 11 months ago

Among these questions, I am particularly interested in questions 4 and 5. Question 5 could be extended to the landscape of fear: how are geographic features, vegetation density, or other environmental factors associated with the anti-predation behavior (e.g. alert, escape) of the herbivores? Do we see predators in the drone footage?

Was the drone footage taken at the same sites over time? Does fear of predation influence vegetation density and primary productivity?

Also, to reflect on questions 1, 2, and 3: do zebras of different species, sexes, and subspecies show different anti-predation behavior, and how does behavioral change in response to fear within a herbivore population influence vegetation density and primary productivity?

There is a trade-off between predation risk and food intake. When drought lowers vegetation availability, the herbivores would need to take more risks to maintain a sufficient level of nutrient intake. Drought would therefore change the herbivores' anti-predation behavior.

cvstewart commented 11 months ago

My apologies. Re-opening

cvstewart commented 11 months ago

This sounds interesting and challenging. I can immediately see several potential questions. Primarily I think you are asking for the ability to classify an individual as a hybrid or not from a photograph. And I am thinking that the genomic data serves as the ground truth for training.

  1. Are there degrees of hybridization that you'd like to classify?
  2. Can humans do this classification from pictures?
  3. How much data do you have? Images, individuals, etc.?
  4. How clean are the data?
  5. Is it worthwhile trying to identify individuals?

I'm hoping there can be a lot of discussion around this to see what folks can do to help.

Would you like to be able to answer a simple yes/no question: does this

cvstewart commented 11 months ago

All great questions! This whole line of work has tremendous potential.

Temporarily steering the discussion back to the original questions of ID and quality: my former student Jason Parham developed a notion of identifiability called a "census annotation". There is an existing model in Wild Me's Wildbook computer vision algorithm suite (WBIA) that provides a census annotation (CA) score. We could think about how to drive the drone toward improving the CA score. From the videos, we could pick out annotations that are local maxima of the CA score as representatives for ID and then aggregate them. Or we could try to aggregate in a more continuous fashion.
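
The local-maxima idea can be sketched in a few lines. This assumes we already have one CA score per frame for a single tracked animal, computed elsewhere; the function below does not call WBIA, and the `min_score` threshold is a hypothetical parameter added for illustration.

```python
def local_maxima(scores, min_score=0.0):
    """Return indices of frames whose score is a local peak.

    `scores` is assumed to be a per-frame CA score for one tracked animal.
    On a plateau of equal scores only the first frame is kept.
    """
    peaks = []
    for i, score in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if score >= min_score and score > left and score >= right:
            peaks.append(i)
    return peaks
```

The frames at the returned indices would be the candidates sent to the ID step; the continuous alternative would instead weight every frame by its score.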

nfshoobs commented 11 months ago

I'm bringing an image dataset (about 8,000 images) that contains two different angle views of about half a million specimens of North American freshwater bivalve shells. Freshwater bivalves are the most endangered animals on the planet, and many of these species have suffered serious population-level declines and range contractions over the course of the last century. The OSU Museum of Biological Diversity Mollusk Division houses the largest freshwater bivalve collection in the world, and furthermore contains about a quarter of all known museum specimens of endangered, threatened, and extinct species. We have specimens not only from the majority of watersheds in North America, but in many cases from the same sites collected at multiple different time periods. This makes the OSUM Mollusk Division's collection a very powerful resource for asking questions about continental-scale changes in phenotype correlated with anthropogenic disturbance (dams, pollution) and climate change.

The dataset consists of images of whole drawers of specimens from two angles: top down and 45°. The drawers contain individual boxes of specimens called "lots". A lot is the set of all the specimens of a species collected at a single place and time. All lots in the collection have a unique numeric catalogue number, which is printed on the top right corner of a cardstock label in the box. All images were taken using the same lighting setup and contain a Calibrite ColorChecker Nano and a QP Card QP101 Calibration Card with a mm scale bar.

My goal is to get help to use CV / ML methods to:

  1. Segment both images of each drawer of specimens into lots.
  2. Use OCR to capture the catalogue number of each lot from its label and add the number to the image metadata.
  3. Assign GUIDs to the images and make the dataset available online for use in morphological analysis.
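
Steps 2 and 3 could end in a small metadata record per lot crop. The sketch below assumes segmentation and OCR have already run elsewhere and only shows the cleanup and GUID assignment; the record fields (`drawer_image`, `view`, `bbox`) and function names are hypothetical.

```python
import re
import uuid

# Assumption: catalogue numbers are purely numeric, as described above.
CATALOGUE_RE = re.compile(r"\d+")


def clean_catalogue_number(ocr_text):
    """Return the first run of digits in the OCR text, or None if there is none."""
    match = CATALOGUE_RE.search(ocr_text.replace(" ", ""))
    return match.group(0) if match else None


def make_record(drawer_image, view, bbox, ocr_text):
    """Build one metadata record for a segmented lot crop."""
    return {
        "guid": str(uuid.uuid4()),  # random GUID for this crop
        "drawer_image": drawer_image,
        "view": view,  # e.g. "top" or "45deg"
        "bbox": bbox,  # (x, y, width, height) of the lot in the drawer image
        "catalogue_number": clean_catalogue_number(ocr_text),
    }
```

Records where `catalogue_number` is `None` could be queued for manual review. If reproducible identifiers are preferred, `uuid.uuid5` with a fixed namespace and a name built from the image file and catalogue number would be an alternative to random GUIDs.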

I would definitely be interested in testing some hypotheses about the distribution of different morphological traits and color patterns using this dataset. It would be the largest dataset of its kind in existence for mollusks. Please reach out if you're interested in collaborating on some or all of this! -Nate

obrookes commented 11 months ago

@douglasmbura sounds like an amazing project, I'd love to know more. What behaviours will you be focusing on (and for which species)?

obrookes commented 11 months ago

I really like all these questions. Is there somewhere we can learn more about the dataset and view the established benchmarks?