Imageomics / BeetlePalooza-2024

BeetlePalooza collaborative hands-on development event to be held at OSU August 12-15
Creative Commons Zero v1.0 Universal

Brainstorming pipelines to achieve ground beetle identification using AI/ML models #18

Open iaviney opened 2 months ago

iaviney commented 2 months ago

Hi all,

In this issue, I propose a potential pipeline for ground beetle identification, followed by a plan for analyzing the image data in the BeetlePalooza 2024 dataset. Below that, I pose some questions and suggest some figures and supplementary data we could produce for a pipeline like this one. I don't have experience with AI/ML models, so please feel free to correct my interpretation of the steps, add steps, or suggest improvements; I'd appreciate any advice. I found most of the resources below in this document.

=================================================

Steps to develop a pipeline that takes beetle samples from imaging to species identification:

1. Take images of beetles

2. Associate a species ID with each image

3. Curate the image data to include markers, segmentation, and/or morphological annotations

4. Split the annotated images into a training set and a test set, train the model on the training set to predict annotation location, shape, size, or other desired details, and evaluate it on the test set

5. Use the predicted annotations to associate beetle images with a taxon ID
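Steps 4 and 5 can be illustrated with a deliberately simple baseline. This is a minimal sketch, not how EB-Net or ML-morph work: it assumes each image has already been reduced to a few landmark-derived measurements (here, hypothetically, elytra length and width) plus its species label. It holds out part of each species for testing and then assigns species with a nearest-centroid classifier, swapped in as a stand-in for a real ML model. All function names and the toy numbers are made up for illustration.

```python
import random
from collections import defaultdict
from math import dist


def stratified_split(records, test_fraction=0.25, seed=0):
    """Split (features, species) records per species so each species with
    two or more images appears in both the training and the test set."""
    rng = random.Random(seed)
    by_species = defaultdict(list)
    for features, species in records:
        by_species[species].append((features, species))
    train, test = [], []
    for items in by_species.values():
        rng.shuffle(items)
        # Hold out at least one image per species, but never a species' only image
        n_test = max(1, round(len(items) * test_fraction)) if len(items) > 1 else 0
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def fit_centroids(train):
    """'Training' for a nearest-centroid classifier: average each species' features."""
    grouped = defaultdict(list)
    for features, species in train:
        grouped[species].append(features)
    return {
        species: tuple(sum(column) / len(column) for column in zip(*vectors))
        for species, vectors in grouped.items()
    }


def predict(centroids, features):
    """Step 5: assign the species whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda species: dist(centroids[species], features))


def accuracy(centroids, test):
    """Fraction of held-out images whose predicted species matches the label."""
    correct = sum(predict(centroids, features) == species for features, species in test)
    return correct / len(test)


# Synthetic toy values (elytra length, elytra width in mm), NOT taken from the dataset
records = [
    ((10.0, 4.0), "species_A"), ((10.4, 4.2), "species_A"),
    ((9.8, 3.9), "species_A"), ((10.2, 4.1), "species_A"),
    ((15.0, 6.0), "species_B"), ((15.5, 6.3), "species_B"),
    ((14.8, 5.9), "species_B"), ((15.2, 6.1), "species_B"),
]
train, test = stratified_split(records)
centroids = fit_centroids(train)
print(accuracy(centroids, test))  # 1.0 for these well-separated toy species
```

Splitting per species rather than purely at random keeps rare species from ending up with no training or no test images. Real beetles will not separate this cleanly on two dorsal measurements, so a baseline like this is mainly useful as a point of comparison for a proper model.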

=================================================

Applying this pipeline to the BeetlePalooza 2024 dataset:

  1. Use the individual beetle images from the BeetlePalooza NEON dataset as input
  2. Species IDs are already determined and associated with the images
  3. Use the elytra measurements as landmarks. I'm not sure we can use lines as landmarks, since EB-Net and ML-morph use dots (point landmarks), but I believe we could replace each line with dots at its two ends using the position data?
  4. Train and test the ML model
  5. Obtain a taxon ID for each beetle image from the model's landmark predictions (again, I'm not sure how this part works)
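The idea in step 3, replacing each measured line with dots at its ends, could look roughly like this. It is a minimal sketch assuming each elytra measurement is stored as a line with (x, y) pixel coordinates for its two ends; the `LineMeasurement` schema, its field names, and the `_p1`/`_p2` landmark names are hypothetical, not the dataset's actual format, and the output is a plain list of named points rather than the input format EB-Net or ML-morph expects. Ordering the endpoints along each line's dominant axis also assumes the beetles are imaged in roughly the same orientation.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot


@dataclass
class LineMeasurement:
    """One measurement line drawn on an image (hypothetical schema)."""
    name: str                    # e.g. "elytra_length" or "elytra_width"
    start: tuple[float, float]   # (x, y) pixel coordinates of one end
    end: tuple[float, float]     # (x, y) pixel coordinates of the other end

    def length_px(self) -> float:
        """Length of the line in pixels, recoverable from its endpoints."""
        return hypot(self.end[0] - self.start[0], self.end[1] - self.start[1])


def ordered_endpoints(line: LineMeasurement) -> list[tuple[float, float]]:
    """Order the two ends consistently: along the line's dominant axis
    (vertical for a length line, horizontal for a width line), smaller coordinate first."""
    (x0, y0), (x1, y1) = line.start, line.end
    axis = 1 if abs(y1 - y0) >= abs(x1 - x0) else 0
    return sorted([line.start, line.end], key=lambda point: point[axis])


def lines_to_landmarks(lines: list[LineMeasurement]) -> list[tuple[str, float, float]]:
    """Replace each line with two named point landmarks at its ends.

    Lines are sorted by name so every image yields landmarks in the same
    order, which landmark-based methods generally rely on to match points.
    """
    landmarks = []
    for line in sorted(lines, key=lambda m: m.name):
        first, second = ordered_endpoints(line)
        landmarks.append((f"{line.name}_p1", *first))
        landmarks.append((f"{line.name}_p2", *second))
    return landmarks


# Made-up pixel coordinates for one image
lines = [
    LineMeasurement("elytra_width", (120, 300), (260, 305)),
    LineMeasurement("elytra_length", (190, 520), (192, 210)),
]
for landmark in lines_to_landmarks(lines):
    print(landmark)
```

Because both endpoints are kept, the original measurement is not lost: `length_px()` recovers it, and it could be converted to millimetres given a per-image scale.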

=================================================

Some questions:

=================================================

If we were able to create a model, here are some pieces of data that I would be interested in seeing:

JCGiron commented 2 months ago

I fully support this plan. Some caveats for genus/species IDs will be covered by @EvanWaite in the What It Takes to Identify a Beetle bootcamp, especially regarding identification from photos alone and/or from dorsal views only.

iaviney commented 2 months ago

Awesome, thanks @JCGiron. I'm looking forward to that bootcamp! I'll definitely attend to get a better idea of whether individual photos can provide enough information for identification.

sydnerecord commented 2 months ago

Great ideas in this pipeline workflow @iaviney! We will want to keep in mind the limitations of the 2018-NEON-Beetle dataset (e.g., dorsal view only) we have on hand for the workshop and consider what future imaging efforts should include (e.g., ventral views, all angles of the body, etc.).