fathomnet / community-feedback


morphocluster UI for quality control in FathomNet #17

Open kakanikatija opened 2 years ago

kakanikatija commented 2 years ago

Good to implement for rapid QC of FathomNet submissions: https://arxiv.org/abs/2005.01595

hohonuuli commented 2 years ago

MorphoCluster source code lives at https://github.com/morphocluster/morphocluster. @kevinsbarnard was able to get it up and running quickly by following the instructions in the repo. The default setup doesn't do a great job of clustering our images; maybe we should use one of our own models instead (just strip out the classifier labels). @moi90 recommended keeping the features to 32 dimensions; more than that runs sloooooowly.
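For anyone trying the 32-dimension suggestion with their own embeddings, here is a minimal sketch of one way to get there. It assumes features come out of a model as a 2-D array (one row per ROI) and uses plain PCA via NumPy's SVD. PCA is an assumption made for illustration, not something this thread confirms MorphoCluster uses; the function name `reduce_features` and the random stand-in embeddings are hypothetical.

```python
import numpy as np


def reduce_features(features: np.ndarray, n_components: int = 32) -> np.ndarray:
    """Project row-wise feature vectors onto their top principal components."""
    # Center each feature dimension so the SVD captures variance, not the mean.
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical stand-in for embeddings from a model with its classifier removed.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 512))

reduced = reduce_features(embeddings)
print(reduced.shape)
```

The reduced array keeps one row per ROI, so it can stand in for the original features wherever a fixed-length vector per image is expected.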

Can we integrate this into gridview to group like images near each other?

hohonuuli commented 2 years ago

Note that MorphoCluster only handles one dataset at a time. Usage currently requires spinning up a new instance for each dataset.

hohonuuli commented 2 years ago

TODO: Run tests with MorphoCluster using our own feature extractor.

hohonuuli commented 2 years ago

50k images is at the low end for MorphoCluster; it's more optimized for sets on the order of 5 million.

moi90 commented 2 years ago

Regarding the clustering of the color images:

hohonuuli commented 2 years ago

@kevinsbarnard ^^^^ When you get a minute, would you try @moi90's suggestion? Thanks!

kevinsbarnard commented 2 years ago

@hohonuuli Just ran clustering with a min cluster size of 128, 64, 32, 16, and 8 on the same set of 4758 midwater ROIs from before. I'm not sure how to quantify it, but at a glance the initial clusters are looking more homogeneous, and of course there are a lot more of them to validate. I'd be curious to try it out on benthic images as well.

Screenshot_20220126_120453

lonnylundsten commented 2 years ago

Cool. It looks good to me. I'd love to see it on some of the benthic sets.

kevinsbarnard commented 2 years ago

@lonnylundsten @hohonuuli I generated a 2-class benthic set (via m3-download) consisting of 659 ROIs total: a mix of Chionoecetes tanneri and Paragorgia arborea (picked because a non-expert like myself can still discern). With a min cluster size of 32, three clusters were generated, which look promising already:

Screenshot_20220126_134712
Screenshot_20220126_134759
Screenshot_20220126_134828

It took me about 10 minutes to do the validate and grow steps for the first iteration. For one cluster, the suggestions were evenly mixed, and for the other two they were nearly perfect.

lonnylundsten commented 2 years ago

Looks good, Kevin. So that's just the MorphoCluster code? That's not using the feature extractor from the YOLO model? I'd like to have you walk me through the interface because it's not super clear. For example, if a cluster contains a mix of classes, is there a mechanism to remove a bad class from a cluster and move it to the proper cluster? I do see how this can be useful, but I'm curious about that human interaction.

kevinsbarnard commented 2 years ago

@lonnylundsten Yes, these were generated using the feature extractor provided with the MorphoCluster repository. In the validation stage (shown in those screenshots), you can kick an ROI out of the cluster by pressing the up arrow at the top right (which moves it into the parent node).

moi90 commented 2 years ago

@lonnylundsten To maximize efficiency, the preferred way is not to re-sort the objects of inhomogeneous clusters, but to delete the cluster altogether. The objects of the deleted clusters will eventually reappear as clusters in a later iteration of the process. Or the correct cluster for these objects may already exist, in which case the objects are pulled in during growing.

One tweak that might make sense to implement: If a cluster is clearly two categories, it could be split automatically. The resulting clusters would then be validated as usual.
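To make the split idea concrete, here is a rough sketch of how a cluster's feature vectors could be divided into two candidate clusters. It uses a plain two-center k-means with farthest-point initialisation, written with NumPy only. This is an illustrative assumption, not how MorphoCluster works today; `split_cluster` and the synthetic "crab" and "coral" features are hypothetical.

```python
import numpy as np


def split_cluster(features: np.ndarray, n_iter: int = 50) -> np.ndarray:
    """Split one cluster's feature vectors into two groups; returns 0/1 labels."""
    # Farthest-point initialisation: start from the member farthest from the
    # cluster mean, then take the member farthest from that one.
    first = np.linalg.norm(features - features.mean(axis=0), axis=1).argmax()
    second = np.linalg.norm(features - features[first], axis=1).argmax()
    centers = features[[first, second]].astype(float)

    labels = None
    for _ in range(n_iter):
        # Distance of every member to both centers, shape (n_members, 2).
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = features[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels


# Synthetic stand-in for a cluster that mixes two visually distinct categories.
rng = np.random.default_rng(0)
crabs = rng.normal(loc=0.0, size=(40, 32))
corals = rng.normal(loc=8.0, size=(40, 32))
labels = split_cluster(np.vstack([crabs, corals]))
print(np.bincount(labels, minlength=2))
```

Each half would still go through validation as usual; the split only proposes the two candidates.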

hohonuuli commented 1 year ago

@kakanikatija suggests adding a gridview-like tool to FathomNet for rapid assessment/correction.

Jordan-Pierce commented 1 year ago

This is awesome! Have you looked at unsupervised/self-supervised techniques other than MorphoCluster (which also looks dope)?

SimCLR is currently one of the SOTA methods.

moi90 commented 1 year ago

Self-Supervised feature extractor training is certainly the way to go for new datasets and will increase MorphoCluster's efficiency. But in itself, it doesn't help you with annotating ;)