biigle / maia

BIIGLE module for the Machine Learning Assisted Image Annotation method

Use image retrieval techniques to find similar images #27

Closed: dlangenk closed this issue 9 months ago

dlangenk commented 5 years ago

More of a nice-to-have.

I just browsed through the results of novelty detection. Unfortunately, the classes are quite scattered, so selection takes some time. In addition, some classes are much more abundant than others, so the rare classes might get "lost" in the downstream steps. It would be nice to have a "show me more thumbnails that look like this one" mechanism. Algorithms for that are available in image retrieval. We could, for example, use MPEG-7 features or something similar to create a tree structure over the data that makes it easier to browse. Creating that structure shouldn't take much time or resources.
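For illustration, here is a minimal sketch of such a retrieval mechanism, assuming per-patch feature vectors (MPEG-7 descriptors or anything comparable) have already been extracted. The file names and the `similar_patches` helper are placeholders, not part of BIIGLE:

```python
# Minimal sketch: answer "show me more thumbnails that look like this one"
# with a nearest-neighbour index over per-patch feature vectors.
# The feature extraction itself is assumed to have happened already;
# "patch_features.npy" and "patch_ids.npy" are placeholder files.
import numpy as np
from sklearn.neighbors import NearestNeighbors

features = np.load("patch_features.npy")  # shape: (n_patches, n_dims)
patch_ids = np.load("patch_ids.npy")      # shape: (n_patches,)

# Building the tree is cheap compared to extracting the features.
index = NearestNeighbors(algorithm="ball_tree").fit(features)

def similar_patches(query_idx, k=20):
    """Return the IDs of the k patches most similar to the patch at query_idx."""
    _, neighbors = index.kneighbors(features[[query_idx]], n_neighbors=k + 1)
    # The first result is the query patch itself (distance 0), so drop it.
    return patch_ids[neighbors[0][1:]]
```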

mzur commented 3 years ago

#66 should be implemented first.

mzur commented 3 years ago

Idea for the UI: If this feature is active (it is optional, and disabled if not enough training data is available), the grid of image patches in MAIA is split vertically (e.g. 80% of the rows showing the regular patches, 20% showing patches suggested by this method). This way the original MAIA workflow is still possible even if this method performs poorly for a given use case.

mzur commented 2 years ago

This can be done with the image features and similarity search implemented for biigle/core#336. The function should be available for training proposals and annotation candidates.
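Purely as illustration, a nearest-neighbour query against a pgvector column could look roughly like the sketch below. The table and column names (`patch_features`, `patch_id`, `vector`) and the connection string are assumptions, not the actual schema of biigle/core#336:

```python
# Sketch of a similarity query with pgvector. The "<->" operator is
# pgvector's L2 distance; table/column names are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=biigle user=biigle")

def most_similar_patches(query_vector, limit=100):
    """Return patch IDs ordered by L2 distance to the query vector."""
    vector_literal = "[" + ",".join(str(x) for x in query_vector) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT patch_id
            FROM patch_features
            ORDER BY vector <-> %s::vector
            LIMIT %s
            """,
            (vector_literal, limit),
        )
        return [row[0] for row in cur.fetchall()]
```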

mzur commented 2 years ago

Next idea for the UI: The selected proposal/candidate is shown fixed and highlighted at the first position in the grid. The remaining grid items are sorted by their similarity to that patch. They scroll and can be interacted with as usual. The filtering can be enabled with a hover button on each patch and disabled with a button on the highlighted, fixed patch.

mzur commented 2 years ago

Updated the title to make clear that this should be implemented both for training proposals and annotation candidates.

mzur commented 1 year ago

With the student experiments based on DINO features and #96 done, this can move forward now.

mzur commented 9 months ago

I want to pick this up again. New thoughts:

Here is a notebook with a minimal feature-extraction example using DINOv2: https://colab.research.google.com/drive/1LbtYkzdOezl2SadyxCRJFYhLd_aQNjlq?usp=sharing
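For reference, a condensed sketch of such a feature-extraction step, assuming the torch.hub release of DINOv2 and standard ImageNet preprocessing (the image path is a placeholder; the notebook may differ in details):

```python
# Minimal DINOv2 feature extraction: one embedding per image patch.
import torch
from PIL import Image
from torchvision import transforms

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 is divisible by the ViT-S/14 patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("patch.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    embedding = model(batch)  # shape: (1, 384) for the ViT-S/14 model
```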

mzur commented 9 months ago

Thinking about it, maybe I prefer decoupling the vector database from our main database. With MAIA and Largo it's easy to implement cleanup of vector database rows, since the annotation/candidate/proposal patch files are also cleaned. Cleanup can be asynchronous as well.

This has the advantage that the vector DB does not have an impact on the regular DB backups. It can have its own (less frequent) backups and be run on a different host.

Laravel can work with different database connections (also for migrations). We only need to sync (and index) the model IDs from the regular DB to the vector DB, but this shouldn't be a problem.

I'll still stick with pgvector, as I don't want to introduce a new technology to the stack.
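To illustrate the decoupling idea, here is a rough sketch of what the asynchronous cleanup/sync between the regular DB and a separate vector DB could look like. Connection settings, table and column names are placeholders, not the BIIGLE schema:

```python
# Sketch: remove vector rows whose corresponding patches no longer exist
# in the regular DB. Could run asynchronously, e.g. as a queued job.
import psycopg2

main_db = psycopg2.connect("dbname=biigle host=main-db")
vector_db = psycopg2.connect("dbname=biigle_vectors host=vector-db")

with main_db.cursor() as src, vector_db.cursor() as dst:
    # IDs of annotation candidates that still exist in the regular DB.
    src.execute("SELECT id FROM maia_annotation_candidates")
    valid_ids = {row[0] for row in src.fetchall()}

    # IDs referenced by the vector DB that have become stale.
    dst.execute("SELECT candidate_id FROM candidate_features")
    stale_ids = {row[0] for row in dst.fetchall()} - valid_ids

    for candidate_id in stale_ids:
        dst.execute(
            "DELETE FROM candidate_features WHERE candidate_id = %s",
            (candidate_id,),
        )

vector_db.commit()
```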