dlangenk opened 1 year ago
Nice. How much did you have to tune the thresholds? It's probably harder if you tell it something like Kolga hyalina 😄
These were the default settings. As always, some things work and some don't. I let it search for a specific species, which did not really work, so I guess Kolga hyalina would not work either. The guy told me he tried descriptive terms, which worked for him on Hausgarten-like images.
It's definitely interesting. If it really works with descriptive terms instead of just "class names", then it could be used as an alternative to (or as part of) a MAIA-like tool. Users describe each label they are looking for and the system detects them. This way you get automatic classification, too: each label can be detected separately, and conflicts/duplicate detections have to be resolved manually.
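One way the manual resolution step could be supported: since each label would be detected in a separate pass, detections from different labels whose boxes overlap strongly are probably the same object and can be flagged for the user. A minimal sketch (the detection tuple format and the IoU threshold are assumptions):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def find_conflicts(detections, iou_threshold=0.5):
    """Return index pairs of detections that likely mark the same object."""
    conflicts = []
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            if iou(detections[i][1], detections[j][1]) >= iou_threshold:
                conflicts.append((i, j))
    return conflicts

# Hypothetical per-label detection results: (label, box, score).
detections = [
    ("white sea cucumber", (10, 10, 50, 50), 0.9),
    ("translucent blob",   (12, 12, 48, 52), 0.7),  # same object, other label
    ("starfish",           (100, 100, 140, 140), 0.8),
]
print(find_conflicts(detections))  # → [(0, 1)]
```

Anything the sketch flags would then go into a review queue instead of being resolved automatically.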
Here is a small script to try a CLIP-based model on any image_id in Biigle and show or save the results: det_zeroShotClip.py.zip
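This is not the attached script, but a rough idea of what such a CLIP-based zero-shot detector can look like: cut the image into overlapping tiles, embed each tile and the text prompt, and keep tiles above a similarity threshold. The tiling strategy, function names, and threshold are assumptions, and toy vectors stand in for real CLIP embeddings (which would come from a CLIP image/text encoder):

```python
import numpy as np

def tiles(width, height, size, stride):
    """Yield (x, y) origins of overlapping square tiles covering the image."""
    for y in range(0, max(height - size, 0) + 1, stride):
        for x in range(0, max(width - size, 0) + 1, stride):
            yield x, y

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect(tile_embeddings, text_embedding, threshold=0.25):
    """Return indices of tiles whose embedding matches the text prompt."""
    return [i for i, e in enumerate(tile_embeddings)
            if cosine(e, text_embedding) >= threshold]

# Toy stand-ins for CLIP embeddings of two tiles and one prompt:
prompt = np.array([1.0, 0.0])
tile_embs = [np.array([0.9, 0.1]), np.array([0.0, 1.0])]
print(detect(tile_embs, prompt))  # → [0]
```

The returned tile indices map back to image coordinates via `tiles()`, which is where the threshold tuning mentioned above would come in.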
I just read a piece about CLIP and now I know what it does :smile: It's still a nice use case to have text-based detection. Could the other way around also be useful? I'm thinking about the people who want to create a taxonomic catalog based on morphology. With CLIP you could select an image and get back words that describe it. It may even be a bonus that CLIP can't produce "expert terms", because common terms can be understood by anyone. While these common descriptive terms could be used to tell humans how to identify a particular morphotaxon, under the hood the taxonomic catalog could also store the actual feature vector(s) of the example image(s), so algorithms can identify the objects based on morphology as well.
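A hedged sketch of that reverse direction: embed a fixed vocabulary of common descriptive words once, then rank them against an image embedding and keep the top matches as the description. The vocabulary and the toy vectors here are made up; in practice both would come from CLIP's text and image encoders:

```python
import numpy as np

def describe(image_emb, vocab_embs, vocab, k=3):
    """Return the k vocabulary terms most similar to the image embedding."""
    image_emb = np.asarray(image_emb, float)
    vocab_embs = np.asarray(vocab_embs, float)
    sims = vocab_embs @ image_emb / (
        np.linalg.norm(vocab_embs, axis=1) * np.linalg.norm(image_emb))
    return [vocab[i] for i in np.argsort(-sims)[:k]]

# Toy stand-ins for text embeddings of common descriptive words:
vocab = ["translucent", "elongated", "spiny", "red"]
vocab_embs = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
print(describe([1.0, 0.5, 0.0], vocab_embs, vocab))
# → ['translucent', 'red', 'elongated']
```

The same `image_emb` that produces the words could be stored alongside them as the machine-readable part of the catalog entry.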
I may be biased, but I have the impression that embeddings and vector databases are all the rage right now (ref with a very active discussion on Hacker News). A taxonomic database of automatically generated descriptive words as well as feature vectors for different species may be as useful as, or even more useful than, the zoo of trained models that people like FathomNet want to build. I wouldn't be surprised if certain SMarTaR-ID people would like this idea :wink: @timtacle23 maybe we should talk with them about this sometime.
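Such a catalog could be sketched as a tiny in-memory vector store: each morphotaxon entry holds its descriptive words plus an example feature vector, and a query embedding is matched to the nearest entry by cosine similarity. All names here are illustrative, not an actual FathomNet or SMarTaR-ID schema:

```python
import numpy as np

class MorphCatalog:
    """Tiny in-memory stand-in for a vector database of morphotaxa."""

    def __init__(self):
        self.entries = []  # (name, descriptive_words, embedding)

    def add(self, name, words, embedding):
        self.entries.append((name, words, np.asarray(embedding, float)))

    def nearest(self, query):
        """Return (name, words) of the entry with highest cosine similarity."""
        q = np.asarray(query, float)
        name, words, _ = max(
            self.entries,
            key=lambda e: float(np.dot(e[2], q)
                                / (np.linalg.norm(e[2]) * np.linalg.norm(q))))
        return name, words

catalog = MorphCatalog()
catalog.add("morphotype A", ["translucent", "round"], [1.0, 0.0])
catalog.add("morphotype B", ["elongated", "red"], [0.0, 1.0])
print(catalog.nearest([0.9, 0.2]))  # → ('morphotype A', ['translucent', 'round'])
```

A real deployment would swap the linear scan for an approximate nearest-neighbor index, but the interface would stay the same.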
Not sure how well this works out of the box, or how easy the fine-tuning process is. But we have a new PhD student who should work in that area, so this would actually be a nice opportunity to generate a proof of concept.
At the conference today, one person made me aware of this incredible tool linking language processing and SAM:
https://github.com/luca-medeiros/lang-segment-anything
I tried it with one public-domain fish image and it worked very well (see below). You type in what you want to detect and tadaaaa 🧙‍♂️ We should probably test it with more complex things, but I guess this would be very nice for Biigle.
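For reference, basic usage looks roughly like this (untested here, and the exact `predict` signature may differ between versions of the repo, so check its README):

```python
from PIL import Image
from lang_sam import LangSAM  # from the lang-segment-anything repo

model = LangSAM()  # loads GroundingDINO + SAM weights on first use
image = Image.open("fish.jpg").convert("RGB")
# Text prompt instead of class names; returns segmentation masks plus
# the matched boxes, phrases, and confidence scores.
masks, boxes, phrases, logits = model.predict(image, "fish")
```

The text prompt replaces any fixed class list, which is what makes it interesting for a Biigle integration.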