dwyl / image-classifier

🖼️ Classify images and extract data from or describe their contents using machine learning
GNU General Public License v2.0

Feat: Comparing Pre-trained Image Classification Models #12

Closed · nelsonic closed this issue 8 months ago

nelsonic commented 10 months ago

@LuchoTurtle, as you've noted in the README.md > What about other models? section:

(screenshot of the relevant README.md section)

The bigger the model, the more resources it consumes and the slower the result ... 💰 ⏳ This is your opportunity to do some actual Software Engineering and write up the findings!

Todo

Each row in the Detail table should be an entry for a given model. Cluster the results together, e.g. the Cat/Kitten pick for each model should sit together to make comparison easy (see the placeholder layout below).
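Something along these lines, where the column names and values are just placeholders, not actual results:

| Image | Model | Output (caption / label) | Time |
| ----- | ----- | ------------------------ | ---- |
| Cat/Kitten | model A | ... | ... |
| Cat/Kitten | model B | ... | ... |
| Cat/Kitten | model C | ... | ... |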

[!NOTE] Looks like https://huggingface.co/Salesforce/blip-image-captioning-base/tree/main was last updated 11 months ago ...


Can we try https://github.com/salesforce/LAVIS ? 💡

nelsonic commented 10 months ago

@LuchoTurtle you asked on Standup if we should compare "just" these 3 models. I think comparing a "small", "medium" and "large" model is a good starting point. But if we get feedback from people on HN (once you post the link 😉) that they want more models compared, then more can easily be added.

LuchoTurtle commented 10 months ago

While it's true that there isn't a de facto leaderboard for image-captioning tasks (part of computer vision) the way MTEB is for embeddings, there's a reason for that.

From what I've seen, the most widely regarded benchmark comparison that puts different models side by side is https://paperswithcode.com/sota/image-classification-on-imagenet

It doesn't, however, include multimodal models (models that can receive multiple types of input), which BLIP is. I can try to get a small benchmark going, but I'm afraid I don't know how to make it "data sciency" and compare accuracy between the models you've suggested.

There are already tools that compare different one-shot models, like https://huggingface.co/spaces/nielsr/comparing-captioning-models.

What I'm thinking is:

I'll see to it 👌
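
For the speed side of it, something roughly like this is what I have in mind (a Bumblebee-based sketch: the `:timer.tc` wrapper, the `stb_image` dependency and the test image path are assumptions, and the model list is just whichever candidates we settle on):

```elixir
# Rough timing sketch: build a captioning serving for each candidate model,
# run the same image through it and record how long inference takes.
# Not a proper benchmark (no warm-up, single run), just enough to fill a comparison table.
defmodule CaptionCompare do
  def build_serving(repo_id) do
    {:ok, model_info} = Bumblebee.load_model({:hf, repo_id})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, repo_id})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, repo_id})
    {:ok, generation_config} = Bumblebee.load_generation_config({:hf, repo_id})

    Bumblebee.Vision.image_to_text(model_info, featurizer, tokenizer, generation_config)
  end

  def time_caption(serving, image_path) do
    image = StbImage.read_file!(image_path)
    {micros, result} = :timer.tc(fn -> Nx.Serving.run(serving, image) end)
    {micros / 1_000_000, result}
  end
end

# "test/cat.jpg" is a placeholder image; add the other candidate repos once we pick them.
for repo_id <- ["Salesforce/blip-image-captioning-base"] do
  serving = CaptionCompare.build_serving(repo_id)
  {seconds, result} = CaptionCompare.time_caption(serving, "test/cat.jpg")
  IO.inspect({repo_id, seconds, result})
end
```

That would give per-model timings to drop into the table; quality would still need to be judged by eye.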

nelsonic commented 10 months ago

The only thing we want is a real-world comparison. i.e. we wanted to use an existing model to classify images, and we compared these 3 models along 3 dimensions: Quality, Speed & Cost. This is far more interesting to a decision maker than a synthetic benchmark/leaderboard. The Massive Text Embedding Benchmark (MTEB) Leaderboard is interesting for Embeddings ...

(screenshot of the MTEB leaderboard table)

But your average person has no clue what all the columns in the tables mean. Is a bigger number better or worse? In some cases the "best" model has a worse score than others. How is the ranking calculated?

Anyway, we just want to compare the models that are available to us for the purposes of classifying images. The table will be useful to us and interesting to several thousand other people on HN. 👍
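
i.e. the kind of summary we're after, where every model name and value below is a placeholder (and how exactly we quantify Quality and Cost is still to be decided):

| Model | Quality | Speed (time per image) | Cost (resources) |
| ----- | ------- | ---------------------- | ---------------- |
| small model | ... | ... | ... |
| medium model | ... | ... | ... |
| large model | ... | ... | ... |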

ndrean commented 10 months ago

Just my few euros again, but I already tried the microsoft and facebook models and found the results poor compared to Salesforce/BLIP.

microsoft: (screenshot of caption output)

salesforce/blip-base: (screenshot of caption output)

nelsonic commented 10 months ago

Much more descriptive:

(screenshot of the new caption output)

But the app still takes a very long time to load ... ⏳

LuchoTurtle commented 10 months ago

@nelsonic do you mean to load or to get a description? If it takes time to load, it's probably because the machine was "asleep" and had to boot again / "wake up" (we've set machine instances to sleep after a period of inactivity to save costs). This is perfectly normal. In fact, I've just opened the link and it loaded instantly.

If there's a problem with the time it takes to load the app from a machine that's asleep, that's another issue entirely. Even then, because the models are cached, it takes seconds at most, instead of the minutes that would be wasted re-downloading the models on every app startup.
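
For context, the caching boils down to something like this (sketch only; the volume path is an assumption, and in practice the env var would be set in the deployment config rather than at runtime):

```elixir
# Point Bumblebee's download cache at a persistent volume so the model weights
# are fetched from Hugging Face once, not on every cold start.
# "/data/models" is a placeholder for wherever the Fly volume is mounted.
System.put_env("BUMBLEBEE_CACHE_DIR", "/data/models")

# Subsequent loads hit the local cache instead of re-downloading the weights.
{:ok, _model_info} = Bumblebee.load_model({:hf, "Salesforce/blip-image-captioning-base"})
```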

nelsonic commented 10 months ago

Yeah, agree that Fly.io machine wake time is a separate issue that isn't really under our control. You've done a good job of caching the model. 👍 We just need to trigger the "wake from sleep" when someone views the README.md, as noted in #11

Meanwhile the descriptions are much better!

(screenshot of the improved image descriptions)

ndrean commented 10 months ago

I suppose you know you can set min_machines_running = 1 in the fly.toml; it depends on whether you want this.
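
For reference, a sketch of the relevant fly.toml section (the auto_stop/auto_start values are typical defaults, not necessarily what the app currently uses):

```toml
[http_service]
  auto_stop_machines = true    # still stop idle machines to save costs
  auto_start_machines = true   # start them again on incoming requests
  min_machines_running = 1     # but always keep one machine warm
```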

nelsonic commented 10 months ago

Yeah, when we “productionise” this feature, we will set it to be “always on” (min=1) but for now we just want to focus on cold startup time. 👌