dwyl / image-classifier

🖼️ Classify images and extract data from or describe their contents using machine learning
GNU General Public License v2.0
18 stars, 3 forks

EPIC: Prepare the repo for `public` share on `HN` 🚀 #22

Open nelsonic opened 9 months ago

nelsonic commented 9 months ago

@LuchoTurtle you've done a superb job of building a fully functional image captioning app! 😍 🎉 Now it's time to get some credit for it 💳 😉 by submitting a "Show HN" :shipit: And in the process raise your public profile for applying to #NextAdventure 🚀

Todo

Currently it's: https://github.com/dwyl/image-classifier/tree/a3c8c3bf79a0f5e7bca160b6e460d883bc6e3973#why-

This is good for an internally-focussed tutorial for us ... 👍 but as an LLM-curious person casually reading on HN, 👀 this isn't going to "hook" me into reading a 5k word tutorial for 30+ mins ... ⏳

# Why? 💭

We needed a fully offline-capable (no 3rd party APIs/Services) image captioning service
using a state-of-the-art pre-trained image model to describe images uploaded in our 
[`App`](https://github.com/dwyl/app).

# What?

A step-by-step tutorial building a fully functional 
`Phoenix LiveView` web application that allows anyone 
to upload an image and have it described 
by the `Open Source` `BLIP` image captioning (`Large`) model.
intro-gif-position

Part 2

nelsonic commented 8 months ago

@LuchoTurtle how close do you feel this repo is to sharing on HN? 💭 🚀

LuchoTurtle commented 8 months ago

Save for some changes to the README (as outlined in this issue), I think it brings sufficient value for those who want to get started with Bumblebee (unfortunately there aren't many examples out there with in-depth guides and comparisons).

However, I think #18 would also bring great value to this project and would be extremely interesting as it would use a part of Bumblebee that is not used here - voice-to-text.

ndrean commented 8 months ago

I can push several versions if you want help. 1) add an audio capture (an HTML

LuchoTurtle commented 8 months ago

@ndrean any PR is helpful :). I think your idea of speech-to-text really takes this to another level. The purpose is to document the process, something that is severely lacking in fly.io articles and in the Bumblebee repo.

ndrean commented 8 months ago

@LuchoTurtle I understand your idea. Note that selecting the right model is probably the most difficult part. However, I made no effort on this. I just followed and adapted an article by Sean Moriarty on semantic search with ExFaiss. It just works, and I feel I did not really learn anything, except the Elixir point of view of using tasks, in the sense that if you ask for something more difficult, like say an interactive LLM or building a model, then I clearly have no clue what to do. It's more like a one-shot. But you have to start somewhere, don't you!? Anyway, I always cite my sources and will push things as soon as my computer is available.