nelsonic opened 3 months ago
I'm not sure I understand these costs (see Fly.io pricing):
1) 19760s = 5h29min20s => is this the uptime?
2) Every time you start the app, you need to upload 2 GB of model data (the volume is pruned with the VM; see the Fly.io docs). This means you recreate a 2 GB volume and load 2 GB into memory. Is this the meaning of the two highlighted lines?
@ndrean the models are not pruned with the VM every time the app starts (that was a misconception, fixed in #82). Currently, the models are saved in the volume and are not re-downloaded on every restart. You can see it in the logs, actually:
2024-04-05T05:04:33.414 app[080e325c904168] mad [info] 05:04:33.414 [info] ℹ️ No download needed: Salesforce/blip-image-captioning-base
2024-04-05T05:04:33.415 app[080e325c904168] mad [info] 05:04:33.414 [info] ℹ️ No download needed: openai/whisper-small
2024-04-05T05:04:33.415 app[080e325c904168] mad [info] 05:04:33.415 [info] ℹ️ No download needed: sentence-transformers/paraphrase-MiniLM-L6-v2
The problem is that the models take up a fair amount of space (Salesforce/blip-image-captioning-base, especially). So we're essentially paying for the additional space the models take up beyond the free tier.
However, the bigger cost by far is RAM usage. There's no way around this, as RAM is needed to run inference on the images being uploaded. GPUs are much better suited for this, but their cost is considerably higher.
@nelsonic unfortunately there's no way around this. People have been using the application, which is great. But, as with any LLM/ML-based application, it's hard to run it on any free-tier cloud solution without putting money in.
The app has already been optimized to reduce costs: inbound/outbound data was reduced with persistent storage, the image file size is reduced by optimizing images before feeding them into the model, and the sample rate of audio files is reduced on the client side before feeding them into the model.
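To make those savings concrete, here's a rough back-of-the-envelope sketch (Python, with illustrative numbers that are my assumptions, not measurements from the app) of how much resizing an image and downsampling audio shrink the data fed to the models:

```python
# Rough estimate of the payload reduction from the optimizations above.
# All sizes and rates below are illustrative assumptions, not measured values.

def image_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed RGB image size in bytes."""
    return width * height * bytes_per_pixel

def audio_bytes(seconds, sample_rate, bytes_per_sample=2):
    """Mono 16-bit PCM audio size in bytes."""
    return seconds * sample_rate * bytes_per_sample

# Resizing a hypothetical 4000x3000 photo down to 640x480 before inference:
original = image_bytes(4000, 3000)
resized = image_bytes(640, 480)
print(f"image shrinks to {resized / original:.1%} of the original")

# Downsampling 10 s of audio from 44.1 kHz to 16 kHz on the client:
print(f"audio shrinks to {audio_bytes(10, 16_000) / audio_bytes(10, 44_100):.1%}")
```

Less data in means less bandwidth billed and less RAM held per request while inference runs.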
Unfortunately, we have to pull the plug. With increased activity, this "hobbyist" project burns money (even when the machines are stopped, as shown in your image).
I'd love to keep this online for anyone to check out. But it's not feasible :(
I'm going to shut down the machine now. I'll keep the database, though. It has the images and index files saved, so we keep the uploaded data and can have the app running normally again by just spawning a new machine whenever we want to. The machine will look for the index file in the database (since it won't have one on its own filesystem), download the models and the index file, and gracefully resume where it stopped :)
@ndrean the documentation is correct. The application's filesystem is wiped whenever the machine restarts. That's why they offer volumes for persistent data (data we want to keep in between restarts). Currently, the models live inside one of these volumes, hence why they don't need to be re-downloaded.
What #82 did was fix the path of the volume inside Fly.io, which was previously incorrect.
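For context, persisting the models on Fly.io boils down to a [mounts] section in fly.toml that attaches a volume to the directory where the app caches its models. The volume name and path below are illustrative placeholders, not the project's actual values:

```toml
# fly.toml -- illustrative fragment, not this project's actual config
[mounts]
  source = "models"            # hypothetical volume name
  destination = "/app/models"  # hypothetical model-cache directory
```

Anything written under the mount destination survives restarts; everything else on the filesystem is wiped.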
I've deleted the machine. We can spawn a new one whenever we want to.
I'm keeping this issue open for other people to see as a reference for how much it costs to run this on Fly.io (without a GPU!).
You opted for the machine below, didn't you? Where is it in the bill?
```toml
# fly.toml
[[vm]]
  size = 'performance-4x'
```
I don't think they differentiate it on the billing page, unfortunately.
According to https://fly.io/docs/about/billing/#machine-billing:
Started Machines are billed per second that they’re running (the time they spend in the started state), based on the price of a named CPU/RAM combination, plus the price of any additional RAM you specify.
For example, a Machine described in your dashboard as “shared-1x-cpu@1024MB” is the “shared-cpu-1x” Machine size preset, which comes with 256MB RAM, plus additional RAM (1024MB - 256MB = 768MB). For pricing and available CPU/RAM combinations, see Compute pricing.
So they bill per second based on the preset in use, plus any additional RAM we specify. Because the machine wasn't running the whole time, we didn't pay the $124 that you showed in the picture.
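A minimal sketch of that billing rule in Python (the rates here are invented placeholders; the real numbers live on Fly.io's Compute pricing page):

```python
# Sketch of Fly.io's per-second machine billing, per the docs quoted above.
# The rates below are invented placeholders, NOT Fly.io's actual prices.

def machine_cost(seconds_started, preset_rate_per_s, extra_ram_mb, ram_rate_per_mb_s):
    """Cost = preset price per started second + price of the additional RAM."""
    return seconds_started * (preset_rate_per_s + extra_ram_mb * ram_rate_per_mb_s)

# e.g. the 19760 s of uptime mentioned above, on a hypothetical preset:
cost = machine_cost(
    seconds_started=19_760,          # 5h 29min 20s in the "started" state
    preset_rate_per_s=0.0000008,     # placeholder preset rate ($/s)
    extra_ram_mb=768,                # e.g. 1024 MB total - 256 MB included
    ram_rate_per_mb_s=0.0000000002,  # placeholder RAM rate ($/MB/s)
)
print(f"${cost:.4f}")
```

The key point: a stopped machine accrues no started-seconds, which is why the bill stayed well below the always-on price.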
@LuchoTurtle didn't want you to DELETE the machine ... 🙃
Just wanted it to run more efficiently ... 💭
But if that is going to take too much time, fair enough. 👌
@LuchoTurtle quick question: (though probably a rabbit hole…)
A. Person uploads the image to AWS S3
B. This triggers a request to the AI BOX to classify it
C. The AI BOX classifies the image and returns its guess
This would mean that our only marginal cost would be electricity, with no surprise bills when it gets to the top of HN.
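The A→B→C flow could be sketched as a tiny webhook handler. Everything here is a simplified assumption (Python stdlib only, a made-up event shape, and classify_image as a stand-in for the model); real S3 notifications usually go through SNS/SQS/Lambda and carry more fields:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def classify_image(s3_url):
    # Stand-in for step C: the AI BOX running model inference on the image.
    return {"caption": "placeholder guess for " + s3_url}

class AiBoxHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Step B: the S3 upload event arrives as an HTTP POST to the AI BOX.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        event = json.loads(body)  # assumed shape: {"s3_url": "..."}
        guess = classify_image(event["s3_url"])
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(guess).encode())

# To serve: HTTPServer(("0.0.0.0", 8080), AiBoxHandler).serve_forever()
```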
Asking because we could put together a decent machine for ~€600, including an NVIDIA GeForce RTX 4060 EAGLE with 8 GB GDDR6: https://amzn.eu/d/5fr9N5J
This could serve our needs quite well and we could run other models on it without ever having to worry about boot times etc.
Thoughts? 💭
Though we probably have to spend a decent chunk on the GPU ... https://www.reddit.com/r/MachineLearning/comments/17x8kup/d_best_value_gpu_for_running_ai_models/
We will use it for a few tasks so I think it's worth investigating. 💭
Currently, this project targets the CPU (by default, since running on a GPU entails having specific drivers for the hardware). To run on GPUs, I think we only need to change a few env variables (https://github.com/elixir-nx/xla#usage), but further testing may be necessary.
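Per the elixir-nx/xla README linked above, targeting the GPU is mostly a matter of setting XLA_TARGET before fetching/compiling the deps. The cuda120 value below is only an example; it must match the locally installed CUDA version (see the README for the currently supported targets):

```shell
# Select a CUDA build of XLA instead of the default CPU target.
# Example value only -- it must match your installed CUDA version.
export XLA_TARGET=cuda120
echo "$XLA_TARGET"
# Then recompile the deps so the GPU-enabled binary is used, e.g.:
#   mix deps.clean xla exla --build && mix deps.get && mix compile
```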
Regarding which GPU to choose, I can't really provide an informed opinion. I know VRAM is quite important.
Of course, I'm not expecting you to get an H100; that's wayyy overkill. But the 3090 seems like a good compromise, with a solid performance-to-cost ratio.
I'd hold off on purchasing anything yet, though. It needs to be confirmed that inference can be run on the GPU with Bumblebee before making any purchase that could be rather costly :)
Ok. Thanks for your reply. Seems like this will require some further thought. What do we need to do next? 💭
I'd need to check that running locally on the GPU works. Since I have a 1080, which is CUDA-capable, it should theoretically work on a 3090 too. I just need to confirm it actually uses the GPU first :)
https://blog.themvp.in/hardware-requirements-for-machine-learning/
Used ("like new"), the 3090 with 24 GB VRAM costs ~£650: https://www.ebay.co.uk/itm/176345660501
This is certainly more than we were spending on Fly.io, but if it means we can do more with Machine Learning at a baseline load, I think it's worth it. 💭
My 2¢ input.
If Whisper (speech-to-text) is the sink or bottleneck, could a cloud service be considered? https://elevenlabs.io/docs/introduction seems to offer a WebSocket connection to stream down the response. I did not see figures on pricing.
@ndrean your insight is always welcome. ❤️ Yeah, the speech part shouldn't be the bottleneck, 🤔 and importantly, the purpose of building our own project instead of using an API (or Google Lens) for image classification was to avoid sending personal data to a third party. 💭
We want to be 100% certain that an image we classify is not being used for any other purpose. The same goes for voice recordings. Ref: https://github.com/dwyl/video/issues/91 While I might be OK with making a recording of my voice public, I know people who wouldn't because they are far more privacy-conscious.
Fair point. If you open up your machine and offer such a service, how do you guarantee users' privacy? I mean, you store images on S3 (publicly available) and run a local database. What is your architecture? Would the HTTPS termination be a reverse proxy, so that any app served by your machine is routed to as a subdomain? Also, is a simple declaration of intent enough? Something like "we don't store your data nor transmit it to any external service of any kind"?
It really depends on whether we want to make the service public or just for people using our App. People using our App know we aren't using their images for "training" and won't leak them. But we don't have advanced access controls on images yet, beyond restricting access to just the person who uploaded them. Ideally, once we have the "groups" feature, it will be easy to restrict access. But if we were running the service as a general-purpose privacy-first classifier, we'd just store the images in /temp and then delete them after classifying. 💭
If you use the app as it is, all images are saved together in your public bucket, and the corresponding URL is saved in a database, meaning a semantic search can return any image approximating your query, yours or not.
A simple login and the addition of a user_id to the image schema could prevent images from becoming publicly available, at least through the semantic search. But when you receive a response, you receive a URL to display the image. Doesn't the URL expose the bucket's origin name? Since the bucket is public, couldn't you exploit this?
> But if we were running the service as a general purpose privacy-first classifier, we'd just store the images in /temp and then delete them after classifying.
Instead of an S3 URL, you use a path on the filesystem. If you erase the uploaded paths after a search, each upload can only be searched once.
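That temp-file idea could be sketched like this (Python used purely for illustration; the real app is Elixir, and classify below is a stand-in for the actual model call):

```python
import os
import tempfile

def classify(path):
    # Stand-in for real model inference; here it just returns the file size.
    return os.path.getsize(path)

def classify_privately(image_bytes):
    """Write the upload to a temp file, classify it, then delete it so
    nothing persists after the request (the privacy-first flow above)."""
    fd, path = tempfile.mkstemp(suffix=".jpg")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(image_bytes)
        return classify(path)
    finally:
        os.remove(path)  # the image never outlives the classification

print(classify_privately(b"\xff\xd8\xff fake jpeg bytes"))
```

The try/finally guarantees deletion even if inference raises, which is the property a "we don't store your data" claim hinges on.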
Predictably, someone has already set up a business around renting GPU time: https://www.gpudeploy.com/connect Via: https://news.ycombinator.com/item?id=40260259
As noted by @LuchoTurtle in https://github.com/dwyl/image-classifier/issues/97#issuecomment-2038195115 💬 this "hobby" app is costing us considerably more money than we originally expected. 📈 The most recent invoice on Fly.io was $48.61 for Mar 1 - Apr 1, 2024: https://fly.io/dashboard/dwyl-img-class/billing
The current month's (April 2024) Amount Due is already $14.34 and we're only on the 4th!! If we extrapolate (30 days ÷ 4 days elapsed = 7.5), the total will be:
7.5 x ($14.34 - $5) + $5 ≈ $75 💸 🔥
This is already more than we spend on our Internet & Phone bill ... 🤯 If the cost could be kept to $10/month, it would be fine. 👌
Todo
I'm keen to keep this app available for people to test without having to run it on localhost. 💻 But if the casual visitor is costing us this kind of cash, imagine if this got to the top of HN! 😬