nelsonic opened 3 months ago
I'm not sure I understand these costs (see Fly.io pricing):
1) 19760s = 5h29min20s => is this the uptime?
2) Every time you start the app, you need to upload 2 GB of model data (the volume is pruned with the VM; see the Fly.io docs). This means you recreate a 2 GB volume and load 2 GB into memory. Is this the meaning of the two highlighted lines?
@ndrean the models are not pruned with the VM every time the app starts (that was a misconception, fixed in #82). Currently, the models are saved in the volume and are not re-downloaded on every restart. You can see it in the logs, actually:
2024-04-05T05:04:33.414 app[080e325c904168] mad [info] 05:04:33.414 [info] ℹ️ No download needed: Salesforce/blip-image-captioning-base
2024-04-05T05:04:33.415 app[080e325c904168] mad [info] 05:04:33.414 [info] ℹ️ No download needed: openai/whisper-small
2024-04-05T05:04:33.415 app[080e325c904168] mad [info] 05:04:33.415 [info] ℹ️ No download needed: sentence-transformers/paraphrase-MiniLM-L6-v2
The problem is that the models take up a fair amount of space (Salesforce/blip-image-captioning-base, especially). So we're essentially paying for the additional space the models take up beyond the free tier.
However, the bigger cost by far is RAM usage. There's no way around this, as RAM is needed to run inference on the images being uploaded. GPUs are much better suited for this, but their cost is considerably higher.
@nelsonic unfortunately there's no way around this. People have been using the application, which is great. But, as with any LLM/ML-based application, it's hard to run it on any free-tier cloud solution without putting money in.
The app has already been optimized to reduce costs: inbound/outbound data was reduced with persistent storage, the image file size is reduced by optimizing images before feeding them into the model, and the sample rate of audio files is reduced on the client side before feeding them into the model.
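To make those savings concrete, here's a rough back-of-the-envelope sketch (Python, with illustrative numbers that are my assumptions, not measurements from the app) of how much resizing an image and downsampling audio shrink the data fed to the models:

```python
# Rough estimate of the payload reduction from the optimizations above.
# All sizes and rates below are illustrative assumptions, not measured values.

def image_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed RGB image size in bytes."""
    return width * height * bytes_per_pixel

def audio_bytes(seconds, sample_rate, bytes_per_sample=2):
    """Mono 16-bit PCM audio size in bytes."""
    return seconds * sample_rate * bytes_per_sample

# Resizing a hypothetical 4000x3000 photo down to 640x480 before inference:
original = image_bytes(4000, 3000)
resized = image_bytes(640, 480)
print(f"image shrinks to {resized / original:.1%} of the original")

# Downsampling 10 s of audio from 44.1 kHz to 16 kHz on the client:
print(f"audio shrinks to {audio_bytes(10, 16_000) / audio_bytes(10, 44_100):.1%}")
```

Less data in means less bandwidth billed and less RAM held per request while inference runs.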
Unfortunately, we have to pull the plug. With increased activity, this "hobbyist" project burns money (even when the machines are stopped, as shown in your image).
I'd love to keep this online for anyone to check out. But it's not feasible :(
I'm going to shut down the machine now. I'll keep the database, though. It has the images and index files saved, so we keep the uploaded data and can have the app running normally again by just spawning a new machine whenever we want to. The machine will look for the index file in the database (since it won't have one on its own filesystem), download the models and the index file, and gracefully resume where it stopped :)
@ndrean the documentation is correct. The application's filesystem is wiped whenever the machine restarts. That's why they offer volumes for persistent data (data we want to keep in between restarts). Currently, the models live inside one of these volumes, hence why they don't need to be re-downloaded.
What #82 did was fix the path of the volume inside Fly.io, which was previously incorrect.
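For context, persisting the models on Fly.io boils down to a [mounts] section in fly.toml that attaches a volume to the directory where the app caches its models. The volume name and path below are illustrative placeholders, not the project's actual values:

```toml
# fly.toml -- illustrative fragment, not this project's actual config
[mounts]
  source = "models"            # hypothetical volume name
  destination = "/app/models"  # hypothetical model-cache directory
```

Anything written under the mount destination survives restarts; everything else on the filesystem is wiped.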
I've deleted the machine. We can spawn a new one whenever we want to.
I'm keeping this issue open for other people to see as a reference for how much it costs to run this on Fly.io (without a GPU!).
You opted for the machine below, didn't you? Where is it in the bill?
```toml
# fly.toml
[[vm]]
  size = 'performance-4x'
```
I don't think they differentiate it on the billing page, unfortunately.
According to https://fly.io/docs/about/billing/#machine-billing:
Started Machines are billed per second that they’re running (the time they spend in the started state), based on the price of a named CPU/RAM combination, plus the price of any additional RAM you specify.
For example, a Machine described in your dashboard as “shared-1x-cpu@1024MB” is the “shared-cpu-1x” Machine size preset, which comes with 256MB RAM, plus additional RAM (1024MB - 256MB = 768MB). For pricing and available CPU/RAM combinations, see Compute pricing.
So they bill per second based on the preset in use, plus any additional RAM we specify. Because the machine wasn't running the whole time, we didn't pay the $124 that you showed in the picture.
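A minimal sketch of that billing rule in Python (the rates here are invented placeholders; the real numbers live on Fly.io's Compute pricing page):

```python
# Sketch of Fly.io's per-second machine billing, per the docs quoted above.
# The rates below are invented placeholders, NOT Fly.io's actual prices.

def machine_cost(seconds_started, preset_rate_per_s, extra_ram_mb, ram_rate_per_mb_s):
    """Cost = preset price per started second + price of the additional RAM."""
    return seconds_started * (preset_rate_per_s + extra_ram_mb * ram_rate_per_mb_s)

# e.g. the 19760 s of uptime mentioned above, on a hypothetical preset:
cost = machine_cost(
    seconds_started=19_760,          # 5h 29min 20s in the "started" state
    preset_rate_per_s=0.0000008,     # placeholder preset rate ($/s)
    extra_ram_mb=768,                # e.g. 1024 MB total - 256 MB included
    ram_rate_per_mb_s=0.0000000002,  # placeholder RAM rate ($/MB/s)
)
print(f"${cost:.4f}")
```

The key point: a stopped machine accrues no started-seconds, which is why the bill stayed well below the always-on price.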
@LuchoTurtle didn't want you to DELETE the machine ... 🙃
Just wanted it to run more efficiently ... 💭
But if that is going to take too much time, fair enough. 👌
@LuchoTurtle quick question: (though probably a rabbit hole…)
A. Person uploads the image to AWS S3
B. This triggers a request to the AI BOX to classify it
C. The AI BOX classifies the image and returns its guess
This would mean that our only marginal cost would be electricity, with no surprise bills when it gets to the top of HN.
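The A→B→C flow could be sketched as a tiny webhook handler. Everything here is a simplified assumption (Python stdlib only, a made-up event shape, and classify_image as a stand-in for the model); real S3 notifications usually go through SNS/SQS/Lambda and carry more fields:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def classify_image(s3_url):
    # Stand-in for step C: the AI BOX running model inference on the image.
    return {"caption": "placeholder guess for " + s3_url}

class AiBoxHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Step B: the S3 upload event arrives as an HTTP POST to the AI BOX.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        event = json.loads(body)  # assumed shape: {"s3_url": "..."}
        guess = classify_image(event["s3_url"])
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(guess).encode())

# To serve: HTTPServer(("0.0.0.0", 8080), AiBoxHandler).serve_forever()
```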
Asking because we could put together a decent machine for ~€600, including an NVIDIA GeForce RTX 4060 EAGLE with 8 GB GDDR6: https://amzn.eu/d/5fr9N5J
This could serve our needs quite well and we could run other models on it without ever having to worry about boot times etc.
Thoughts? 💭
Though we probably have to spend a decent chunk on the GPU ... https://www.reddit.com/r/MachineLearning/comments/17x8kup/d_best_value_gpu_for_running_ai_models/
We will use it for a few tasks so I think it's worth investigating. 💭
Currently, this project targets the CPU (by default, since running on a GPU entails having specific drivers for the hardware). To run on GPUs, I think we only need to change a few env variables (https://github.com/elixir-nx/xla#usage), but further testing may be necessary.
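Per the elixir-nx/xla README linked above, targeting the GPU is mostly a matter of setting XLA_TARGET before fetching/compiling the deps. The cuda120 value below is only an example; it must match the locally installed CUDA version (see the README for the currently supported targets):

```shell
# Select a CUDA build of XLA instead of the default CPU target.
# Example value only -- it must match your installed CUDA version.
export XLA_TARGET=cuda120
echo "$XLA_TARGET"
# Then recompile the deps so the GPU-enabled binary is used, e.g.:
#   mix deps.clean xla exla --build && mix deps.get && mix compile
```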
Regarding which GPU to choose, I can't really provide an informed opinion. I know VRAM is quite important.
Of course, I'm not expecting you to get an H100; that's wayyy overkill. But the 3090 seems like a good compromise, with a solid performance-to-cost ratio.
I'd hold off on purchasing anything yet, though. It needs to be confirmed that inference can be run on the GPU with Bumblebee before making any purchase that could be rather costly :)
Ok. Thanks for your reply. Seems like this will require some further thought. What do we need to do next? 💭
I'd need to check that running locally on the GPU works. Since I have a 1080, which is CUDA-capable, it should theoretically work on a 3090 too. I just need to confirm it actually uses the GPU first :)
https://blog.themvp.in/hardware-requirements-for-machine-learning/
Used ("like new"), the 3090 with 24 GB VRAM costs ~£650: https://www.ebay.co.uk/itm/176345660501
This is certainly more than we were spending on Fly.io, but if it means we can do more with Machine Learning at a baseline load, I think it's worth it. 💭
My 2¢ input.
If Whisper (speech-to-text) is the sink or bottleneck, could a cloud service be considered? https://elevenlabs.io/docs/introduction seems to offer a WebSocket connection to stream down the response. I did not see figures on pricing.
@ndrean your insight is always welcome. ❤️ Yeah, the speech part shouldn't be the bottleneck, 🤔 and importantly, the purpose of building our own project instead of using an API (or Google Lens) for image classification was to avoid sending personal data to a third party. 💭
We want to be 100% certain that an image we classify is not being used for any other purpose. The same goes for voice recordings. Ref: https://github.com/dwyl/video/issues/91 While I might be OK with making a recording of my voice public, I know people who wouldn't because they are far more privacy-conscious.
Fair point. If you open up your machine and offer such a service, how do you guarantee users' privacy? I mean, you store images on S3 (publicly available) and run a local database. What is your architecture? Would the HTTPS termination be a reverse proxy, so that any app served by your machine is routed to as a subdomain? Also, is a simple declaration of intent enough? Something like "we don't store your data nor transmit it to any external service of any kind"?
It really depends on whether we want to make the service public or just for people using our App. People using our App know we aren't using their images for "training" and won't leak them. But we don't have advanced access controls on images yet, beyond restricting access to just the person who uploaded them. Ideally, once we have the "groups" feature, it will be easy to restrict access. But if we were running the service as a general-purpose privacy-first classifier, we'd just store the images in /temp and then delete them after classifying. 💭
If you use the app as it is, all images are saved together in your public bucket, and the corresponding URL is saved in a database, meaning a semantic search can return any image approximating your query, yours or not.
A simple login and the addition of a user_id to the image schema could prevent images from becoming publicly available, at least through the semantic search. But when you receive a response, you receive a URL to display the image. Doesn't the URL expose the bucket's origin name? Since the bucket is public, couldn't you exploit this?
> But if we were running the service as a general purpose privacy-first classifier, we'd just store the images in /temp and then delete them after classifying.
Instead of an S3 URL, you use a path on the filesystem. If you erase the uploaded paths after a search, each upload can only be searched once.
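That temp-file idea could be sketched like this (Python used purely for illustration; the real app is Elixir, and classify below is a stand-in for the actual model call):

```python
import os
import tempfile

def classify(path):
    # Stand-in for real model inference; here it just returns the file size.
    return os.path.getsize(path)

def classify_privately(image_bytes):
    """Write the upload to a temp file, classify it, then delete it so
    nothing persists after the request (the privacy-first flow above)."""
    fd, path = tempfile.mkstemp(suffix=".jpg")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(image_bytes)
        return classify(path)
    finally:
        os.remove(path)  # the image never outlives the classification

print(classify_privately(b"\xff\xd8\xff fake jpeg bytes"))
```

The try/finally guarantees deletion even if inference raises, which is the property a "we don't store your data" claim hinges on.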
Predictably, someone has already set up a business around renting GPU time: https://www.gpudeploy.com/connect Via: https://news.ycombinator.com/item?id=40260259
As noted by @LuchoTurtle in https://github.com/dwyl/image-classifier/issues/97#issuecomment-2038195115 💬 this "hobby" app is costing us considerably more money than we originally expected. 📈 The most recent invoice on Fly.io was $48.61 for Mar 1 - Apr 1, 2024: https://fly.io/dashboard/dwyl-img-class/billing
The current month's (April 2024) Amount Due is already $14.34 and we're only on the 4th!! If we extrapolate (30 days ÷ 4 days elapsed = 7.5), the total will be:
7.5 x ($14.34 - $5) + $5 ≈ $75 💸 🔥
This is already more than we spend on our Internet & Phone bill ... 🤯 If the cost could be kept to $10/month, it would be fine. 👌
Todo
I'm keen to keep this app available for people to test without having to run it on localhost. 💻 But if the casual visitor is costing us this kind of cash, imagine if this got to the top of HN! 😬