Closed: ndrean closed this issue 6 months ago.
Another trial today:
A cold start:
Connected to 66.241.125.29:443 from 192.168.1.5:53615
HTTP/2 502
server: Fly/f9c163a6 (2024-01-16)
via: 2 fly.io
fly-request-id: 01HMKTKPQBGN3XRXEY95MKF2W5-bog
date: Sat, 20 Jan 2024 16:19:17 GMT
Body stored in: /var/folders/mz/91hbds1j23125yksdf67dcgm0000gn/T/tmp893dwse9
  DNS Lookup   TCP Connection   TLS Handshake   Server Processing   Content Transfer
[    45ms    |      25ms      |     203ms     |      98992ms      |       0ms       ]
             |                |               |                   |                 |
    namelookup:45ms           |               |                   |                 |
                        connect:70ms          |                   |                 |
                                      pretransfer:273ms           |                 |
                                                        starttransfer:99265ms       |
                                                                               total:99265ms
It seems that nothing is persisted on disk and that everything has to be downloaded again? The "warm start" (rerunning the command after the first response) gives totally acceptable results:
  DNS Lookup   TCP Connection   TLS Handshake   Server Processing   Content Transfer
[     1ms    |      27ms      |      29ms     |       485ms       |       1ms       ]
             |                |               |                   |                 |
     namelookup:1ms           |               |                   |                 |
                        connect:28ms          |                   |                 |
                                       pretransfer:57ms           |                 |
                                                          starttransfer:542ms       |
                                                                                 total:543ms
@LuchoTurtle Some thoughts I suppose you already went through. Is Fly pruning the Docker images? And what if you use a Fly volume and reference it as a (persistent) Docker volume? It would be populated once and for all.
You trigger the image model download in Application.ex, so I also need to download the Whisper model there.
I tried loading the models in parallel, but for some reason this doesn't give any speed-up.
# Application.ex
@models_folder_path Application.compile_env!(:app, :models_cache_dir)

@captioning_prod_model %ModelInfo{
  name: "Salesforce/blip-image-captioning-base",
  cache_path: Path.join(@models_folder_path, "blip-image-captioning-base"),
  load_featurizer: true,
  load_tokenizer: true,
  load_generation_config: true
}

@whisper_model %ModelInfo{
  name: "openai/whisper-small",
  cache_path: Path.join(@models_folder_path, "whisper-small"),
  load_featurizer: true,
  load_tokenizer: true,
  load_generation_config: true
}

# (@captioning_test_model is defined the same way; omitted here)

def start(_type, _args) do
  [
    @whisper_model,
    @captioning_prod_model,
    @captioning_test_model
  ]
  |> Enum.each(&App.Models.verify_and_download_models/1)

  # This "async download" (replacing the Enum.each above) isn't faster ???
  # |> Task.async_stream(&App.Models.verify_and_download_models/1, timeout: :infinity)
  # |> Enum.to_list()

  [...]
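For what it's worth, here is a minimal, self-contained sketch of what the parallel variant should do. `fake_download/1` is a hypothetical stand-in for `App.Models.verify_and_download_models/1`, with the actual download replaced by a sleep:

```elixir
# Two independent 200 ms "downloads" run concurrently via
# Task.async_stream, so the total wall-clock time is ~200 ms
# instead of the ~400 ms a sequential Enum.each would take.
fake_download = fn model ->
  Process.sleep(200)
  {:ok, model}
end

models = ["openai/whisper-small", "Salesforce/blip-image-captioning-base"]

{micros, results} =
  :timer.tc(fn ->
    models
    |> Task.async_stream(fake_download, max_concurrency: 2, timeout: :infinity)
    |> Enum.to_list()
  end)

IO.puts("total: #{div(micros, 1000)} ms")
```

If the real downloads don't get faster in parallel, a likely explanation is that they are bandwidth-bound: concurrent downloads share the same network link, so the total transfer time stays roughly the same.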
I've documented everything in https://github.com/dwyl/image-classifier/blob/main/deployment.md regarding deployment to fly.io.
I'm indeed using a volume to store the models, and last time I deployed, everything seemed to be working: the models were downloaded after deploying, on first use, and then reused in subsequent runs. In fact, because of the way the models are served with offline: true, it's programmatically enforced that the models have to be loaded locally, or else the app won't run.
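For context, that offline: true enforcement looks roughly like this (a sketch assuming Bumblebee's {:hf, repo, opts} repository tuple; the cache path here is illustrative, not necessarily the repo's actual config):

```elixir
# With offline: true, Bumblebee resolves the repository from cache_dir
# only; if the files are not cached, it returns an error instead of
# reaching out to the Hugging Face Hub, so the app fails fast at boot.
repo =
  {:hf, "Salesforce/blip-image-captioning-base",
   cache_dir: "/app/.bumblebee", offline: true}

{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, featurizer} = Bumblebee.load_featurizer(repo)
```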
As you know, I've gone through this situation of persisting models quite a few times: first by changing the Dockerfile (as you are aware) and then by moving to the current solution.
I can see your activity on the logs. Here's the volume being mounted:
2024-01-21T20:11:52.780 app[28659e0b5936e8] mad [info] INFO Mounting /dev/vdb at /app/.bumblebee w/ uid: 65534, gid: 65534 and chmod 0755
As you know, when a model is downloaded, a message like [info] Downloading Salesforce/blip-image-captioning-base... appears. That appears to be the case here:
2024-01-21T17:27:02.541 app[28659e0b5936e8] mad [info] 17:27:02.539 [info] Downloading Salesforce/blip-image-captioning-base...
Unless the volumes are actively being pruned when downscaling due to inactivity, I don't understand this behaviour :(
Thank you for sharing httpstat, though; it seems like an awesome tool that I will probably start using :)
Yes indeed, it seems that volumes are pruned when the machine is killed.
Maybe we could save these 3 models into a Postgres blob field (large object)? The DB is persisted, and a db_query/copy_if_not_exists should be a faster option?
I may try this.
That seems like a plausible option (and to be quite frank, probably the only option we have, given that we want the machines to scale down with inactivity). It sucks that we have to resort to a "hacky way" to get it to work :(
But, as much as I'd love to do that, I don't think it's pertinent (at least to my/this repo's scenario). Volumes shouldn't be pruned when downscaled :( . The strategy that is documented should work fine in most cases, so I don't really feel the need to save models in a relational database; it just seems counter-intuitive and may lead beginners to think it's OK when it's not really suitable for this case.
Although I appreciate your feedback (I really, really, really do), you can try it yourself if you want. But I don't see myself hacking my way around saving models into a database and dealing with all the headache that may come along with it. I'm really excited to actually get the audio transcription PRs you've implemented and then work from there :D
I am curious, but you are wise, so this project does not need this. I may look into it one day. From the docs, Postgres seems to discourage this for "big" files, and a single value is limited to 1 GB for the bytea or text types. Note that the models used here contain ~900 MB files, but "large" models are over 1 GB. One point I don't understand, though, is why I can't do a parallel download. I'll keep this for later and seek help.
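If someone does go down that road one day, one way around the 1 GB-per-value limit is to store each model file as several bytea rows. A minimal chunking sketch (module name, table name, and sizes are made up for the demo; a real run might use 256 MB chunks):

```elixir
defmodule ChunkSketch do
  # Split a binary into pieces of at most `size` bytes.
  def chunk(<<>>, _size), do: []
  def chunk(bin, size) when byte_size(bin) <= size, do: [bin]

  def chunk(bin, size) do
    <<head::binary-size(size), rest::binary>> = bin
    [head | chunk(rest, size)]
  end
end

# A 20-byte stand-in for a model file, split into 8-byte chunks.
data = :crypto.strong_rand_bytes(20)
chunks = ChunkSketch.chunk(data, 8)

# Each chunk would become one row, e.g.:
#   INSERT INTO model_chunks (name, idx, data) VALUES ($1, $2, $3)
IO.inspect(Enum.map(chunks, &byte_size/1))
# => [8, 8, 4]
```

Reading the file back is then an ORDER BY idx query whose chunks are concatenated before writing to disk.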
@LuchoTurtle I suppose no one is using the Fly machine yet, so it must have been stopped/pruned since last time. Can you check the status of the Fly volumes now to see if they are still there?
Unfortunately, I can't fly ssh console into the volume to see its contents without initializing a VM (which would prompt the models to be re-downloaded, according to our theory). The best I can do is check the size that is being occupied.
Since the models usually total about 1 GB, I can assume the volume is being cleaned up :/
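For reference, the volumes can at least be inspected from the CLI without SSHing into a machine (the app name and volume ID below are placeholders):

```shell
# List the app's volumes: ID, region, provisioned size, and the
# machine each one is attached to.
fly volumes list -a image-classifier

# Print details for a single volume.
fly volumes show vol_xxxxxxxxxxxx
```

Note that this reports the provisioned size rather than the bytes actually used, so occupancy still has to be checked from inside a running machine or via the dashboard.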
I read about volume forks. Could a fork be permanent??
The new volume is in the same region, but on a different physical host, and is not attached to a Machine. The new volume and the source volume are independent, and changes to their contents are not synchronized.
I understand Dwyl is a "real" customer, aren't you? Any chance of using a fork as a backup? Fly may be more responsive with "real" customers? 🤔
@LuchoTurtle feel free to invite @ndrean to the org: https://fly.io/dashboard/dwyl-img-class/team to debug this.
I used httpstat to get some stats on a cold start vs. a warm start, to get an idea of the state of the current app (only the image-to-text models are loaded). The first run is the cold start shown above; the next run is the "warm" start.