ndrean opened 1 year ago
Prediction slows down the process, roughly 1.5s per request. I will try to deploy this horror.
The "GET" endpoint - where you pass in an URL of a pic - works with the query string addition "pred" (no prediction, thus faster is you don't pass one).
curl -X GET http://localhost:4000/api?url=....&w=300&pred=on
The "POST" endpoint - where you submit files from a client via a FormData to the API - also works, but you use a checkbox if you want the prediction (I capture it the same way, via a key "pred", thus there is a constraint on the FormData naming).
For completeness,
https://github.com/elixir-nx/bumblebee/tree/main/examples/phoenix#user-images
It seems that when the image is too small, the findings are not so good. After reading a bit, sizes around 512x512 seem to be recommended for Image-to-Text. The speed of the recognition is also related to the size of the image: the bigger, the longer. To speed up the process, if an image is bigger, I resize it to this size and run the ML model on it.
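That resize step could look like this minimal sketch, assuming the Image library (built on Vix) is available; Image.thumbnail/2 scales so the longest edge matches the given size:

# hedged sketch: cap the longest edge at 512px before running the model
{:ok, resized} = Image.thumbnail(image, 512)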
I added redirection handling to accept images from "unsplash" for example, where a src="https://source.unsplash.com/..." responds with a redirect. The body is streamed with Finch.stream. If the code detects a redirection by reading the headers, it takes the received location and recurses. Otherwise, it writes the stream into a file, so the process eventually ends. This way, the body is processed only once, redirection or not, the memory footprint stays low, and it does not slow down the process.
I changed to Nx.Serving.batched_run/3, as it seems to give faster results when treating several pictures (uploaded as a POST request). To use batched_run, the setup is different: the Nx.Serving process is launched in the Application module.
# Application.ex
children = [
  ...,
  {Nx.Serving, serving: serve(), name: UpImg.Serving, batch_size: 10, batch_timeout: 100}
]

defp serve do
  model = System.fetch_env!("MODEL")
  {:ok, resnet} = Bumblebee.load_model({:hf, model})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, model})

  Bumblebee.Vision.image_classification(resnet, featurizer,
    defn_options: [compiler: EXLA],
    top_k: 1,
    compile: [batch_size: 10]
  )
end
and then use instead:
def predict(%Vix.Vips.Image{} = image) do
  # serving = UpImg.GsPredict.serve()
  {:ok, %Vix.Tensor{data: data, shape: shape, names: names, type: type}} =
    Vix.Vips.Image.write_to_tensor(image)

  {width, height, channels} = shape
  # bug in Vix.Vips at the time, with HWC and WHC... (see the correction further down)
  t_img = Nx.from_binary(data, type) |> Nx.reshape({height, width, channels}, names: names)

  Task.async(fn -> Nx.Serving.batched_run(UpImg.Serving, t_img) end)
  # Task.async(fn -> Nx.Serving.run(serving, t_img) end)
end
One must be careful with the async calls. When you run this async task, say %Task{} = task = predict(image), you can only get the result back - with Task.await(task) - from the owner process.
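For instance, a minimal sketch (the 15-second timeout is arbitrary):

# in the same process that called predict/1
task = predict(image)
# ... do other work here, e.g. the S3 upload ...
%{predictions: [%{label: label, score: score}]} = Task.await(task, 15_000)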
To read a URL and download it with Finch using streams, potentially accepting redirects (and writing the body into a temp file), you can use the 302 status and the Location header:
{:ok, path} = Plug.Upload.random_file("temp-stream")
{:ok, file} = File.open(path, [:binary, :write])
# url = "https://source.unsplash.com/QT-l619id6w"
request = Finch.build(:get, url)
stream_write(request, file)
File.close(file)
def stream_write(request, file) do
  Finch.stream(request, UpImg.Finch, nil, fn
    {:status, status}, _acc ->
      status

    {:headers, headers}, status ->
      handle_headers(headers, status)

    {:data, data}, headers ->
      handle_data(file, data, headers)
  end)
end
def handle_headers(headers, 302),
  do: Enum.find(headers, &(elem(&1, 0) == "location"))

def handle_headers(headers, 200), do: headers
def handle_headers(_, _), do: {:halt, "bad redirection"}

def handle_data(file, _, {"location", location}),
  do: Finch.build(:get, location) |> stream_write(file)

def handle_data(_, _, {:halt, "bad redirection"}),
  do: {:halt, "bad redirection"}
def handle_data(file, data, _) do
case IO.binwrite(file, data) do
:ok -> :ok
{:error, reason} -> {:halt, reason}
end
end
The memory footprint is low, at the expense of writing the body of the request into a file (but one can just append the chunk in memory if needed).
Thanks for the excellent write-up, @ndrean, it was super insightful!
I haven't tried Bumblebee but I'm wanting to (when I have more free time). Are you using any specific Hugging Face models in your experiments?
On the topic, you may also find https://github.com/replicate/replicate-elixir as another alternative. Unfortunately, it's tied to their platform, but it might still be fun to tinker with. This is from this AWESOME talk from Charlie Holtz in https://www.youtube.com/watch?v=TfZI5-oQSqI&ab_channel=ElixirConf. It's an awesome video that really highlights how Elixir has great built-in tools to get AI models with LiveView working seamlessly.
Yes, I used the microsoft/resnet model.
Thanks for the "replicate" link. I will give it a try too!
@LuchoTurtle Thanks! I really enjoyed watching this video, had a lot of fun! 🙂 I just realised that you already wrote plenty of good things on this subject before I woke up! So nothing new under the sun for you... 🙂
https://github.com/dwyl/learn-elixir/issues/212 https://github.com/dwyl/image-classifier/issues/1
I am just looking at Image Classification - namely a weighted list of predictions - whilst you wanted Image-to-Text, more ambitious. I just wondered what you would do with the generated text for an image, because you need to further process this response to extract some key points, if this is what you want.
For example, Salesforce/BLIP is an I2T model. I ran it in a Livebook, the easiest way to do this. It downloads 1.7Gb... The generated code is:
{:ok, model_info} = Bumblebee.load_model({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, featurizer} =
Bumblebee.load_featurizer({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, tokenizer} =
Bumblebee.load_tokenizer({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, generation_config} =
Bumblebee.load_generation_config({:hf, "Salesforce/blip-image-captioning-base"})
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 100)
serving =
Bumblebee.Vision.image_to_text(model_info, featurizer, tokenizer, generation_config,
compile: [batch_size: 1],
defn_options: [compiler: EXLA]
)
image_input = Kino.Input.image("Image", size: {384, 384})
form = Kino.Control.form([image: image_input], submit: "Run")
frame = Kino.Frame.new()
Kino.listen(form, fn %{data: %{image: image}} ->
if image do
Kino.Frame.render(frame, Kino.Text.new("Running..."))
image =
image.file_ref
|> Kino.Input.file_path()
|> File.read!()
|> Nx.from_binary(:u8)
|> Nx.reshape({image.height, image.width, 3})
%{results: [%{text: text}]} = Nx.Serving.run(serving, image)
Kino.Frame.render(frame, Kino.Text.new(text))
end
end)
Kino.Layout.grid([form, frame], boxed: true, gap: 16)
I generated an image with Stable Diffusion and submitted it to BLIP. The result is pretty good! 🙂 But it is not classification!
This works for me because the model is "deciphered" in some way in Bumblebee. What if I want to use a specific model? There, I don't know how to proceed.
I used a small Image Classification model - a 300Mb download - embedded into the app, as this tends to be much smaller. However, the Image Classification is not as good - same image - even for a 1300x1000px image:
Replicate exposes an endpoint. You also need to be careful with the data you submit to get the right balance between speed and accuracy: you might pay too much or pay for nothing if you don't deliver a properly sized image :)
When you read this "official" example, they naturally stress that the navigator should resize pics instead of the server. However, in this git repo, the proposed JS code is a bit... wordy.
So, down to earth, I tried to follow the repo recommendations - at least for a WebApp version - and I looked at how to do this. In fact, you can get a bunch of resized images from the browser with a Promise.all, because the browser is efficient at doing this: a form accepts an image and you "just" inject it into the different resizing canvases that you set up, and call canvas.toBlob. You can target a thumbnail, an "ML"-sized image (512px), or 3 different sizes to match mobile, pad and full screen (1440px) pretty quickly, for example.
One point is the naming: you need a unique base identifier for all these files. It turns out that JS can produce a SHA1 easily, no library needed, so I used this as a unique naming base, modulo some size identifier. You can also convert into WEBP just like this, and this saves a lot. You can upload directly to a bucket, and pass down to the server the 512px file to do the ML stuff. The bucket does its stuff and returns a response back to the client - a bunch of URLs - and the client forwards the responses to the server, where you update the socket. Meanwhile, the server did the ML stuff to produce a caption/prediction. It remains to save all this into the DB. With the SHA1 naming, we have a common identifier, almost collision free, so we can update the DB record easily. All easy async, client-side and server-side. The main difficulty is the "hook".
The shinstagram source. He uses R2, but I did not get the details of how he uses the CDN to serve the files.
Thanks for the detailed write-up @ndrean , it is really super insightful! Once I'm cleared with other tasks, I want to give https://github.com/dwyl/image-classifier/issues/1 a whirl and, since you've put much more time and effort than I into this, I might ask ya for some pointers!
> I am just looking at Image Classification - namely a weighted list of predictions - whilst you wanted Image-to-Text, more ambitious. I just wondered what you would do with the generated text for an image, because you need to further process this response to extract some key points, if this is what you want.
Not necessarily. I actually want a list of keywords that describe an image, just like you want. However, I believe that one may yield fair results by using a combination of an image captioning model like BLIP and a regular LLM to better extract keywords. In the same way https://zhaohengyuan1.github.io/image2paragraph.github.io/ uses three models to densely and accurately describe an image, one may do something like:
Use BLIP to describe the image -> feed it into an LLM to gather relevant keywords with context from the image.
Then again, this is pure speculation on my part.
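Sketched in hypothetical Elixir terms, with blip_serving and llm_serving standing in for two Bumblebee servings (the prompt is invented):

# 1. caption the image with BLIP
%{results: [%{text: caption}]} = Nx.Serving.run(blip_serving, image_tensor)

# 2. ask a text-generation model to distill the caption into keywords
prompt = "Extract five comma-separated keywords from: #{caption}"
%{results: [%{text: keywords}]} = Nx.Serving.run(llm_serving, prompt)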
> I generated an image with Stable Diffusion and submitted it to BLIP. The result is pretty good! 🙂 But it is not classification!
Your results are awesome! Though why do you say it's not considered "classification"? Is it because it's yielding a simple phrase instead of a set of weighted predictions?
> This works for me because the model is "deciphered" in some way in Bumblebee. What if I want to use a specific model? There, I don't know how to proceed.
Apparently, you can't use any HuggingFace transformer with Bumblebee, which is a shame. I don't know the specifics but, according to https://github.com/elixir-nx/bumblebee#model-support, it "has to be implemented in Bumblebee" (whatever that means).
However you can use https://jonatanklosko-bumblebee-tools.hf.space/apps/repository-inspector/36pihlb7tb7rvmovbnvrmjseud5mzdlxhbfaa6xywewlprok to check if a Transformer model from HuggingFace is supported or not.
So it's fair that you don't know how to use other models from HuggingFace because apparently it's not possible :p.
Image sizes
It's interesting: the deal with image sizes, as you pointed out, spans even to image generation with Stable Diffusion. Even when I'm doing img-to-img or inpainting, I yield much better results with 512px images.
Using multiple canvases with different sizes and injecting images there is a fun way of getting different-sized images, quite creative!
Thanks for the shinstagram source, I'll have to take a look at it! :D
Yes, I see, a mix. Sometimes I have good predictions, but more often BLIP is superior. For the moment, I don't know.
Thanks for the CanIUseThisModel link, I did not know about it or find it. Things are clearer now.
Of course, I did not invent the 512px trick, I read it!
See also: https://github.com/elixir-nx/bumblebee/tree/main/examples/phoenix#user-images
[UPDATED] You can consume the data by sending it directly to a bucket when you run an external: presign_upload upload: it performs an XHR/fetch request to the bucket endpoint with a presigned URL and consumes the data. This means you can't run a prediction any more; you may need to upload the data to the server as well.
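For reference, a minimal sketch of such an external upload in LiveView, where presign_upload/2 is your own presigner returning the presigned fields (names are illustrative):

def mount(_params, _session, socket) do
  {:ok,
   allow_upload(socket, :images,
     accept: ~w(.jpg .jpeg .png .webp),
     max_entries: 3,
     # entries go straight to the bucket, not through the server
     external: &presign_upload/2
   )}
end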
1) The HOOK: use this.upload to send the renamed, resized & WEBP-converted files from the client.
- calcSHA1: MDN source of non-crypto usage
- this.upload is the secret! An undocumented function found here

const SIZES = [200, 512, 1440];
export default {
/**
* Renames a File object with its SHA1 hash and keep the extension
* source: https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest#converting_a_digest_to_a_hex_string
* @param {File} file - the file input
* @returns {Promise<File>} a promise that resolves with a renamed File object
*/
async setHashName(file) {
const ext = file.type.split("/").at(-1);
const SHA1name = await this.calcSHA1(file);
return new File([file], `${SHA1name}.${ext}`, {
type: file.type,
});
},
/**
* Calculates a SHA1 hash using the native Web Crypto API.
* @param {File} file - the file to calculate the hash on.
* @returns {Promise<String>} a promise that resolves to hash as String
*/
async calcSHA1(file) {
const arrayBuffer = await file.arrayBuffer();
const hash = await window.crypto.subtle.digest("SHA-1", arrayBuffer);
const hashArray = Array.from(new Uint8Array(hash));
const hashAsString = hashArray
.map((b) => b.toString(16).padStart(2, "0"))
.join("");
return hashAsString;
},
/**
*
* @param {File} file - the file
* @param {number[]} SIZES - an array of sizes to resize the image to
* @returns {Promise<File[]>} a promise that resolves to an array of resized images
*/
async processFile(file, SIZES) {
return Promise.all(SIZES.map((size) => this.fReader(file, size)));
},
/**
* Reads an image file, resizes it to a given max size, converts it into WEBP format and returns it
* @param {File} file - the file image
* @param {number} MAX - the max size of the image in px
* @returns {Promise<File>} resolves with the converted file
*/
fReader(file, MAX) {
const self = this;
return new Promise((resolve, reject) => {
if (file) {
const img = new Image();
const newUrl = URL.createObjectURL(file);
img.src = newUrl;
img.onload = function () {
URL.revokeObjectURL(newUrl);
const { w, h } = self.resizeMax(img.width, img.height, MAX);
const canvas = document.createElement("canvas");
if (canvas.getContext) {
const ctx = canvas.getContext("2d");
canvas.width = w;
canvas.height = h;
ctx.drawImage(img, 0, 0, w, h);
// convert the image from the canvas into a Blob and convert into WEBP format
canvas.toBlob(
(blob) => {
const name = file.name.split(".")[0];
const convertedFile = new File([blob], `${name}-m${MAX}.webp`, {
type: "image/webp",
});
resolve(convertedFile);
},
"image/webp",
0.75
);
}
};
img.onerror = function () {
reject("Error loading image");
};
} else {
reject("No file selected");
}
});
},
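/**
 * Computes the target dimensions whose longest side equals MAX,
 * preserving the aspect ratio.
 * @param {number} w - the original width
 * @param {number} h - the original height
 * @param {number} MAX - the max size in px
 * @returns {{w: number, h: number}} the new dimensions
 */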
resizeMax(w, h, MAX) {
if (w > h) {
if (w > MAX) {
h = h * (MAX / w);
w = MAX;
}
} else {
if (h > MAX) {
w = w * (MAX / h);
h = MAX;
}
}
return { w, h };
},
/**
* Takes a FileList and an array of sizes,
* then renames them with the SHA1 hash,
* then resizes the images according to a list of given sizes,
* and converts them to WEBP format,
* and finally uploads them.
* @param {FileList} files
* @param {number[]} SIZES
*/
async handleFiles(files, SIZES) {
const renamedFiles = await Promise.all(
[...files].map((file) => this.setHashName(file))
);
const fList = await Promise.all(
renamedFiles.map((file) => this.processFile(file, SIZES))
);
// the "secret" to upload to the server. Undocumented Phoenix.JS function
this.upload("images", fList.flat());
},
/*
inspired by: https://github.com/elixir-nx/bumblebee/blob/main/examples/phoenix/image_classification.exs
*/
mounted() {
this.el.style.opacity = 0;
this.el.addEventListener("change", async (evt) =>
this.handleFiles(evt.target.files, SIZES)
);
// Drag and drop
this.el.addEventListener("dragover", (evt) => {
evt.stopPropagation();
evt.preventDefault();
evt.dataTransfer.dropEffect = "copy";
});
this.el.addEventListener("drop", async (evt) => {
evt.stopPropagation();
evt.preventDefault();
return this.handleFiles(evt.dataTransfer.files, SIZES);
});
},
};
@ndrean
t_img = Nx.from_binary(data, type) |> Nx.reshape({height, width, channels}, names: names)
I'm having trouble with this part. I keep stumbling upon this error when trying to reshape the tensor so I can feed it into the resnet-50 model.
** (ArgumentError) cannot reshape, current shape {11708} is not compatible with new shape {224, 224, 3}
I know for sure that the image is resized according to the model's specification (224x224) up until this point.
I don't know what I'm doing wrong; I'm trying to follow Bumblebee's guide to image classification. Have you gotten this error before? 🙂
@LuchoTurtle Ah yes, I remember now, I had this too: it was a bug, and my code above was good with the bug until the maintainer corrected it, so it's wrong now. But I did not correct it above... The correct shape is an HWC tuple: width and height were inverted, you see what I did? Make sure to have the latest version. I think this should work.
{:ok, %Vix.Tensor{data: data, shape: shape, names: names, type: type}} =
Image.write_to_tensor(image)
t_img = Nx.from_binary(data, type) |> Nx.reshape(shape, names: names)
Nx.Serving.batched_run(UpImg.Serving, t_img)
FYI you cannot deploy this on a small machine, you probably need 1GB RAM. I will probably come back to this as I want to finish this little project.
A 2GB RAM VPS instance on OVH is €3.50/month → https://github.com/dwyl/learn-devops/issues/64 🙂
It probably needs a trial to see how to install on bare metal (modulo Docker, but...). I see they provide an IPv4, so you can put plenty of demo apps with subdomains, I imagine. If I want to buy a domain, say on Cloudflare, I will need to link OVH and Cloudflare. It should not be too complicated.
About "prompt engineering": https://prmpts.ai/blog/what-is-prompt-engineering
I dispute that "prompt engineering" is engineering at all 🙂.
But I do understand that there's an art to it. Refining a model's output to get what you want is not easy, per se, but rather a matter of trial and error and specificity. It's definitely a skill, but I honestly can't see the "engineering" part of it - it can be boiled down to clarity in communication and having to work with some quirks that GPT or any other LLM may have. But hey, maybe I'm an idiot and I'm spewing nonsense, I don't know 🙂.
Although, I have to admit, I've dabbled with Stable Diffusion much more than LLMs (though I'm keen on biting the bullet and paying 20 quid a month for access to OpenAI's API after their dev day at https://openai.com/blog/new-models-and-developer-products-announced-at-devday).
From what I've tried, I think "prompt engineering" is much harder with diffusion models than LLMs. But even then, you can circumvent issues with inpainting and ControlNet to get more accurate results rather easily (though it's still very much trial and error; you can never get exactly what you want, just what's closest to it).
For example, to generate cool Ghibli-style images with Stable Diffusion, I found that I had to work much more than when simply using ChatGPT or any other LLM:
- browse Civit.AI to see the best results (for example, EasyFluff).
- pick which VAE to use to get your preferred results (for example https://civitai.com/models/23906/kl-f8-anime2-vae).
- pick which LORA to use (like https://civitai.com/models/82098/add-more-details-detail-enhancer-tweaker-lora).

This by itself is much more work just to yield fair results with generative art, something that is much more streamlined (or downright not present) with LLMs and prompting.
I found prompting in diffusion models absolutely chaotic. But I've seen patterns, and I've had luck following imageboard tags; I assume many models are trained with these imageboards in mind, because they perform much better when I use these tags.
For example, I tend to follow a pattern for positive prompts: establish style + number of characters + the camera and/or landscape and/or scene properties, using `"BREAK"` between different subjects that I want in the picture (to prevent them getting mixed up). Add weight to each tag and you can go from there.
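A made-up example following that pattern (tags and weights invented):

ghibli style, watercolor, 1girl, wide shot, hillside village at sunset
BREAK
(1cat:1.2), sitting on a stone wall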
So is this workflow engineering? I don't believe so. It's not deterministic by nature. It's just proper concise communication. It's a skill, but I don't think there's anything esoteric about it.
I liked this answer from https://news.ycombinator.com/item?id=36971327.
> Is Prompt Engineering a Thing?
Yes, it's a dumb name for the skill of modifying your prompts and questions to the LLM in a way that produces better results than if you just asked for what you wanted plainly. As language models get better, this might become obsolete.
> I'm trying to research the subject but I don't see much evidence that companies are racing to hire prompt engineers.
Because it's not really a job. Think of it like using the Google search engine - being able to search well is something you can get better at but being a "Google search-er" isn't a career or a job you'll see openings for.
All in all, aside from my obvious ramble and digression, it's still an interesting read @ndrean . Because although I don't think it's engineering, it's a highly valuable skill that I want to get better at!
Nice @LuchoTurtle , you look pretty advanced!
Do you use only a Livebook to test all this?
There is indeed some vocabulary to ingest to enter this world. Being able to name things that are really useful is a powerful skill 🙂 but it sometimes feels like much ado about "almost" nothing. Embedding, transformers, tokenizer, prompt-context etc. are, on the other side, "real" concepts to be understood, whilst the so-called "engineering" is more like noise.
I am starting to watch/read this: https://www.coursera.org/learn/generative-ai-with-llms/lecture/ZVUcF/prompting-and-prompt-engineering
Playing with images gives an immediate wow effect. I highly recommend https://github.com/cbh123/emoji by the same guy who did Shinstagram. By the way, here is how he prompt-engineers it.
I still have basic questions: how do you use these tools in practice to run this in production? An API-based approach, or embedded in your app somehow?
I do more modest, down-to-earth things, more on the LLM side. My first step was image captioning. For example, to run this in practice, I embedded the model, i.e. downloaded the data onto the server, as Bumblebee does this in fact. Then I bind-mounted it into the running container of the app. This is not totally straightforward: I can run the "base" model (1G) but not the large model (2G). I did not dig into this problem.
Another barrier is that few models can be used by the Elixir ecosystem. I finally found something:
https://twitter.com/sean_moriarity/status/1715758666001928613
An explanation on how we add models to Bumblebee (@toranb asked on EEF slack and I thought it would be a good write up here).
— Sean Moriarity (@sean_moriarity) October 21, 2023
The first thing to note is that almost all of the models have significant overlap in implementation details. A transformer is a transformer. There are…
Lastly, another barrier IMO is LiveView. Compared to Streamlit, it is far behind. LiveView is still complicated and fragile: navigation and the "liveview session" are obscure. I had some errors I still don't understand. For example, I used a separate "html.heex" file that for some reason gave me double renderings. When I put the same markup into the render function of the LiveView, it worked. I also have some cache problems: you change the code but it doesn't render. A few headaches...
You can spend your life just watching YouTube. However, this one is worth watching, you learn something: running ML in the browser, VERY instructive. It helps you understand, step by step, this Huggingface world and consequently sheds some light on the Elixir Bumblebee world (because honestly, they don't help you 🙂).
Very good video, thanks for sharing @ndrean. Please have a read of https://github.com/dwyl/image-classifier and share your thoughts. 🙂
As for the job/title of "Prompt Engineer" ... While it's super "hot" right now to know how to refine queries to get pre-trained models to give useful results ...
I cannot help but think that this is something a 5-year-old child can do quite effectively. So it's only a matter of time before the "Prompt Engineers" are replaced.
What might not be replaced as quickly - though it will eventually - are specific subject-matter experts who use the corpus of knowledge to answer specific questions that non-experts wouldn't even think of. 🙂 But I honestly think that as all knowledge gets sucked into ever more powerful LLMs, and the LLMs have all the questions and answers, they will be able to auto-suggest the prompts. So even a child will be able to prompt their way into a Nobel Prize. 🙂
@nelsonic I looked quickly into the https://github.com/dwyl/image-classifier repo. Looks good. A few remarks.
1) Are you able to run this on Fly.io? Because I see that your Dockerfile uses the standard user "nobody", but Huggingface recommends a user 1000: https://huggingface.co/spaces/jonatanklosko/chai/blob/main/Dockerfile. This repo can be a reference: https://huggingface.co/docs/hub/spaces-sdks-docker#permissions. However, it downloads the model during the build stage, and I found this complicated. You opted to copy the model data from your host into the image https://github.com/dwyl/image-classifier/blob/d7205ca4a97a1d582436d5cc9d781eb80b6311b2/Dockerfile#L56, but you don't use ENV BUMBLEBEE_OFFLINE=true in the Dockerfile. I believe it will download the model anyway, won't it? I believe your image should use a volume to grab the data and contain only the running code. But if it works this way (the model is small?), then why not; it is not meant to be scaled, I presume. Another detail: the .bumblebee data is also persisted in the GitHub repo, but shouldn't it be in an LFS? Or not at all?
2) You pass a base64 string to render the resized image, but why do you use a form to wrap the img tag? https://github.com/dwyl/image-classifier/blob/d7205ca4a97a1d582436d5cc9d781eb80b6311b2/lib/app_web/live/page_live.html.heex#L22
3) Why do you need this pre-process-image?
4) Shouldn't the async task be async_nolink? Because if the serving fails, you may not want the main process to get killed.
5) You also have the library stb_image instead of Vix. This can further reduce the image size. An example.
@ndrean Great feedback as always. CC: @LuchoTurtle (who is currently working on the Fly.io deployment/update ...)
Ah ok, I didn't look at who did it. So with Lucho, it's in good hands :) I am interested to see your result, as I want to deploy something similar but on a VPS (using a bucket to save the images and SQLite to save the list of images/captions per user). Nothing huge, but not obvious :)
@nelsonic @LuchoTurtle Fly.io volumes
I would try to copy the .bumblebee data you downloaded via Bumblebee into a fly volume. I think this can be done in the fly.toml with (not totally sure):
[mounts]
source = "$(pwd)/.bumblebee"
destination = "/my-volume"
Then you can get rid of the .bumblebee copy command in both stages, use the "nobody" user as Phoenix does, and reference the new location in the runner stage with:
ENV BUMBLEBEE_CACHE_DIR=/my-volume (?)
ENV BUMBLEBEE_OFFLINE=true
Now, you won't download the model but read it from the cache when the app starts.
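For reference, a hedged sketch of the fly.toml syntax (if I read the Fly docs right, source is the Fly volume's name, not a local path; "models" is an invented volume name):

[mounts]
  source = "models"
  destination = "/my-volume"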
However, not sure your image will fit in a 256MB machine....
Thanks for the feedback @ndrean , always appreciated! By the way, thank you for the video! Watched it all the way through, and it was immensely useful!
1 - Thanks, I didn't know the "nobody" user had any impact. Will change it :)
And yes, I was trying to cache the model and was hoping to do this all on fly.io. Meaning that on the first execution of the app in fly.io, the model would be downloaded into .bumblebee (hence why I created this directory), and then on subsequent runs, LiveView would fetch the local model from it. I thought setting BUMBLEBEE_OFFLINE was optional (I thought it was a flag to ALWAYS fetch locally) because I was under the impression that by setting the CACHE_DIR, it would use the local model. Apparently, it doesn't, hence why I'm trying to fix it.
2 - I wrap the <img> with a form so the user can click on the image again and upload another image if they want to.
3 - I was having trouble with the tensor dimensions initially. Because models usually work in a specific colourspace and without alpha (it's data that is not relevant), I wrote that little function that can be used anywhere. It flattens the alpha out, converts the colourspace and formats/reshapes the tensor to the correct format. That's how I got this to work :p
4 - OOh, interesting! Thank you for the suggestion :)
Regarding using volumes, I'm tempted to do so. I first want to try to get the model during the build stage (as you've mentioned) in the Dockerfile, so it's easier to deploy. I'm aware that this will result (depending on the model used) in a bigger container size, but that's ok, we can scale the fly.io machine up (yeah, 256MB is super low).
But if that doesn't work, I'll try the volume approach. Thank you kindly :D
No, it won't download the model in the build stage unless we explicitly "pre-run" Bumblebee.load and friends in some mix command. Take a look at "Livebeats with whispers" and how they do it in the Dockerfile: they are explicit. But this becomes intricate and I don't like this way of doing it: the model should be in a separate volume and passed via an env var.
Another interesting repo to prepare yourself to lose your job??
> Another interesting repo to prepare yourself to lose your job??
Looks like https://github.com/Significant-Gravitas/AutoGPT :P
> No, it won't download the model in the build stage unless we explicitly "pre-run" Bumblebee.load and friends in some mix command. Take a look at "Livebeats with whispers" and how they do it in the Dockerfile: they are explicit. But this becomes intricate and I don't like this way of doing it: the model should be in a separate volume and passed via an env var.
Thank you for the reply. I was trying to get it to work with something similar to that. I want to give both options a whirl but I'm having trouble with actually getting my Dockerfile to work by running something like
RUN /app/bin/app eval 'App.Application.serving()'
But it's not working.
Trying to debug locally, but even then it's a pain, and even dumping logs in intermediate Docker layers isn't allowing me to see the filesystem at each step of the build stage.
I see your POV, though. Having it in the Dockerfile makes it too tightly coupled, but I'm still wanting to give it a try to document both approaches 🙂
I hate this "doesn't work for me", but here we are. Same for me: it doesn't work because, if I recall correctly, it says "can't find /app/bin/app". When I run a release version, Application.serving() works, but putting this in the Dockerfile (which mimics what we do by hand, no?), well, doesn't... I did not find an answer.
Mine:
ARG ELIXIR_VERSION=1.15.5
ARG OTP_VERSION=26.0.2
ARG DEBIAN_VERSION=bullseye-20230612-slim
ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"
FROM ${BUILDER_IMAGE} as builder
ARG MIX_ENV
RUN apt-get update -y && apt-get install -y build-essential git libmagic-dev curl \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
WORKDIR /app
RUN mix local.hex --force && \
mix local.rebar --force
ENV MIX_ENV="prod"
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile
COPY priv priv
COPY assets assets
COPY lib lib
RUN mix assets.deploy
RUN mix compile
# RUN mix run -e "UpImg.Ml.serve()" --no-start <---- fails!
COPY config/runtime.exs config/
COPY rel rel
RUN mix release
################################
FROM ${RUNNER_IMAGE}
ARG MIX_ENV
RUN apt-get update -y && apt-get install -y libstdc++6 openssl libncurses5 locales libmagic-dev \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
WORKDIR "/app"
RUN chown nobody /app
ENV MIX_ENV="prod"
ENV BUMBLEBEE_CACHE_DIR=/app/bin/.bumblebee/blip
ENV BUMBLEBEE_OFFLINE=true
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/up_img ./
USER nobody
EXPOSE 4000
CMD ["/app/bin/server"]
Yep, precisely the issue I'm getting. The funny thing is that if I run RUN mix run -e 'App.Application.serving()' --no-start --no-halt, it clearly downloads the model but fails afterwards.
Yes. By the way, are you running this locally?
Did you try setting SECRET_KEY_BASE in the command?
My Docker commands to overcome this: a volume!
1) I create a volume (yes, a bit intricate to do: you can't do this with an image, only a container, so you run a container from an alpine image, mount your local folder into it and bind-mount the docker folder into a volume to get your volume...!)
docker run --rm -v blip:/model -v $(pwd)/.bumblebee/blip:/data alpine cp -a /data/. /model/
2) Run the image, inject the env vars and bind-mount the volume into the container.
docker run --env-file .env-docker -p 4000:4000 -it --rm --name up-img-cont -v blip:/app/bin/.bumblebee/blip/:ro up-img
Yeah, I'm building the Docker image locally to get past the errors. To be honest, if we're running this failing line just to get the model, we only care that it downloads. So if we do something like:
RUN mix run -e 'App.Application.serving()' --no-start --no-halt; exit 0
it should work.
And thanks for the "volume approach". Once I get this one working, I'll do that one for sure (I really want to document both options)! (Though, I ought to admit, by "volumes" I thought you meant Fly.io volumes, not Docker ones.)
I also have a question, since I'm really not well-versed with Docker. I understand that those commands work fine locally. But if I wanted to deploy it to Fly.io, I'd have to somehow create volumes and bind them in the Dockerfile, correct? Do you have any reference for those? I'd google it, but if you already know the answer, perhaps you can point me in the right direction :)
You may be right with your "exit 0", but I even tried to run only the Bumblebee.load and it failed, so I gave up and went to use volumes.
Yes, in the previous post I mentioned the Fly.io volumes and accessing a volume. You don't run docker commands in Fly; you use a "fly.toml", something like [mounts] source=$(pwd)/.bumblebee destination=/my-volume. Because I don't know how to ssh into a Fly machine - I don't know which machine, nor its ssh port or IP address - I can't:
scp -P 22 -r .bumblebee/ <fly-machine>@<fly-ip>:your-volume
I tried again out of curiosity. You will probably encounter this error. This is curious because we have "--no-start".
=> ERROR [builder 15/18] RUN mix run -e 'UpImg.Ml.serve()' --no-start 30.9s
------
> [builder 15/18] RUN mix run -e 'UpImg.Ml.serve()' --no-start:
|====================
13.08 [output clipped, log limit 2MiB reached]
30.45 ** (exit) exited in: GenServer.call(EXLA.Client, {:client, :host, [platform: :host]}, :infinity)
30.45 ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
Precisely. In fact, even though I got this error while running https://github.com/dwyl/imgup/issues/131#issuecomment-1808265459, I could circumvent it by adding ; exit 0. But now, for some odd reason, I keep getting this error and it's not downloading the models anymore as it was before. I've tried reverting to the code I had when I commented https://github.com/dwyl/imgup/issues/131#issuecomment-1808265459, but it's not working anymore.
?? weird
I tried with success
RUN mix run -e 'UpImg.Ml.serve()' --no-start; exit 0
That works, for sure, it's just ignoring the error.
But it's maybe not downloading the model. When I commented in https://github.com/dwyl/imgup/issues/131#issuecomment-1808265459, I could see the progress of the model being downloaded.
But not anymore 🫥, even though the code is exactly the same.
But if you check your container and see the model there, great news 🥳
Ah, I did not copy it in the last stage.
Well, it fails, since nothing is downloaded indeed. The only way is to copy your local folder into the image and forget about this download. I don't understand how Livebeats made it work.
If you run
RUN mix run -e 'App.Application.serving()' --no-start --no-halt; exit 0
COPY .bumblebee/ .bumblebee
You need to make sure that ENV BUMBLEBEE_OFFLINE=true is not enabled before this step, otherwise it will not look for the repo.
For example, here's my dockerfile.
# Find eligible builder and runner images on Docker Hub. We use Ubuntu/Debian
# instead of Alpine to avoid DNS resolution issues in production.
#
# https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=ubuntu
# https://hub.docker.com/_/ubuntu?tab=tags
#
# This file is based on these images:
#
# - https://hub.docker.com/r/hexpm/elixir/tags - for the build image
# - https://hub.docker.com/_/debian?tab=tags&page=1&name=bullseye-20231009-slim - for the release image
# - https://pkgs.org/ - resource for finding needed packages
# - Ex: hexpm/elixir:1.15.7-erlang-26.0.2-debian-bullseye-20231009-slim
#
ARG ELIXIR_VERSION=1.15.7
ARG OTP_VERSION=26.0.2
ARG DEBIAN_VERSION=bullseye-20231009-slim
ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"
FROM ${BUILDER_IMAGE} as builder
# install build dependencies (and curl for EXLA)
RUN apt-get update -y && apt-get install -y build-essential git curl \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
# prepare build dir
WORKDIR /app
# install hex + rebar
RUN mix local.hex --force && \
mix local.rebar --force
# set build ENV
ENV MIX_ENV="prod"
ENV BUMBLEBEE_CACHE_DIR="/app/.bumblebee"
# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config
# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile
COPY priv priv
COPY lib lib
COPY assets assets
COPY .bumblebee/ .bumblebee
# compile assets
RUN mix assets.deploy
# Compile the release
RUN mix compile
# IMPORTANT: This downloads the HuggingFace models from the `serving` function in the `lib/app/application.ex` file.
# And copies to `.bumblebee`.
RUN mix run -e 'App.Application.load_models()' --no-start --no-halt; exit 0
COPY .bumblebee/ .bumblebee
# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/
COPY rel rel
RUN mix release
# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE}
RUN apt-get update -y && \
apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
WORKDIR "/app"
RUN chown nobody /app
# set runner ENV
ENV MIX_ENV="prod"
ENV EXS_DRY_RUN="true"
ENV BUMBLEBEE_CACHE_DIR=/app/.bumblebee
ENV BUMBLEBEE_OFFLINE=true
# Adding this so model can be downloaded
RUN mkdir -p /nonexistent
# Only copy the final release from the build stage
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/app ./
COPY --from=builder --chown=nobody:root /app/.bumblebee/ ./.bumblebee
USER nobody
# If using an environment that doesn't automatically reap zombie processes, it is
# advised to add an init process such as tini via `apt-get install`
# above and adding an entrypoint. See https://github.com/krallin/tini for details
# ENTRYPOINT ["/tini", "--"]
# Set the runtime ENV
ENV ECTO_IPV6="true"
ENV ERL_AFLAGS="-proto_dist inet6_tcp"
ENV BUMBLEBEE_CACHE_DIR=/app/.bumblebee
ENV BUMBLEBEE_OFFLINE=true
CMD ["/app/bin/server"]
It runs successfully.
When I run the container instance locally, I can even see the models downloaded in .bumblebee. Though the naming of these files is weird, it's just mumbo jumbo. I'll try to load these models locally.
Ah yes, I had already loaded some env vars to run mix phx.server, so Docker picked up these env vars. I'll set BUMBLEBEE_OFFLINE=false in the Dockerfile to be sure this won't happen again.
Then why do you do COPY .bumblebee/ .bumblebee? You are copying your host into the image, but the previous line is supposed to have done this. My image inflated from 500Mb to 2.4Gb!
Look at the running container: 2.4Gb memory! The peak is when I uploaded an image and ran the Image-To-Text serving. Not sure all this is super performant.
That is to be expected, though. How does it compare to loading from the volume? If you check https://fly.io/phoenix-files/speed-up-your-boot-times-with-this-one-dockerfile-trick/, it is mentioned that's the only downside - your docker image will be much bigger, since it quite literally has your model downloaded there.
However, I'm having trouble actually using the local model at runtime. I know I should use {:local, "path/to/model"} (according to https://hexdocs.pm/bumblebee/Bumblebee.html#t:repository/0). But, as I've shown, the downloaded model has the file names randomized :/
Yes, that was one of the reasons I mentioned. The difference is that you can run several instances at a minimal cost: you would run 2.5Gb x 2 = 5Gb, whilst with a volume it will run 2Gb + 0.5Gb x 2 = 3Gb.
One thing I am sure works (sorry, I am not selling the volume, but I want to summarise): keep {:ok, model_info} = Bumblebee.load_model({:hf, model}) in the Elixir code, in the serve/0 function.
The path passed to :local: I guess it must be absolute, no?
~~Path.join([UpImgWeb.Endpoint.static_path(:my_app), ".bumblebee"])~~
{:local, Path.join([Application.app_dir(:up_img), ".bumblebee/blip"])}
Nah, it's okay, selling the volume absolutely makes sense 🙂. It's the best option to save resources and to scale horizontally.
Though I don't understand: if you are using {:ok, model_info} = Bumblebee.load_model({:hf, model}), aren't you downloading the model? Unless you set the :local property to a path (and with this, Bumblebee will look in the cache directory), the model is downloaded from HuggingFace every time, no? So even if you use a volume, how are you taking advantage of it? You don't seem to be using it (according to those steps) in your code. So if I were to use a fly.io instance that suspends itself after one hour, the cold start would re-download the model.
The path should be absolute, yes.
In the build stage, I do not invoke any RUN mix ..., so it is just compiling the code. In fact, I never run the code. When I run the image, then, yes, it is looking for the cache, if any. If you set the env BUMBLEBEE_CACHE_DIR, it will look for it. If you set {:local, path}, it will probably look for it there. This I am testing.
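A hedged sketch of that test, assuming {:local, dir} expects a plain snapshot of the repository files (config.json, the params file, ...) rather than Bumblebee's hashed cache layout - which would also explain the randomized file names mentioned above:

# hedged sketch: dir must contain the raw repo files, not the hashed cache
path = Path.join(Application.app_dir(:up_img), ".bumblebee/blip")
{:ok, model_info} = Bumblebee.load_model({:local, path})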
I gave Bumblebee a try today. The idea was to provide predictions on image captioning to classify an image so that a user can use/put pre-filled tags to easily filter his images.
It turns out that the predictions are... not too bad, and quite fast, at least locally.
This is supposed to be a car:
https://dwyl-imgup.s3.eu-west-3.amazonaws.com/40F36F45.webp
Testing with a new query string pred=on to run the model prediction, I tested 3 models: "facebook/deit-base-distilled-patch16-224", "microsoft/resnet-50" and "google/vit-base-patch16-224".
I don't know if anyone has tested it? I submit my code in case any reader sees some obvious fault. It runs locally. It is based on this example. I did not try to deploy this, but here is a guide before I forget: you need to set up a temp dir.
I decided to run a GenServer to start the serving with the app to load the model, but you can start an Nx.Serving at the Application level as well, something like {Nx.Serving, serving: serve(), name: UpImg.Serving}, where the serve() function defines what is in the GenServer below, and it is started with the app:
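A sketch of what that GenServer might look like, reconstructed to match the UpImg.GsPredict.serve() call commented out earlier (treat the internals as illustrative):

defmodule UpImg.GsPredict do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # load the model once at startup and keep the serving as the state
  @impl true
  def init(opts) do
    model = Keyword.fetch!(opts, :model)
    {:ok, resnet} = Bumblebee.load_model({:hf, model})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, model})

    {:ok,
     Bumblebee.Vision.image_classification(resnet, featurizer,
       defn_options: [compiler: EXLA],
       top_k: 1
     )}
  end

  # expose the serving kept in the state
  def serve, do: GenServer.call(__MODULE__, :serving)

  @impl true
  def handle_call(:serving, _from, serving), do: {:reply, serving, serving}
end

# in application.ex: {UpImg.GsPredict, model: System.fetch_env!("MODEL")}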
The model - the repo id - is passed as an env var so I can very simply change it.
predict/1
when I upload an image from the browser and run this task in parallel to the S3 upload. It takes aVix.Vips.Image
, a transformation of a binary file:[EDITED]
and use it in the flow:
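A hedged sketch of that flow, where upload_to_s3/1 is a hypothetical stand-in for the real upload code:

# run the prediction concurrently with the S3 upload, then await the result
task = predict(image)               # %Task{} from predict/1 above
{:ok, s3_url} = upload_to_s3(file)  # hypothetical S3 upload helper
%{predictions: [%{label: label}]} = Task.await(task, 15_000)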