ndrean opened 1 year ago
Prediction slows down the process, roughly 1.5s per request. I will try to deploy this horror.
The "GET" endpoint - where you pass in an URL of a pic - works with the query string addition "pred" (no prediction, thus faster is you don't pass one).
curl -X GET http://localhost:4000/api?url=....&w=300&pred=on
The "POST" endpoint - where you submit files from a client via a FormData to the API - also works, but you use a checkbox if you want the prediction (I capture it the same way, via a key "pred", thus there is a constraint on the FormData naming).
For completeness,
https://github.com/elixir-nx/bumblebee/tree/main/examples/phoenix#user-images
It seems that when the image is too small, the findings are not so good. After reading a bit, sizes around 512x512 seem to be recommended for Image-to-Text. The speed of the recognition is also related to the size of the image: the bigger, the longer. To speed up the process, if an image is bigger, I resize it to this size and run the ML model on it.
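That resize step could look like this minimal sketch, assuming the Image library (built on Vix) is available; Image.thumbnail/2 scales so the longest edge matches the given size:

# hedged sketch: cap the longest edge at 512px before running the model
{:ok, resized} = Image.thumbnail(image, 512)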
I added redirection handling to accept images from "unsplash" for example, where a src="https://source.unsplash.com/..." responds with a redirect. The body is streamed with Finch.stream. If the code detects a redirection by reading the headers, it takes the received location and recurses. Otherwise, it writes the stream into a file, so the process eventually ends. This way, the body is processed only once, redirection or not, the memory footprint stays low, and it does not slow down the process.
I changed to Nx.Serving.batched_run/3, as it seems to give faster results when treating several pictures (uploaded as a POST request). To use batched_run, the setup is different: the Nx.Serving process is launched in the Application module.
# Application.ex
children = [
  ...,
  {Nx.Serving, serving: serve(), name: UpImg.Serving, batch_size: 10, batch_timeout: 100}
]

defp serve do
  model = System.fetch_env!("MODEL")
  {:ok, resnet} = Bumblebee.load_model({:hf, model})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, model})

  Bumblebee.Vision.image_classification(resnet, featurizer,
    defn_options: [compiler: EXLA],
    top_k: 1,
    compile: [batch_size: 10]
  )
end
and then use instead:
def predict(%Vix.Vips.Image{} = image) do
  # serving = UpImg.GsPredict.serve()
  {:ok, %Vix.Tensor{data: data, shape: shape, names: names, type: type}} =
    Vix.Vips.Image.write_to_tensor(image)

  {width, height, channels} = shape
  # bug in Vix.Vips at the time, with HWC and WHC... (see the correction further down)
  t_img = Nx.from_binary(data, type) |> Nx.reshape({height, width, channels}, names: names)

  Task.async(fn -> Nx.Serving.batched_run(UpImg.Serving, t_img) end)
  # Task.async(fn -> Nx.Serving.run(serving, t_img) end)
end
One must be careful with the async calls. When you run this async task, say %Task{} = task = predict(image), you can only get the result back - with Task.await(task) - from the owner process.
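For instance, a minimal sketch (the 15-second timeout is arbitrary):

# in the same process that called predict/1
task = predict(image)
# ... do other work here, e.g. the S3 upload ...
%{predictions: [%{label: label, score: score}]} = Task.await(task, 15_000)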
To read a URL and download it with Finch using streams, potentially accepting redirects (and writing the body into a temp file), you can use the 302 status and the Location header:
{:ok, path} = Plug.Upload.random_file("temp-stream")
{:ok, file} = File.open(path, [:binary, :write])
# url = "https://source.unsplash.com/QT-l619id6w"
request = Finch.build(:get, url)
stream_write(request, file)
File.close(file)
def stream_write(request, file) do
  Finch.stream(request, UpImg.Finch, nil, fn
    {:status, status}, _acc ->
      status

    {:headers, headers}, status ->
      handle_headers(headers, status)

    {:data, data}, headers ->
      handle_data(file, data, headers)
  end)
end
def handle_headers(headers, 302),
  do: Enum.find(headers, &(elem(&1, 0) == "location"))

def handle_headers(headers, 200), do: headers
def handle_headers(_, _), do: {:halt, "bad redirection"}

def handle_data(file, _, {"location", location}),
  do: Finch.build(:get, location) |> stream_write(file)

def handle_data(_, _, {:halt, "bad redirection"}),
  do: {:halt, "bad redirection"}
def handle_data(file, data, _) do
case IO.binwrite(file, data) do
:ok -> :ok
{:error, reason} -> {:halt, reason}
end
end
The memory footprint is low, at the expense of writing the body of the request into a file (but one can just append the chunk in memory if needed).
Thanks for the excellent write-up, @ndrean, it was super insightful!
I haven't tried Bumblebee but I'm wanting to (when I have more free time). Are you using any specific Hugging Face models in your experiments?
On the topic, you may also find https://github.com/replicate/replicate-elixir as another alternative. Unfortunately, it's tied to their platform, but it might still be fun to tinker with. This is from this AWESOME talk from Charlie Holtz in https://www.youtube.com/watch?v=TfZI5-oQSqI&ab_channel=ElixirConf. It's an awesome video that really highlights how Elixir has great built-in tools to get AI models with LiveView working seamlessly.
Yes, I used the microsoft/resnet model.
Thanks for the "replicate" link. I will give it a try too!
@LuchoTurtle Thanks! I really enjoyed watching this video, had a lot of fun! 🙂 I just realised that you already wrote plenty of good things on this subject before I woke up! So nothing new under the sun for you... 🙂
https://github.com/dwyl/learn-elixir/issues/212 https://github.com/dwyl/image-classifier/issues/1
I am just looking at Image Classification - namely a weighted list of predictions - whilst you wanted Image-to-Text, more ambitious. I just wondered what you would do with the generated text for an image, because you need to further process this response to extract some key points, if this is what you want.
For example, Salesforce/BLIP is an I2T model. I ran it in a Livebook, the easiest way to do this. It downloads 1.7Gb... The generated code is:
{:ok, model_info} = Bumblebee.load_model({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, featurizer} =
Bumblebee.load_featurizer({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, tokenizer} =
Bumblebee.load_tokenizer({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, generation_config} =
Bumblebee.load_generation_config({:hf, "Salesforce/blip-image-captioning-base"})
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 100)
serving =
Bumblebee.Vision.image_to_text(model_info, featurizer, tokenizer, generation_config,
compile: [batch_size: 1],
defn_options: [compiler: EXLA]
)
image_input = Kino.Input.image("Image", size: {384, 384})
form = Kino.Control.form([image: image_input], submit: "Run")
frame = Kino.Frame.new()
Kino.listen(form, fn %{data: %{image: image}} ->
if image do
Kino.Frame.render(frame, Kino.Text.new("Running..."))
image =
image.file_ref
|> Kino.Input.file_path()
|> File.read!()
|> Nx.from_binary(:u8)
|> Nx.reshape({image.height, image.width, 3})
%{results: [%{text: text}]} = Nx.Serving.run(serving, image)
Kino.Frame.render(frame, Kino.Text.new(text))
end
end)
Kino.Layout.grid([form, frame], boxed: true, gap: 16)
I generated an image with Stable Diffusion and submitted it to BLIP. The result is pretty good! 🙂 But it is not classification!
This works for me because the model is "deciphered" in some way in Bumblebee. What if I want to use a specific model? There, I don't know how to proceed.
I used a small Image Classification model - a 300Mb download - embedded into the app, as this tends to be much smaller. However, the Image Classification is not as good - same image - even for a 1300x1000px image:
Replicate exposes an endpoint. You also need to be careful with the data you submit to get the right balance between speed and accuracy: you might pay too much or pay for nothing if you don't deliver a properly sized image :)
When you read this "official" example, they naturally stress that the navigator should resize pics instead of the server. However, in this git repo, the proposed JS code is a bit... wordy.
So, down to earth, I tried to follow the repo recommendations - at least for a WebApp version - and I looked at how to do this. In fact, you can get a bunch of resized images from the browser with a Promise.all, because the browser is efficient at doing this: a form accepts an image and you "just" inject it into the different resizing canvases that you set up, and call canvas.toBlob. You can target a thumbnail, an "ML"-sized image (512px), or 3 different sizes to match mobile, pad and full screen (1440px) pretty quickly, for example.
One point is the naming: you need a unique base identifier for all these files. It turns out that JS can produce a SHA1 easily, no library needed, so I used this as a unique naming base, modulo some size identifier. You can also convert into WEBP just like this, and this saves a lot. You can upload directly to a bucket, and pass down to the server the 512px file to do the ML stuff. The bucket does its stuff and returns a response back to the client - a bunch of URLs - and the client forwards the responses to the server, where you update the socket. Meanwhile, the server did the ML stuff to produce a caption/prediction. It remains to save all this into the DB. With the SHA1 naming, we have a common identifier, almost collision free, so we can update the DB record easily. All easy async, client-side and server-side. The main difficulty is the "hook".
The shinstagram source. He uses R2, but I did not get the details of how he uses the CDN to serve the files.
Thanks for the detailed write-up @ndrean , it is really super insightful! Once I'm cleared with other tasks, I want to give https://github.com/dwyl/image-classifier/issues/1 a whirl and, since you've put much more time and effort than I into this, I might ask ya for some pointers!
> I am just looking at Image Classification - namely a weighted list of predictions - whilst you wanted Image-to-Text, more ambitious. I just wondered what you would do with the generated text for an image, because you need to further process this response to extract some key points, if this is what you want.
Not necessarily. I actually want a list of keywords that describe an image, just like you want. However, I believe that one may yield fair results by using a combination of an image captioning model like BLIP and a regular LLM to better extract keywords. In the same way https://zhaohengyuan1.github.io/image2paragraph.github.io/ uses three models to densely and accurately describe an image, one may do something like:
Use BLIP to describe the image -> feed it into an LLM to gather relevant keywords with context from the image.
Then again, this is pure speculation on my part.
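Sketched in hypothetical Elixir terms, with blip_serving and llm_serving standing in for two Bumblebee servings (the prompt is invented):

# 1. caption the image with BLIP
%{results: [%{text: caption}]} = Nx.Serving.run(blip_serving, image_tensor)

# 2. ask a text-generation model to distill the caption into keywords
prompt = "Extract five comma-separated keywords from: #{caption}"
%{results: [%{text: keywords}]} = Nx.Serving.run(llm_serving, prompt)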
> I generated an image with Stable Diffusion and submitted it to BLIP. The result is pretty good! 🙂 But it is not classification!
Your results are awesome! Though why do you say it's not considered "classification"? Is it because it's yielding a simple phrase instead of a set of weighted predictions?
> This works for me because the model is "deciphered" in some way in Bumblebee. What if I want to use a specific model? There, I don't know how to proceed.
Apparently, you can't use any HuggingFace transformer with Bumblebee, which is a shame. I don't know the specifics but, according to https://github.com/elixir-nx/bumblebee#model-support, it "has to be implemented in Bumblebee" (whatever that means).
However you can use https://jonatanklosko-bumblebee-tools.hf.space/apps/repository-inspector/36pihlb7tb7rvmovbnvrmjseud5mzdlxhbfaa6xywewlprok to check if a Transformer model from HuggingFace is supported or not.
So it's fair that you don't know how to use other models from HuggingFace because apparently it's not possible :p.
Image sizes
It's interesting: the deal with image sizes, as you pointed out, spans even to image generation with Stable Diffusion. Even when I'm doing img-to-img or inpainting, I yield much better results with 512px images.
Using multiple canvases with different sizes and injecting images there is a fun way of getting different-sized images, quite creative!
Thanks for the shinstagram source, I'll have to take a look at it! :D
Yes, I see, a mix. Sometimes I have good predictions, but more often BLIP is superior. For the moment, I don't know.
Thanks for the CanIUseThisModel link, I did not know about it or find it. Things are clearer now.
Of course, I did not invent the 512px trick, I read it!
See also: https://github.com/elixir-nx/bumblebee/tree/main/examples/phoenix#user-images
[UPDATED] You can consume the data by sending it directly to a bucket when you run an external: presign_upload upload: it performs an XHR/fetch request to the bucket endpoint with a presigned URL and consumes the data. This means you can't run a prediction any more; you may need to upload the data to the server as well.
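For reference, a minimal sketch of such an external upload in LiveView, where presign_upload/2 is your own presigner returning the presigned fields (names are illustrative):

def mount(_params, _session, socket) do
  {:ok,
   allow_upload(socket, :images,
     accept: ~w(.jpg .jpeg .png .webp),
     max_entries: 3,
     # entries go straight to the bucket, not through the server
     external: &presign_upload/2
   )}
end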
1) The HOOK: use this.upload to send the renamed, resized & WEBP-converted files from the client.
- calcSHA1: MDN source of non-crypto usage
- this.upload is the secret! An undocumented function found here

const SIZES = [200, 512, 1440];
export default {
/**
* Renames a File object with its SHA1 hash and keep the extension
* source: https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest#converting_a_digest_to_a_hex_string
* @param {File} file - the file input
* @returns {Promise<File>} a promise that resolves with a renamed File object
*/
async setHashName(file) {
const ext = file.type.split("/").at(-1);
const SHA1name = await this.calcSHA1(file);
return new File([file], `${SHA1name}.${ext}`, {
type: file.type,
});
},
/**
* Calculates a SHA1 hash using the native Web Crypto API.
* @param {File} file - the file to calculate the hash on.
* @returns {Promise<String>} a promise that resolves to hash as String
*/
async calcSHA1(file) {
const arrayBuffer = await file.arrayBuffer();
const hash = await window.crypto.subtle.digest("SHA-1", arrayBuffer);
const hashArray = Array.from(new Uint8Array(hash));
const hashAsString = hashArray
.map((b) => b.toString(16).padStart(2, "0"))
.join("");
return hashAsString;
},
/**
*
* @param {File} file - the file
* @param {number[]} SIZES - an array of sizes to resize the image to
* @returns {Promise<File[]>} a promise that resolves to an array of resized images
*/
async processFile(file, SIZES) {
return Promise.all(SIZES.map((size) => this.fReader(file, size)));
},
/**
* Reads an image file, resizes it to a given max size, converts it into WEBP format and returns it
* @param {File} file - the file image
* @param {number} MAX - the max size of the image in px
* @returns {Promise<File>} resolves with the converted file
*/
fReader(file, MAX) {
const self = this;
return new Promise((resolve, reject) => {
if (file) {
const img = new Image();
const newUrl = URL.createObjectURL(file);
img.src = newUrl;
img.onload = function () {
URL.revokeObjectURL(newUrl);
const { w, h } = self.resizeMax(img.width, img.height, MAX);
const canvas = document.createElement("canvas");
if (canvas.getContext) {
const ctx = canvas.getContext("2d");
canvas.width = w;
canvas.height = h;
ctx.drawImage(img, 0, 0, w, h);
// convert the image from the canvas into a Blob and convert into WEBP format
canvas.toBlob(
(blob) => {
const name = file.name.split(".")[0];
const convertedFile = new File([blob], `${name}-m${MAX}.webp`, {
type: "image/webp",
});
resolve(convertedFile);
},
"image/webp",
0.75
);
}
};
img.onerror = function () {
reject("Error loading image");
};
} else {
reject("No file selected");
}
});
},
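/**
 * Computes the target dimensions whose longest side equals MAX,
 * preserving the aspect ratio.
 * @param {number} w - the original width
 * @param {number} h - the original height
 * @param {number} MAX - the max size in px
 * @returns {{w: number, h: number}} the new dimensions
 */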
resizeMax(w, h, MAX) {
if (w > h) {
if (w > MAX) {
h = h * (MAX / w);
w = MAX;
}
} else {
if (h > MAX) {
w = w * (MAX / h);
h = MAX;
}
}
return { w, h };
},
/**
* Takes a FileList and an array of sizes,
* then renames them with the SHA1 hash,
* then resizes the images according to a list of given sizes,
* and converts them to WEBP format,
* and finally uploads them.
* @param {FileList} files
* @param {number[]} SIZES
*/
async handleFiles(files, SIZES) {
const renamedFiles = await Promise.all(
[...files].map((file) => this.setHashName(file))
);
const fList = await Promise.all(
renamedFiles.map((file) => this.processFile(file, SIZES))
);
// the "secret" to upload to the server. Undocumented Phoenix.JS function
this.upload("images", fList.flat());
},
/*
inspired by: https://github.com/elixir-nx/bumblebee/blob/main/examples/phoenix/image_classification.exs
*/
mounted() {
this.el.style.opacity = 0;
this.el.addEventListener("change", async (evt) =>
this.handleFiles(evt.target.files, SIZES)
);
// Drag and drop
this.el.addEventListener("dragover", (evt) => {
evt.stopPropagation();
evt.preventDefault();
evt.dataTransfer.dropEffect = "copy";
});
this.el.addEventListener("drop", async (evt) => {
evt.stopPropagation();
evt.preventDefault();
return this.handleFiles(evt.dataTransfer.files, SIZES);
});
},
};
@ndrean
t_img = Nx.from_binary(data, type) |> Nx.reshape({height, width, channels}, names: names)
I'm having trouble with this part. I keep stumbling upon this error when trying to reshape the tensor so I can feed it into the resnet-50 model.
** (ArgumentError) cannot reshape, current shape {11708} is not compatible with new shape {224, 224, 3}
I know for sure that the image is resized according to the model's specification (224x224) up until this point.
I don't know what I'm doing wrong; I'm trying to follow Bumblebee's guide to image classification. Have you gotten this error before? 🙂
@LuchoTurtle Ah yes, I remember now, I had this too: it was a bug, and my code above was good with the bug until the maintainer corrected it, so it's wrong now. But I did not correct it above... The correct shape is an HWC tuple: width and height were inverted, you see what I did? Make sure to have the latest version. I think this should work.
{:ok, %Vix.Tensor{data: data, shape: shape, names: names, type: type}} =
Image.write_to_tensor(image)
t_img = Nx.from_binary(data, type) |> Nx.reshape(shape, names: names)
Nx.Serving.batched_run(UpImg.Serving, t_img)
FYI you cannot deploy this on a small machine, you probably need 1GB RAM. I will probably come back to this as I want to finish this little project.
A 2GB RAM VPS instance on OVH is €3.50/month → https://github.com/dwyl/learn-devops/issues/64 🙂
It probably needs a trial to see how to install on bare metal (modulo Docker, but...). I see they provide an IPv4, so you can put plenty of demo apps with subdomains, I imagine. If I want to buy a domain, say on Cloudflare, I will need to link OVH and Cloudflare. It should not be too complicated.
About "prompt engineering": https://prmpts.ai/blog/what-is-prompt-engineering
I dispute that "prompt engineering" is engineering at all 🙂.
But I do understand that there's an art to it. Refining a model's output to get what you want is not easy, per se, but rather a matter of trial and error and specificity. It's definitely a skill, but I honestly can't see the "engineering" part of it - it can be boiled down to clarity in communication and having to work with some quirks that GPT or any other LLM may have. But hey, maybe I'm an idiot and I'm spewing nonsense, I don't know 🙂.
Although, I have to admit, I've dabbled with Stable Diffusion much more than LLMs (though I'm keen on biting the bullet and paying 20 quid a month for access to OpenAI's API after their dev day at https://openai.com/blog/new-models-and-developer-products-announced-at-devday).
From what I've tried, I think "prompt engineering" is much harder with diffusion models than LLMs. But even then, you can circumvent issues with inpainting and ControlNet to get more accurate results rather easily (though it's still very much trial and error; you can never get exactly what you want, just what's closest to it).
For example, to generate cool Ghibli-style images with Stable Diffusion, I found that I had to work much more than when simply using ChatGPT or any other LLM:
- browse Civit.AI to see the best results (for example, EasyFluff).
- pick which VAE to use to get your preferred results (for example https://civitai.com/models/23906/kl-f8-anime2-vae).
- pick which LORA to use (like https://civitai.com/models/82098/add-more-details-detail-enhancer-tweaker-lora).

This by itself is much more work just to yield fair results with generative art, something that is much more streamlined (or downright not present) with LLMs and prompting.
I found prompting in diffusion models absolutely chaotic. But I've seen patterns, and I've had luck following imageboard tags; I assume many models are trained with these imageboards in mind, because they perform much better when I use these tags.
For example, I tend to follow a pattern for positive prompts: establish style + number of characters + the camera and/or landscape and/or scene properties, using `"BREAK"` between different subjects that I want in the picture (to prevent them getting mixed up). Add weight to each tag and you can go from there.
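A made-up example following that pattern (tags and weights invented):

ghibli style, watercolor, 1girl, wide shot, hillside village at sunset
BREAK
(1cat:1.2), sitting on a stone wall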
So is this workflow engineering? I don't believe so. It's not deterministic by nature. It's just proper concise communication. It's a skill, but I don't think there's anything esoteric about it.
I liked this answer from https://news.ycombinator.com/item?id=36971327.
> Is Prompt Engineering a Thing?
Yes, it's a dumb name for the skill of modifying your prompts and questions to the LLM in a way that produces better results than if you just asked for what you wanted plainly. As language models get better, this might become obsolete.
> I'm trying to research the subject but I don't see much evidence that companies are racing to hire prompt engineers.
Because it's not really a job. Think of it like using the Google search engine - being able to search well is something you can get better at but being a "Google search-er" isn't a career or a job you'll see openings for.
All in all, aside from my obvious ramble and digression, it's still an interesting read @ndrean . Because although I don't think it's engineering, it's a highly valuable skill that I want to get better at!
Nice @LuchoTurtle , you look pretty advanced!
Do you use only a Livebook to test all this?
There is indeed some vocabulary to ingest to enter this world. Being able to name things that are really useful is a powerful skill 🙂 but it sometimes feels like much ado about "almost" nothing. Embedding, transformers, tokenizer, prompt-context etc. are, on the other side, "real" concepts to be understood, whilst the so-called "engineering" is more like noise.
I am starting to watch/read this: https://www.coursera.org/learn/generative-ai-with-llms/lecture/ZVUcF/prompting-and-prompt-engineering
Playing with images gives an immediate wow effect. I highly recommend https://github.com/cbh123/emoji by the same guy who did Shinstagram. By the way, here is how he prompt-engineers it.
I still have basic questions: how do you use these tools in practice to run this in production? An API-based approach, or embedded in your app somehow?
I do more modest, down-to-earth things, more on the LLM side. My first step was image captioning. For example, to run this in practice, I embedded the model, i.e. downloaded the data onto the server, as Bumblebee does this in fact. Then I bind-mounted it into the running container of the app. This is not totally straightforward: I can run the "base" model (1G) but not the large model (2G). I did not dig into this problem.
Another barrier is that few models can be used by the Elixir ecosystem. I finally found something:
https://twitter.com/sean_moriarity/status/1715758666001928613
An explanation on how we add models to Bumblebee (@toranb asked on EEF slack and I thought it would be a good write up here).
— Sean Moriarity (@sean_moriarity) October 21, 2023
The first thing to note is that almost all of the models have significant overlap in implementation details. A transformer is a transformer. There are…
Lastly, another barrier IMO is LiveView. Compared to Streamlit, it is far behind. LiveView is still complicated and fragile: navigation and the "liveview session" are obscure. I had some errors I still don't understand. For example, I used a separate "html.heex" file that for some reason gave me double renderings. When I put the same markup into the render function of the LiveView, it worked. I also have some cache problems: you change the code but it doesn't render. A few headaches...
You can spend your life just watching YouTube. However, this one is worth watching, you learn something: running ML in the browser, VERY instructive. It helps you understand, step by step, this Huggingface world and consequently sheds some light on the Elixir Bumblebee world (because honestly, they don't help you 🙂).
Very good video, thanks for sharing @ndrean. Please have a read of https://github.com/dwyl/image-classifier and share your thoughts. 🙂
As for the job/title of "Prompt Engineer" ... While it's super "hot" right now to know how to refine queries to get pre-trained models to give useful results ...
I cannot help but think that this is something a 5-year-old child can do quite effectively. So it's only a matter of time before the "Prompt Engineers" are replaced.
What might not be replaced as quickly - though it will eventually - are specific subject-matter experts who use the corpus of knowledge to answer specific questions that non-experts wouldn't even think of. 🙂 But I honestly think that as all knowledge gets sucked into ever more powerful LLMs, and the LLMs have all the questions and answers, they will be able to auto-suggest the prompts. So even a child will be able to prompt their way into a Nobel Prize. 🙂
@nelsonic I looked quickly into the https://github.com/dwyl/image-classifier repo. Looks good. A few remarks.
1) Are you able to run this on Fly.io? Because I see that your Dockerfile uses the standard user "nobody", but Huggingface recommends a user 1000: https://huggingface.co/spaces/jonatanklosko/chai/blob/main/Dockerfile. This repo can be a reference: https://huggingface.co/docs/hub/spaces-sdks-docker#permissions. However, it downloads the model during the build stage, and I found this complicated. You opted to copy the model data from your host into the image https://github.com/dwyl/image-classifier/blob/d7205ca4a97a1d582436d5cc9d781eb80b6311b2/Dockerfile#L56, but you don't use ENV BUMBLEBEE_OFFLINE=true in the Dockerfile. I believe it will download the model anyway, won't it? I believe your image should use a volume to grab the data and contain only the running code. But if it works this way (the model is small?), then why not; it is not meant to be scaled, I presume. Another detail: the .bumblebee data is also persisted in the GitHub repo, but shouldn't it be in an LFS? Or not at all?
2) You pass a base64 string to render the resized image, but why do you use a form to wrap the img tag? https://github.com/dwyl/image-classifier/blob/d7205ca4a97a1d582436d5cc9d781eb80b6311b2/lib/app_web/live/page_live.html.heex#L22
3) Why do you need this pre-process-image?
4) Shouldn't the async task be async_nolink? Because if the serving fails, you may not want the main process to get killed.
5) You also have the library stb_image instead of Vix. This can further reduce the image size. An example.
@ndrean Great feedback as always. CC: @LuchoTurtle (who is currently working on the Fly.io deployment/update ...)
Ah ok, I didn't look at who did it. So with Lucho, it's in good hands :) I am interested to see your result, as I want to deploy something similar but on a VPS (using a bucket to save the images and SQLite to save the list of images/captions per user). Nothing huge, but not obvious :)
@nelsonic @LuchoTurtle Fly.io volumes
I would try to copy the .bumblebee data you downloaded via Bumblebee into a fly volume. I think this can be done in the fly.toml with (not totally sure):
[mounts]
source = "$(pwd)/.bumblebee"
destination = "/my-volume"
Then you can get rid of the .bumblebee copy command in both stages, use the "nobody" user as Phoenix does, and reference the new location in the runner stage with:
ENV BUMBLEBEE_CACHE_DIR=/my-volume (?)
ENV BUMBLEBEE_OFFLINE=true
Now, you won't download the model but read it from the cache when the app starts.
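For reference, a hedged sketch of the fly.toml syntax (if I read the Fly docs right, source is the Fly volume's name, not a local path; "models" is an invented volume name):

[mounts]
  source = "models"
  destination = "/my-volume"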
However, not sure your image will fit in a 256MB machine....
Thanks for the feedback @ndrean , always appreciated! By the way, thank you for the video! Watched it all the way through, and it was immensely useful!
1 - Thanks, I didn't know the "nobody" user had any impact. Will change it :)
And yes, I was trying to cache the model and was hoping to do this all on fly.io. Meaning that on the first execution of the app in fly.io, the model would be downloaded into .bumblebee (hence why I created this directory), and then on subsequent runs, LiveView would fetch the local model from it. I thought setting BUMBLEBEE_OFFLINE was optional (I thought it was a flag to ALWAYS fetch locally) because I was under the impression that by setting the CACHE_DIR, it would use the local model. Apparently, it doesn't, hence why I'm trying to fix it.
2 - I wrap the <img> with a form so the user can click on the image again and upload another image if they want to.
3 - I was having trouble with the tensor dimensions initially. Because models usually work in a specific colourspace and without alpha (it's data that is not relevant), I wrote that little function that can be used anywhere. It flattens the alpha out, converts the colourspace and formats/reshapes the tensor to the correct format. That's how I got this to work :p
4 - OOh, interesting! Thank you for the suggestion :)
Regarding using volumes, I'm tempted to do so. I first want to try to get the model during the build stage (as you've mentioned) in the Dockerfile, so it's easier to deploy. I'm aware that this will result (depending on the model used) in a bigger container size, but that's ok, we can scale the fly.io machine up (yeah, 256MB is super low).
But if that doesn't work, I'll try the volume approach. Thank you kindly :D
No, it won't download the model in the build stage unless we explicitly "pre-run" Bumblebee.load and friends in some mix command. Take a look at "Livebeats with whispers" and how they do it in the Dockerfile: they are explicit. But this becomes intricate and I don't like this way of doing it: the model should be in a separate volume and passed via an env var.
Another interesting repo to prepare yourself to lose your job??
> Another interesting repo to prepare yourself to lose your job??
Looks like https://github.com/Significant-Gravitas/AutoGPT :P
> No, it won't download the model in the build stage unless we explicitly "pre-run" Bumblebee.load and friends in some mix command. Take a look at "Livebeats with whispers" and how they do it in the Dockerfile: they are explicit. But this becomes intricate and I don't like this way of doing it: the model should be in a separate volume and passed via an env var.
Thank you for the reply. I was trying to get it to work with something similar to that. I want to give both options a whirl but I'm having trouble with actually getting my Dockerfile to work by running something like
RUN /app/bin/app eval 'App.Application.serving()'
But it's not working.
Trying to debug locally, but even then it's a pain, and even dumping logs in intermediate Docker layers isn't allowing me to see the filesystem at each step of the build stage.
I see your POV, though. Having it in the Dockerfile makes it too tightly coupled, but I'm still wanting to give it a try to document both approaches 🙂
I hate this "doesn't work for me", but here we are. Same for me: it doesn't work because, if I recall correctly, it says "can't find /app/bin/app". When I run a release version, Application.serving() works, but putting this in the Dockerfile (which mimics what we do by hand, no?), well, doesn't... I did not find an answer.
Mine:
ARG ELIXIR_VERSION=1.15.5
ARG OTP_VERSION=26.0.2
ARG DEBIAN_VERSION=bullseye-20230612-slim
ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"
FROM ${BUILDER_IMAGE} as builder
ARG MIX_ENV
RUN apt-get update -y && apt-get install -y build-essential git libmagic-dev curl \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
WORKDIR /app
RUN mix local.hex --force && \
mix local.rebar --force
ENV MIX_ENV="prod"
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile
COPY priv priv
COPY assets assets
COPY lib lib
RUN mix assets.deploy
RUN mix compile
# RUN mix run -e "UpImg.Ml.serve()" --no-start <---- fails!
COPY config/runtime.exs config/
COPY rel rel
RUN mix release
################################
FROM ${RUNNER_IMAGE}
ARG MIX_ENV
RUN apt-get update -y && apt-get install -y libstdc++6 openssl libncurses5 locales libmagic-dev \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
WORKDIR "/app"
RUN chown nobody /app
ENV MIX_ENV="prod"
ENV BUMBLEBEE_CACHE_DIR=/app/bin/.bumblebee/blip
ENV BUMBLEBEE_OFFLINE=true
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/up_img ./
USER nobody
EXPOSE 4000
CMD ["/app/bin/server"]
Yep, precisely the issue I'm getting. The funny thing is that if I run RUN mix run -e 'App.Application.serving()' --no-start --no-halt, it clearly downloads the model but fails afterwards.
Yes. By the way, are you running this locally?
Did you try setting SECRET_KEY_BASE in the command?
My Docker commands to overcome this: a volume!
1) I create a volume (yes, a bit intricate to do: you can't do this with an image, only a container, so you run a container from an alpine image, mount your local folder into it and bind-mount the docker folder into a volume to get your volume...!)
docker run --rm -v blip:/model -v $(pwd)/.bumblebee/blip:/data alpine cp -a /data/. /model/
2) Run the image, inject the env vars and bind-mount the volume into the container.
docker run --env-file .env-docker -p 4000:4000 -it --rm --name up-img-cont -v blip:/app/bin/.bumblebee/blip/:ro up-img
Yeah, I'm building the Docker image locally to get past the errors. To be honest, if we're running this failing line just to get the model, we only care that it downloads. So if we do something like:
RUN mix run -e 'App.Application.serving()' --no-start --no-halt; exit 0
it should work.
And thanks for the "volume approach". Once I get this one working, I'll do that one for sure (I really want to document both options)! (Though, I ought to admit, by "volumes" I thought you meant Fly.io volumes, not Docker ones.)
I also have a question, since I'm really not well-versed with Docker. I understand that those commands work fine locally. But if I wanted to deploy it to Fly.io, I'd have to somehow create volumes and bind them in the Dockerfile, correct? Do you have any reference for those? I'd google it, but if you already know the answer, perhaps you can point me in the right direction :)
You may be right with your "exit 0", but I even tried to run only the Bumblebee.load and it failed, so I gave up and went to use volumes.
Yes, in the previous post I mentioned the Fly.io volumes and accessing a volume. You don't run docker commands in Fly; you use a "fly.toml", something like [mounts] source=$(pwd)/.bumblebee destination=/my-volume. Because I don't know how to ssh into a Fly machine - I don't know which machine, nor its ssh port or IP address - I can't:
scp -P 22 -r .bumblebee/ <fly-machine>@<fly-ip>:your-volume
I tried again out of curiosity. You will probably encounter this error. This is curious because we have "--no-start".
=> ERROR [builder 15/18] RUN mix run -e 'UpImg.Ml.serve()' --no-start 30.9s
------
> [builder 15/18] RUN mix run -e 'UpImg.Ml.serve()' --no-start:
|====================
13.08 [output clipped, log limit 2MiB reached]
30.45 ** (exit) exited in: GenServer.call(EXLA.Client, {:client, :host, [platform: :host]}, :infinity)
30.45 ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
Precisely. In fact, even though I got this error while running https://github.com/dwyl/imgup/issues/131#issuecomment-1808265459, I could circumvent it by adding ; exit 0. But now, for some odd reason, I keep getting this error and it's not downloading the models anymore as it was before. I've tried reverting to the code I had when I commented https://github.com/dwyl/imgup/issues/131#issuecomment-1808265459, but it's not working anymore.
?? weird
I tried with success
RUN mix run -e 'UpImg.Ml.serve()' --no-start; exit 0
That works, for sure, it's just ignoring the error.
But it's maybe not downloading the model. When I commented in https://github.com/dwyl/imgup/issues/131#issuecomment-1808265459, I could see the progress of the model being downloaded.
But not anymore 🫥, even though the code is exactly the same.
But if you check your container and see the model there, great news 🥳
Ah, I did not copy it in the last stage.
Well, it fails, since nothing is downloaded indeed. The only way is to copy your local folder into the image and forget about this download. I don't understand how Livebeats made it work.
If you run
RUN mix run -e 'App.Application.serving()' --no-start --no-halt; exit 0
COPY .bumblebee/ .bumblebee
You need to make sure that ENV BUMBLEBEE_OFFLINE=true is not enabled before this step, otherwise it will not look for the repo.
For example, here's my dockerfile.
# Find eligible builder and runner images on Docker Hub. We use Ubuntu/Debian
# instead of Alpine to avoid DNS resolution issues in production.
#
# https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=ubuntu
# https://hub.docker.com/_/ubuntu?tab=tags
#
# This file is based on these images:
#
# - https://hub.docker.com/r/hexpm/elixir/tags - for the build image
# - https://hub.docker.com/_/debian?tab=tags&page=1&name=bullseye-20231009-slim - for the release image
# - https://pkgs.org/ - resource for finding needed packages
# - Ex: hexpm/elixir:1.15.7-erlang-26.0.2-debian-bullseye-20231009-slim
#
ARG ELIXIR_VERSION=1.15.7
ARG OTP_VERSION=26.0.2
ARG DEBIAN_VERSION=bullseye-20231009-slim
ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"
FROM ${BUILDER_IMAGE} as builder
# install build dependencies (and curl for EXLA)
RUN apt-get update -y && apt-get install -y build-essential git curl \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
# prepare build dir
WORKDIR /app
# install hex + rebar
RUN mix local.hex --force && \
mix local.rebar --force
# set build ENV
ENV MIX_ENV="prod"
ENV BUMBLEBEE_CACHE_DIR="/app/.bumblebee"
# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config
# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile
COPY priv priv
COPY lib lib
COPY assets assets
COPY .bumblebee/ .bumblebee
# compile assets
RUN mix assets.deploy
# Compile the release
RUN mix compile
# IMPORTANT: This downloads the HuggingFace models from the `serving` function in the `lib/app/application.ex` file.
# And copies to `.bumblebee`.
RUN mix run -e 'App.Application.load_models()' --no-start --no-halt; exit 0
COPY .bumblebee/ .bumblebee
# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/
COPY rel rel
RUN mix release
# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE}
RUN apt-get update -y && \
apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
WORKDIR "/app"
RUN chown nobody /app
# set runner ENV
ENV MIX_ENV="prod"
ENV EXS_DRY_RUN="true"
ENV BUMBLEBEE_CACHE_DIR=/app/.bumblebee
ENV BUMBLEBEE_OFFLINE=true
# Adding this so model can be downloaded
RUN mkdir -p /nonexistent
# Only copy the final release from the build stage
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/app ./
COPY --from=builder --chown=nobody:root /app/.bumblebee/ ./.bumblebee
USER nobody
# If using an environment that doesn't automatically reap zombie processes, it is
# advised to add an init process such as tini via `apt-get install`
# above and adding an entrypoint. See https://github.com/krallin/tini for details
# ENTRYPOINT ["/tini", "--"]
# Set the runtime ENV
ENV ECTO_IPV6="true"
ENV ERL_AFLAGS="-proto_dist inet6_tcp"
ENV BUMBLEBEE_CACHE_DIR=/app/.bumblebee
ENV BUMBLEBEE_OFFLINE=true
CMD ["/app/bin/server"]
It runs successfully.
When I run the container instance locally, I can even see the models downloaded in .bumblebee. Though the naming of these files is weird, it's just mumbo jumbo. I'll try to load these models locally.
Ah yes, I had already loaded some env vars to run mix phx.server, so Docker picked up these env vars. I'll set BUMBLEBEE_OFFLINE=false in the Dockerfile to be sure this won't happen again.
Then why do you do COPY .bumblebee/ .bumblebee? You are copying your host into the image, but the previous line is supposed to have done this. My image inflated from 500Mb to 2.4Gb!
Look at the running container: 2.4Gb memory! The peak is when I uploaded an image and ran the Image-To-Text serving. Not sure all this is super performant.
That is to be expected, though. How does it compare to loading from the volume? If you check https://fly.io/phoenix-files/speed-up-your-boot-times-with-this-one-dockerfile-trick/, it is mentioned that's the only downside - your docker image will be much bigger, since it quite literally has your model downloaded there.
However, I'm having trouble actually using the local model at runtime. I know I should use {:local, "path/to/model"} (according to https://hexdocs.pm/bumblebee/Bumblebee.html#t:repository/0). But, as I've shown, the downloaded model has the file names randomized :/
Yes, that was one of the reasons I mentioned. The difference is that you can run several instances at a minimal cost: you would run 2.5Gb x 2 = 5Gb, whilst with a volume it will run 2Gb + 0.5Gb x 2 = 3Gb.
One thing I am sure works (sorry, I am not selling the volume, but I want to summarise): keep {:ok, model_info} = Bumblebee.load_model({:hf, model}) in the Elixir code, in the serve/0 function.
The path passed to :local: I guess it must be absolute, no?
~~Path.join([UpImgWeb.Endpoint.static_path(:my_app), ".bumblebee"])~~
{:local, Path.join([Application.app_dir(:up_img), ".bumblebee/blip"])}
Nah, it's okay, selling the volume absolutely makes sense 🙂. It's the best option to save resources and to scale horizontally.
Though I don't understand: if you are using {:ok, model_info} = Bumblebee.load_model({:hf, model}), aren't you downloading the model? Unless you set the :local property to a path (and with this, Bumblebee will look in the cache directory), the model is downloaded from HuggingFace every time, no? So even if you use a volume, how are you taking advantage of it? You don't seem to be using it (according to those steps) in your code. So if I were to use a fly.io instance that suspends itself after one hour, the cold start would re-download the model.
The path should be absolute, yes.
In the build stage, I do not invoke any RUN mix ..., so it is just compiling the code. In fact, I never run the code. When I run the image, then, yes, it is looking for the cache, if any. If you set the env BUMBLEBEE_CACHE_DIR, it will look for it. If you set {:local, path}, it will probably look for it there. This I am testing.
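A hedged sketch of that test, assuming {:local, dir} expects a plain snapshot of the repository files (config.json, the params file, ...) rather than Bumblebee's hashed cache layout - which would also explain the randomized file names mentioned above:

# hedged sketch: dir must contain the raw repo files, not the hashed cache
path = Path.join(Application.app_dir(:up_img), ".bumblebee/blip")
{:ok, model_info} = Bumblebee.load_model({:local, path})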
I gave Bumblebee a try today. The idea was to provide predictions on image captioning to classify an image so that a user can use/put pre-filled tags to easily filter his images.
It turns out that the predictions are... not too bad, and quite fast, at least locally.
This is supposed to be a car:
https://dwyl-imgup.s3.eu-west-3.amazonaws.com/40F36F45.webp
Testing with a new query string pred=on to run the model prediction, I tested 3 models: "facebook/deit-base-distilled-patch16-224", "microsoft/resnet-50" and "google/vit-base-patch16-224".
I don't know if anyone has tested it? I submit my code in case any reader sees some obvious fault. It runs locally. It is based on this example. I did not try to deploy this, but here is a guide before I forget: you need to set up a temp dir.
I decided to run a GenServer to start the serving with the app to load the model, but you can start an Nx.Serving at the Application level as well, something like {Nx.Serving, serving: serve(), name: UpImg.Serving}, where the serve() function defines what is in the GenServer below, and it is started with the app:
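A sketch of what that GenServer might look like, reconstructed to match the UpImg.GsPredict.serve() call commented out earlier (treat the internals as illustrative):

defmodule UpImg.GsPredict do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # load the model once at startup and keep the serving as the state
  @impl true
  def init(opts) do
    model = Keyword.fetch!(opts, :model)
    {:ok, resnet} = Bumblebee.load_model({:hf, model})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, model})

    {:ok,
     Bumblebee.Vision.image_classification(resnet, featurizer,
       defn_options: [compiler: EXLA],
       top_k: 1
     )}
  end

  # expose the serving kept in the state
  def serve, do: GenServer.call(__MODULE__, :serving)

  @impl true
  def handle_call(:serving, _from, serving), do: {:reply, serving, serving}
end

# in application.ex: {UpImg.GsPredict, model: System.fetch_env!("MODEL")}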
The model - the repo id - is passed as an env var so I can very simply change it.
predict/1
when I upload an image from the browser and run this task in parallel to the S3 upload. It takes aVix.Vips.Image
, a transformation of a binary file:[EDITED]
and use it in the flow:
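A hedged sketch of that flow, where upload_to_s3/1 is a hypothetical stand-in for the real upload code:

# run the prediction concurrently with the S3 upload, then await the result
task = predict(image)               # %Task{} from predict/1 above
{:ok, s3_url} = upload_to_s3(file)  # hypothetical S3 upload helper
%{predictions: [%{label: label}]} = Task.await(task, 15_000)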