dwyl / imgup

πŸŒ… Effortless image uploads to AWS S3 with automatic resizing including REST API.
https://imgup.fly.dev/

Testing Image-To-Text #131

Open ndrean opened 1 year ago

ndrean commented 1 year ago

I gave Bumblebee a try today. The idea was to run image-captioning predictions to classify an image, so that a user can accept pre-filled tags and easily filter their images.

It turns out that the predictions are... not too bad, and quite fast, at least locally.

This is supposed to be a car:

https://dwyl-imgup.s3.eu-west-3.amazonaws.com/40F36F45.webp

Nx.Serving.run(serving, t_img) 
#=>predictions: [
    %{
      label: "beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon",
      score: 0.9203662276268005
    }
  ]

Testing with a new query string, pred=on, to run the model prediction:

curl -X GET "http://localhost:4000/api?url=https://dwyl-imgup.s3.eu-west-3.amazonaws.com/40F36F45.webp&w=300&pred=on"

{"h":205,"w":300,"url":"https://dwyl-imgup.s3.eu-west-3.amazonaws.com/76F195C6.webp","new_size":11642,"predictions":[{"label":"beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon","score":0.9203662276268005}],"init_size":79294,"w_origin":960,"h_origin":656,"url_origin":"https://dwyl-imgup.s3.eu-west-3.amazonaws.com/40F36F45.webp"}

I tested 3 models: "facebook/deit-base-distilled-patch16-224", "microsoft/resnet-50", and "google/vit-base-patch16-224".

I don't know if anyone else has tested this?

I'm submitting my code in case any reader spots some obvious fault. It runs locally and is based on this example. I did not try to deploy this, but here is a note before I forget: you need to set up a temp dir.

#mix.exs
{:bumblebee, "~> 0.4.2"},
{:nx, "~> 0.6.1"},
{:exla, "~> 0.6.1"},
{:axon, "~> 0.6.0"},

I decided to run a GenServer that starts the serving with the app and loads the model, but you can start an Nx.Serving at the Application level as well, something like {Nx.Serving, serving: serve(), name: UpImg.Serving}, where the serve function defines what is in the GenServer below.

defmodule UpImg.GsPredict do
  use GenServer

  def start_link(opts) do
    {:ok, model} = Keyword.fetch(opts, :model)
    GenServer.start_link(__MODULE__, model, name: __MODULE__)
  end

  def serve, do: GenServer.call(__MODULE__, :serving)

  @impl true
  def init(model) do
    {:ok, model, {:continue, :load_model}}
  end

  @impl true
  def handle_continue(:load_model, model) do
    {:ok, resnet} = Bumblebee.load_model({:hf, model})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, model})

    {:noreply,
     Bumblebee.Vision.image_classification(resnet, featurizer,
       defn_options: [compiler: EXLA],
       top_k: 1,
       compile: [batch_size: 10]
     )}
  end

  @impl true
  def handle_call(:serving, _from, serving) do
    {:reply, serving, serving}
  end
end

and it is started with the app:

children = [
  ...,
  {UpImg.GsPredict, [model: System.fetch_env!("MODEL")]}
]

The model - the repo id - is passed as an env var so I can very simply change it.

In the API, I use predict/1 when I upload an image from the browser and run this task in parallel with the S3 upload. It takes a Vix.Vips.Image, obtained by transforming the binary file:

[EDITED]

def predict(%Vix.Vips.Image{} = image) do
    serving = UpImg.GsPredict.serve()

    {:ok, %Vix.Tensor{data: data, shape: shape, names: names, type: type}} =
      Vix.Vips.Image.write_to_tensor(image)

    # Note: the shape is {height, width, channels} (HWC), not {width, height, channels}. Bug corrected.
    t_img = Nx.from_binary(data, type) |> Nx.reshape(shape, names: names)

    # Return the task so the caller can await it while the S3 upload runs.
    Task.async(fn -> Nx.Serving.run(serving, t_img) end)
  end

and use it in the flow:

prediction_task = predict(my_image)
...
%{predictions: predictions} = Task.await(prediction_task)
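The flow above is the standard future/await overlap: start the prediction, do the S3 upload in the meantime, then await the result. For readers outside Elixir, here is a minimal sketch of the same shape in Python (the function names and timings are made up for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def predict(image):
    # Stand-in for Nx.Serving.run/2 on the image tensor.
    time.sleep(0.2)
    return {"label": "car", "score": 0.92}

def upload_to_s3(image):
    # Stand-in for the S3 upload that runs at the same time.
    time.sleep(0.2)
    return "https://bucket/key.webp"

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    prediction_task = pool.submit(predict, "img")   # like Task.async/1
    url = upload_to_s3("img")                       # upload proceeds meanwhile
    prediction = prediction_task.result()           # like Task.await/1
elapsed = time.monotonic() - start
# Both 0.2 s steps overlap, so the total is roughly 0.2 s rather than 0.4 s.
```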

LuchoTurtle commented 1 year ago

Right. I understand you're not running any RUN mix step to download the model; you just compile the code.

But if your code is doing {:ok, model_info} = Bumblebee.load_model({:hf, model}), then even if you set BUMBLEBEE_CACHE_DIR (which is where the model will be downloaded to; otherwise it goes to a default location in your OS), I think the model will be re-downloaded if you restart the instance (it's what's happening in this very repo) or any time there's a cold boot.

ndrean commented 1 year ago

Does not seem so: https://github.com/elixir-nx/bumblebee/blob/57bdcced8d0df356093787ef38c02d76230b72a2/lib/bumblebee/huggingface/hub.ex#L52

LuchoTurtle commented 1 year ago

Hm, interesting.

This repo's fly.io app is set to suspend itself whenever there's no activity for one hour. I just accessed it and it was inactive.

image

It re-downloaded ResNet-50. πŸ€”

If it's truly cached like you showed, what's the point of https://fly.io/phoenix-files/speed-up-your-boot-times-with-this-one-dockerfile-trick/#bumblebee?

ndrean commented 1 year ago

Your machine must be getting destroyed after one hour? Restarting from zero?

Then, as we said, either their code doesn't work, or we don't see what is wrong.

ndrean commented 1 year ago

When I use {:local, "/app/bin/.bumblebee/blip"}, I get an error "no config file found in the given repository."

However, when I inspect the running container with docker exec -it test bash and ls, I find the bind-mounted folder and it's populated.

Then, when I run serve with :local and an absolute hard-coded path, I get the same error. When I use :hf with a folder populated from a previous download, it works.

LuchoTurtle commented 1 year ago

Yeah. I have a feeling that :local is meant to be used when you directly download the model and add it to the repo manually.

LuchoTurtle commented 1 year ago

Okay, I think I figured it out.

I was quite confused about what the heck :local pertained to, so I hope this makes it clearer.

I'm now testing the container locally whilst downloading the model in the Dockerfile, and it seems to be working. Here's the Dockerfile:

# Find eligible builder and runner images on Docker Hub. We use Ubuntu/Debian
# instead of Alpine to avoid DNS resolution issues in production.
#
# https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=ubuntu
# https://hub.docker.com/_/ubuntu?tab=tags
#
# This file is based on these images:
#
#   - https://hub.docker.com/r/hexpm/elixir/tags - for the build image
#   - https://hub.docker.com/_/debian?tab=tags&page=1&name=bullseye-20231009-slim - for the release image
#   - https://pkgs.org/ - resource for finding needed packages
#   - Ex: hexpm/elixir:1.15.7-erlang-26.0.2-debian-bullseye-20231009-slim
#
ARG ELIXIR_VERSION=1.15.7
ARG OTP_VERSION=26.0.2
ARG DEBIAN_VERSION=bullseye-20231009-slim

ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"

FROM ${BUILDER_IMAGE} as builder

# install build dependencies (and curl for EXLA)
RUN apt-get update -y && apt-get install -y build-essential git curl \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

# prepare build dir
WORKDIR /app

# install hex + rebar
RUN mix local.hex --force && \
    mix local.rebar --force

# set build ENV
ENV MIX_ENV="prod"
ENV BUMBLEBEE_CACHE_DIR="/app/.bumblebee/"
ENV BUMBLEBEE_OFFLINE="false"

# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config

# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile

COPY priv priv

COPY lib lib

COPY assets assets

COPY .bumblebee/ .bumblebee

# compile assets
RUN mix assets.deploy

# Compile the release
RUN mix compile

# IMPORTANT: This downloads the HuggingFace models from the `serving` function in the `lib/app/application.ex` file. 
# And copies to `.bumblebee`.
RUN mix run -e 'App.Application.load_models()' --no-start --no-halt; exit 0
COPY .bumblebee/ .bumblebee

# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/

COPY rel rel
RUN mix release

# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE}

RUN apt-get update -y && \
  apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
  && apt-get clean && rm -f /var/lib/apt/lists/*_*

# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

WORKDIR "/app"
RUN chown nobody /app

# set runner ENV
ENV MIX_ENV="prod"

# Adding this so model can be downloaded
RUN mkdir -p /nonexistent

# Only copy the final release from the build stage
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/app ./
COPY --from=builder --chown=nobody:root /app/.bumblebee/ ./.bumblebee

USER nobody

# If using an environment that doesn't automatically reap zombie processes, it is
# advised to add an init process such as tini via `apt-get install`
# above and adding an entrypoint. See https://github.com/krallin/tini for details
# ENTRYPOINT ["/tini", "--"]

# Set the runtime ENV
ENV ECTO_IPV6="true"
ENV ERL_AFLAGS="-proto_dist inet6_tcp"
ENV BUMBLEBEE_CACHE_DIR="/app/.bumblebee/"
ENV BUMBLEBEE_OFFLINE="true"

CMD ["/app/bin/server"]

Here's how my application.ex is looking.

defmodule App.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Start the Telemetry supervisor
      AppWeb.Telemetry,
      # Start the PubSub system
      {Phoenix.PubSub, name: App.PubSub},
      # Nx serving for image classifier
      {Nx.Serving, serving: serving(), name: ImageClassifier},
      # Adding a supervisor
      {Task.Supervisor, name: App.TaskSupervisor},
      # Start the Endpoint (http/https)
      AppWeb.Endpoint
      # Start a worker by calling: App.Worker.start_link(arg)
      # {App.Worker, arg}
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: App.Supervisor]
    Supervisor.start_link(children, opts)
  end

  def load_models do
    # ResNet-50 -----
    {:ok, _} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
    {:ok, _} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})
  end

  def serving do
    # ResNet-50 -----
    {:ok, model_info} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

    Bumblebee.Vision.image_classification(model_info, featurizer,
      top_k: 1,
      compile: [batch_size: 10],
      defn_options: [compiler: EXLA],
      preallocate_params: true        # needed to run on `Fly.io`
    )

  end

  # Tell Phoenix to update the endpoint configuration
  # whenever the application is updated.
  @impl true
  def config_change(changed, _new, removed) do
    AppWeb.Endpoint.config_change(changed, removed)
    :ok
  end
end

The container now works locally without crashing, and it seems to make cached_download/2 fetch from the local files in .bumblebee.

See the video below (you can skip the first 90 seconds, it's just showing docker downloading stuff).

https://github.com/dwyl/imgup/assets/17494745/3aa4ccac-d122-442c-8f83-b24a661e06e1

I've built the Docker image with --no-cache on purpose. As you can see, if I restart the machine, it doesn't crash and manages to find the local model in .bumblebee.

The metadata_filename is derived by the following piece of code, found in cached_download/2:

url = "https://huggingface.co/api/models/microsoft/resnet-50/tree/main" |> :erlang.md5() |> Base.encode32(case: :lower, padding: false)
metadata_filename = url <> ".json"

dbg(metadata_filename)

This yields the hashed .json filename (base32-encoded).
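For cross-checking a cache entry by hand, the same derivation can be reproduced outside Elixir. Here is a sketch in Python, assuming the input to :erlang.md5/1 is the UTF-8 URL string as in the snippet above; the resulting filename should match the one shown in the fly.io logs further down this thread.

```python
import base64
import hashlib

# Mirrors the Elixir pipeline: md5 the repo URL, then Base.encode32
# with case: :lower and padding: false, and append ".json".
url = "https://huggingface.co/api/models/microsoft/resnet-50/tree/main"
digest = hashlib.md5(url.encode("utf-8")).digest()
hashed = base64.b32encode(digest).decode("ascii").lower().rstrip("=")
metadata_filename = hashed + ".json"
```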

I digress. This should work for you too now πŸ‘Œ

UPDATE: This doesn't always work, for whatever reason. Even if the files are clearly inside the container and accessible, it errors out. I don't know how to fix this anymore. At first I thought I had to have .bumblebee populated on my localhost so the Dockerfile could copy the files, and that seemed to work. But now it doesn't anymore, for whatever reason.

ndrean commented 1 year ago

Yes, :local expects something other than what is downloaded.

But neither Fly.io nor the livebeats Whisper example uses exit 0 or the COPY command that you use:

RUN mix run -e 'App.Application.load_models()' --no-start --no-halt; exit 0
COPY .bumblebee/ .bumblebee

Furthermore, it may download, but since BUMBLEBEE_CACHE_DIR is set and read during the Docker build, that folder should already be populated and you would not need to copy anything.

Still at the same point as 2 weeks ago: same mix, same Dockerfile, same serving, same Application (including the ordering) but absolutely no clue why the "official" code fails. The good part is I feel less alone :)

jeregrine commented 1 year ago
COPY .bumblebee/ .bumblebee

will copy from your build context (your local machine) into the current image, so doing it twice won't do anything. Docker is weird and finicky, and I'm sorry you're having these issues, but I find full paths work better than relative ones.

If you do:

FROM ${builder}
RUN mix run -e 'App.Application.load_models()' --no-start --no-halt; exit 0
FROM ${runner}
COPY --from=builder /app/.bumblebee/ /app/.bumblebee

This approach is also not great because you might create very large images, which make deploys slower. One thing we've been trying out is adding a volume and downloading the model on first boot to said volume.

unless File.exists?(model) do
  App.Application.load_models()
end

Once your app deploys, can you fly ssh console into it and verify whether the /app/.bumblebee files are there or not? If they are, then your configuration is wrong.

LuchoTurtle commented 1 year ago

Thanks for the feedback @jeregrine .

Yes, doing that will copy from my machine to the current image, and it's redundantly duplicated. Doing everything in the Dockerfile was actually working for a while, but it stopped working even though I changed nothing. Weird stuff.

But yes, I'm aware this isn't the ideal solution - it creates gigantic image files, as you correctly stated. Having a volume is certainly the way to go and I'm currently exploring it. But I feel like my issue will still occur even with volumes. I'm testing stuff locally with Docker and I can see the model files being correctly downloaded, the env variables (BUMBLEBEE_CACHE_DIR and BUMBLEBEE_OFFLINE) are correctly set and still I get an error whilst loading the models that they are not found.

For example, in application.ex:

 def load_models do
    # ResNet-50 -----
    {:ok, _} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
    {:ok, _} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})
  end

  def serving do

    dbg(System.get_env("BUMBLEBEE_CACHE_DIR"))
    dbg(System.get_env("BUMBLEBEE_OFFLINE"))

    # ResNet-50 -----
    {:ok, model_info} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

    Bumblebee.Vision.image_classification(model_info, featurizer,
      top_k: 1,
      compile: [batch_size: 10],
      defn_options: [compiler: EXLA],
      preallocate_params: true        # needed to run on `Fly.io`
    )

  end

where serving/0 is used by Nx when the supervision tree is initiated.

If I run my Dockerfile, I can clearly see that the files are being correctly downloaded and placed under /app/.bumblebee.

image

So, in theory, this should totally work. But it doesn't.

2023-11-14 15:06:33 [lib/app/application.ex:49: App.Application.serving/0]
2023-11-14 15:06:33 System.get_env("BUMBLEBEE_CACHE_DIR") #=> "/app/.bumblebee/"
2023-11-14 15:06:33 
2023-11-14 15:06:33 [lib/app/application.ex:50: App.Application.serving/0]
2023-11-14 15:06:33 System.get_env("BUMBLEBEE_OFFLINE") #=> "true"
2023-11-14 15:06:33 
2023-11-14 15:06:33 15:06:33.255 [info] TfrtCpuClient created.
2023-11-14 15:06:33 15:06:33.645 [notice] Application app exited: exited in: App.Application.start(:normal, [])
2023-11-14 15:06:33     ** (EXIT) an exception was raised:
2023-11-14 15:06:33         ** (MatchError) no match of right hand side value: {:error, "could not find file in local cache and outgoing traffic is disabled, url: https://huggingface.co/microsoft/resnet-50/resolve/main/preprocessor_config.json"}
2023-11-14 15:06:33             (app 0.1.0) lib/app/application.ex:54: App.Application.serving/0
2023-11-14 15:06:33             (app 0.1.0) lib/app/application.ex:16: App.Application.start/2

Regardless of whether the model is stored in a volume or not (I know that there are ephemeral-storage considerations on fly.io), I'm doing this on my computer in a Docker instance. I'm at a loss as to what I could be doing wrong 😅

jeregrine commented 1 year ago

Could you do a File.ls("/app/.bumblebee/") |> IO.inspect and see what you get?

ndrean commented 1 year ago

This is exactly the problem I had (and still have) when I download within the Dockerfile.

If I run the image with a bash command and ls, this is what I see (and I can't run the image normally due to the error above):

Screenshot 2023-11-14 at 17 02 16

If I don't download but instead bind-mount a volume containing this folder, the folder is fully populated and the image runs:

 docker run --rm -it -p 4000:4000 --mount src=sf,target=/app/.bumblebee/ --env-file .env-docker --name app-cont up-img

# other terminal
docker exec -it app-cont bash
Screenshot 2023-11-14 at 17 07 29

LuchoTurtle commented 1 year ago

@ndrean So, assuming you've created a volume called models on fly.io and you have this in your fly.toml.

[mounts]
  source = "models"
  destination = "/app/.bumblebee/"

At what stage do you download the models? Do you do it yourself manually? Do you run an external script that does this?

ndrean commented 1 year ago

This is my question! If you create a volume, can you ssh into it even if no app is running (considering we are in the same region), or will the simple [mounts] in the fly.toml populate it?

LuchoTurtle commented 1 year ago

Could you do a File.ls("/app/.bumblebee/") |> IO.inspect and see what you get?

I get the following:

2023-11-14 16:14:43 [lib/app/application.ex:49: App.Application.serving/0]
2023-11-14 16:14:43 System.get_env("BUMBLEBEE_CACHE_DIR") #=> "/app/.bumblebee/"
2023-11-14 16:14:43 
2023-11-14 16:14:43 [lib/app/application.ex:50: App.Application.serving/0]
2023-11-14 16:14:43 System.get_env("BUMBLEBEE_OFFLINE") #=> "true"
2023-11-14 16:14:43 
2023-11-14 16:14:43 {:ok, ["huggingface"]}
2023-11-14 16:14:43 [lib/app/application.ex:51: App.Application.serving/0]
2023-11-14 16:14:43 File.ls("/app/.bumblebee/") #=> {:ok, ["huggingface"]}
2023-11-14 16:14:43 |> IO.inspect() #=> {:ok, ["huggingface"]}
2023-11-14 16:14:43 
2023-11-14 16:14:43 16:14:43.822 [info] TfrtCpuClient created.
2023-11-14 16:14:44 16:14:44.224 [notice] Application app exited: exited in: App.Application.start(:normal, [])
2023-11-14 16:14:44     ** (EXIT) an exception was raised:
2023-11-14 16:14:44         ** (MatchError) no match of right hand side value: {:error, "could not find file in local ca

ndrean commented 1 year ago

@LuchoTurtle, run File.ls("/app/.bumblebee/huggingface") and compare to your data, because I had a difference.

ndrean commented 1 year ago

My image is 580MB, though, so I can't test it on a free machine.

LuchoTurtle commented 1 year ago

I've added the following code to mimic cached_download/2 and, as you can see, the filename matches and can be found inside /app/.bumblebee/huggingface.


url = "https://huggingface.co/api/models/microsoft/resnet-50/tree/main" |> :erlang.md5() |> Base.encode32(case: :lower, padding: false)
metadata_filename = url <> ".json"
dbg(metadata_filename)
dbg(File.ls("/app/.bumblebee/huggingface") |> IO.inspect())

On startup, it yields...

2023-11-14T16:27:57.716 app[683d529c575228] mad [info] metadata_filename #=> "7p34k3zbgum6n3sspclx3dv3aq.json"
2023-11-14T16:27:57.717 app[683d529c575228] mad [info] {:ok,
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] ["45jmafnchxcbm43dsoretzry4i.json",
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] "7p34k3zbgum6n3sspclx3dv3aq.k4xsenbtguwtmuclmfdgum3enjuwosljkrbuc42govrhcudqlbde6ujc",
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] "45jmafnchxcbm43dsoretzry4i.eiztamryhfrtsnzzgjstmnrymq3tgyzzheytqmrzmm4dqnbshe3tozjsmi4tanjthera",
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] "7p34k3zbgum6n3sspclx3dv3aq.json",
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] "6scgvbvxgc6kagvthh26fzl53a.ejtgmobrgyzwcmjtgiztgmztgezdmnzqgzsdmnbzmnstom3fmnsdontfgq2wimrugfrdimtegyzdgzdfme3ggnzsgm3dsmddmftgkmbxei",
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] "6scgvbvxgc6kagvthh26fzl53a.json"]}

LuchoTurtle commented 1 year ago

This is my question! If you create a volume, can you ssh into it, even if no app is running (considering we are in the same region), or the simple "mounts" in the fly.toml will populate it?

I don't think you can ssh into it without the app successfully running. I've tried, and it kicks me out every time it attempts to restart the server (expected). As far as I can tell, [mounts] won't populate anything; it's just pointing to where we want the model to be. Though I don't quite understand how I'm meant to populate the volume in an automated manner lmao.

ndrean commented 1 year ago

I think you can just reference the volume by its name.

This is why I decided to try a VPS.

ndrean commented 1 year ago

as you can see, the filename matches and can be found inside /app/.bumblebee/huggingface.

Does it match all the files you have locally for this model? For me, no.

jeregrine commented 1 year ago

My suggestion is to skip the Dockerfile stuff. Do this:

In your application's start_link, check if the models exist; if not, download them to a volume. It will be slow to boot once, and you'll never think about it again.

LuchoTurtle commented 1 year ago

Yeah, there's no use putting more effort into a dead end. So I'm downloading the model on the first boot and then reusing it on subsequent restarts.

def start(_type, _args) do
  # Checking if the models have been downloaded
  models_folder_path = Path.join(System.get_env("BUMBLEBEE_CACHE_DIR"), "huggingface")

  if not File.exists?(models_folder_path) or File.ls!(models_folder_path) == [] do
    load_models()
  end

  children = [
  ...
end

  def load_models do
    # ResNet-50 -----
    {:ok, _} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
    {:ok, _} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})
  end

  def serving do
    # ResNet-50 -----
    {:ok, model_info} = Bumblebee.load_model({:hf, "microsoft/resnet-50", offline: true})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50", offline: true})

    Bumblebee.Vision.image_classification(model_info, featurizer,
      top_k: 1,
      compile: [batch_size: 10],
      defn_options: [compiler: EXLA],
      preallocate_params: true        # needed to run on `Fly.io`
    )

  end

Instead of using BUMBLEBEE_OFFLINE, I'm passing the :offline option when loading the models, so they're fetched locally.

LuchoTurtle commented 1 year ago

as you can see, the filename matches and can be found inside /app/.bumblebee/huggingface.

Does it match all the files you have locally for this model? For me, no.

The filename from cached_download/2 matches one of them (as expected). Though I see two files missing when compared to the local cache 💭

Regardless, no use to keep trying with the Dockerfile-only approach anymore πŸ˜…

ndrean commented 1 year ago

So, like me, some files are missing. Also, the versions of Bumblebee and Nx are quite unstable. I decided to fork your repo to run the same code as you did but, guess what, I can't even run mix phx.server because:

function EXLA.NIF.start_log_sink/1 is undefined (module EXLA.NIF is not available)

Starting to lose a bit my patience. But maybe more important, the documentation isn't reliable? Take a look: https://github.com/elixir-nx/bumblebee/tree/main/examples/phoenix#tips. What is really working?

LuchoTurtle commented 1 year ago

So, like me, some files are missing. Also, the versions of Bumblebee and Nx are quite unstable. I decided to fork your repo to run the same code as you did but, guess what, I can't even run mix phx.server because:

function EXLA.NIF.start_log_sink/1 is undefined (module EXLA.NIF is not available)

Starting to lose a bit my patience. But maybe more important, the documentation isn't reliable? Take a look: elixir-nx/bumblebee@main/examples/phoenix#tips. What is really working?

That's odd, I've never had that error happen to me. Does clearing out the deps and running mix deps.get again fix it? https://elixirforum.com/t/exla-nif-start-log-sink-1-issue-works-on-ubuntu-but-not-on-macbook-m2/58162/3 says it's a Linux-related issue, but it apparently has been solved.

From the link that you provided, from what I've gathered, I did follow the Configuring Nx chapter and it seems to work the same as before. I can't really quantify it because I saw negligible impact after configuring it like so πŸ€·β€β™‚οΈ

It's a shame we can't really trust their docs when some of the articles and guides we've discussed and tried to follow clearly don't work :/

ndrean commented 1 year ago

Yes, I erased everything (mix.lock, deps, _build) and restarted again. It works now...

ndrean commented 1 year ago

As for with or without EXLA: yes, there is a huge difference. A few months ago I tried a very simple neural network with Axon, just to compute a linear regression by simple gradient descent, and the difference was HUGE. Now, with what we are doing, no idea 😁

ndrean commented 1 year ago

I understand (?) that we are more or less loading the coefficients of some kind of process that defines the Axon operations building the neural network described by the model. But what exactly we are doing, I have absolutely no idea. For sure, it is not rocket science 😁

ndrean commented 1 year ago

I recalled I made a Livebook last year to test Nx and Axon. The idea was to use a very simple example: linear regression. You start with the well-known matrix formulas to compute the exact solution, whether by inverting a matrix or by using the closed-form formulas. The best-fitting line through a bunch of points is, of course, the one that minimizes the total Euclidean distance between the line and the points. You compare this to a simple gradient descent and, since we are crazy, we can even use a NN! You "pompously" train your NN on your points to build its coefficients. Then you can use it: given an input x, it finds a y. If you are interested in seeing what this is about, I can paste the Livebook somewhere.
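As an illustration of that comparison (a sketch, not the Livebook itself): the closed-form least-squares solution versus a plain gradient descent on the mean squared error, here in Python/NumPy on noiseless points from y = 2x + 1:

```python
import numpy as np

# Points on a known line, y = 2x + 1 (no noise, to keep the check exact).
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

# Exact solution: least squares on the design matrix [x, 1].
X = np.column_stack([x, np.ones_like(x)])
slope_exact, intercept_exact = np.linalg.lstsq(X, y, rcond=None)[0]

# Plain gradient descent on the mean squared error.
slope, intercept = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    err = slope * x + intercept - y
    slope -= lr * 2.0 * np.mean(err * x)
    intercept -= lr * 2.0 * np.mean(err)
# Both methods recover slope close to 2 and intercept close to 1.
```

The exact solution is one matrix solve; gradient descent walks toward the same minimum in small steps, which is the part a NN "pompously" generalizes.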

LuchoTurtle commented 1 year ago

Do show!

For my first NN I did something similar with Stochastic Gradient Descent -> https://github.com/LuchoTurtle/bike-sharing-patterns/blob/master/Your_first_neural_network.ipynb.

Curious to see what you've done :)

nelsonic commented 1 year ago

@LuchoTurtle FYI: that link is 404 ... 🔗 🙈 Is it public? 🌎

LuchoTurtle commented 1 year ago

Should be now, thanks πŸ‘Œ

ndrean commented 1 year ago

Here: https://github.com/ndrean/linear_regression_nx_axon. But this is terribly basic work...