ndrean opened this issue 1 year ago
Right. I understand you're not running any `RUN mix`, you just compile the code.
But if your code is doing `{:ok, model_info} = Bumblebee.load_model({:hf, model})`, even if you set `BUMBLEBEE_CACHE_DIR` (which is where the model will be downloaded to; otherwise it goes to a default location in your OS if it's not set), I think it will be re-downloaded whenever you restart the instance (it's what's happening in this very repo) or any time there's a cold boot.
Hm, interesting.
This repo's fly.io page is set to suspend itself after one hour of no activity. I just accessed it and it was inactive. It re-downloaded ResNet-50. π€
If it's truly cached like you showed, what's the point of https://fly.io/phoenix-files/speed-up-your-boot-times-with-this-one-dockerfile-trick/#bumblebee?
Your machine must be destroyed after one hour? Restart from zero?
Then, as we said, their code doesn't work, or we don't see what is wrong.
When I use `{:local, "/app/bin/.bumblebee/blip"}`, I get an error: "no config file found in the given repository." However, when I inspect the running container with `docker exec -it test bash` and `ls`, I find the bind-mounted folder and it's populated.
Then when I run serve with `:local` and an absolute hard-coded path, same error. When I use `:hf` with a populated folder from a previous download, it works.
Yeah. I have a feeling that `:local` is meant to be used when you directly download the model and add it to the repo manually.
Okay, I think I figured it out.

`:local` is only supposed to be used when we download the model files directly to our git repo. When Bumblebee downloads the files, as seen in `cached_download/2` (basically when calling `Bumblebee.load_model/2`), the downloaded files are not the same as if we'd downloaded the model to our git repo. I'll call them *hashed files*.

`:hf` will cache downloads (the *hashed files*) in the directory set in `BUMBLEBEE_CACHE_DIR`, as seen in `cached_download/2`. So we gain an advantage if we download the model in the `Dockerfile` (or put it in a volume) and then make sure that `BUMBLEBEE_CACHE_DIR` points to it. That way, the model is fetched from this directory.
I was quite confused about what the heck `:local` pertained to, so I hope this makes it clearer.
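The "hashed files" naming can be sketched in plain Elixir: the cache entry name appears to be an md5 of the source URL, Base32-encoded (this is an assumption, mirroring the `cached_download/2` code inspected further down in this thread):

```elixir
# Sketch (assumption based on inspecting Bumblebee's cached_download/2):
# cache entries are named by hashing the source URL.
hash = fn url ->
  url |> :erlang.md5() |> Base.encode32(case: :lower, padding: false)
end

metadata_filename =
  hash.("https://huggingface.co/api/models/microsoft/resnet-50/tree/main") <> ".json"

# => "7p34k3zbgum6n3sspclx3dv3aq.json"
```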
Now I'm testing the container locally whilst downloading the model in the `Dockerfile`, and it seems to be working.
Here's the `Dockerfile`:
```dockerfile
# Find eligible builder and runner images on Docker Hub. We use Ubuntu/Debian
# instead of Alpine to avoid DNS resolution issues in production.
#
# https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=ubuntu
# https://hub.docker.com/_/ubuntu?tab=tags
#
# This file is based on these images:
#
# - https://hub.docker.com/r/hexpm/elixir/tags - for the build image
# - https://hub.docker.com/_/debian?tab=tags&page=1&name=bullseye-20231009-slim - for the release image
# - https://pkgs.org/ - resource for finding needed packages
# - Ex: hexpm/elixir:1.15.7-erlang-26.0.2-debian-bullseye-20231009-slim
#
ARG ELIXIR_VERSION=1.15.7
ARG OTP_VERSION=26.0.2
ARG DEBIAN_VERSION=bullseye-20231009-slim

ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"

FROM ${BUILDER_IMAGE} as builder

# install build dependencies (and curl for EXLA)
RUN apt-get update -y && apt-get install -y build-essential git curl \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

# prepare build dir
WORKDIR /app

# install hex + rebar
RUN mix local.hex --force && \
    mix local.rebar --force

# set build ENV
ENV MIX_ENV="prod"
ENV BUMBLEBEE_CACHE_DIR="/app/.bumblebee/"
ENV BUMBLEBEE_OFFLINE="false"

# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config

# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile

COPY priv priv
COPY lib lib
COPY assets assets
COPY .bumblebee/ .bumblebee

# compile assets
RUN mix assets.deploy

# Compile the release
RUN mix compile

# IMPORTANT: This downloads the HuggingFace models from the `serving` function in the `lib/app/application.ex` file.
# And copies them to `.bumblebee`.
RUN mix run -e 'App.Application.load_models()' --no-start --no-halt; exit 0
COPY .bumblebee/ .bumblebee

# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/

COPY rel rel
RUN mix release

# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE}

RUN apt-get update -y && \
    apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

WORKDIR "/app"
RUN chown nobody /app

# set runner ENV
ENV MIX_ENV="prod"

# Adding this so the model can be downloaded
RUN mkdir -p /nonexistent

# Only copy the final release from the build stage
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/app ./
COPY --from=builder --chown=nobody:root /app/.bumblebee/ ./.bumblebee

USER nobody

# If using an environment that doesn't automatically reap zombie processes, it is
# advised to add an init process such as tini via `apt-get install`
# above and adding an entrypoint. See https://github.com/krallin/tini for details
# ENTRYPOINT ["/tini", "--"]

# Set the runtime ENV
ENV ECTO_IPV6="true"
ENV ERL_AFLAGS="-proto_dist inet6_tcp"
ENV BUMBLEBEE_CACHE_DIR="/app/.bumblebee/"
ENV BUMBLEBEE_OFFLINE="true"

CMD ["/app/bin/server"]
```
`BUMBLEBEE_OFFLINE` should only be set to `true` after the image is created and the app is compiled in the `Dockerfile`. The reason it wasn't working before is that `BUMBLEBEE_OFFLINE` was also being set during compilation; somehow this messed up the process (though I'm not sure why).

Having `BUMBLEBEE_CACHE_DIR` end with `/` also seemed to help, though I doubt it matters, since `Path.join/1` is used in `cached_download/2`.
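As a quick sanity check on that doubt: Elixir's `Path.join` normalizes separators, so a trailing `/` in the cache dir should indeed be harmless when paths are joined:

```elixir
# Path.join drops redundant separators, so the trailing "/" in
# BUMBLEBEE_CACHE_DIR should not change the resulting path.
Path.join("/app/.bumblebee/", "huggingface")
# => "/app/.bumblebee/huggingface"
```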
Here's how my `application.ex` looks:
```elixir
defmodule App.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Start the Telemetry supervisor
      AppWeb.Telemetry,
      # Start the PubSub system
      {Phoenix.PubSub, name: App.PubSub},
      # Nx serving for image classifier
      {Nx.Serving, serving: serving(), name: ImageClassifier},
      # Adding a supervisor
      {Task.Supervisor, name: App.TaskSupervisor},
      # Start the Endpoint (http/https)
      AppWeb.Endpoint
      # Start a worker by calling: App.Worker.start_link(arg)
      # {App.Worker, arg}
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: App.Supervisor]
    Supervisor.start_link(children, opts)
  end

  def load_models do
    # ResNet-50 -----
    {:ok, _} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
    {:ok, _} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})
  end

  def serving do
    # ResNet-50 -----
    {:ok, model_info} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

    Bumblebee.Vision.image_classification(model_info, featurizer,
      top_k: 1,
      compile: [batch_size: 10],
      defn_options: [compiler: EXLA],
      # needed to run on `Fly.io`
      preallocate_params: true
    )
  end

  # Tell Phoenix to update the endpoint configuration
  # whenever the application is updated.
  @impl true
  def config_change(changed, _new, removed) do
    AppWeb.Endpoint.config_change(changed, removed)
    :ok
  end
end
```
The container now works locally without crashing, and it seems to make `cached_download/2` fetch from the local files in `.bumblebee`.
See the video below (you can skip the first 90 seconds; it's just `docker` downloading stuff).
https://github.com/dwyl/imgup/assets/17494745/3aa4ccac-d122-442c-8f83-b24a661e06e1
I've built the Docker image with `--no-cache` on purpose. As you can see, if I restart the machine, it doesn't crash and manages to find the local model in `.bumblebee`.
The `metadata_filename` comes from the following piece of code found in `cached_download/2`:
```elixir
url =
  "https://huggingface.co/api/models/microsoft/resnet-50/tree/main"
  |> :erlang.md5()
  |> Base.encode32(case: :lower, padding: false)

metadata_filename = url <> ".json"
dbg(metadata_filename)
```
This yields the hashed `.json` metadata file (the name is `base32`-encoded).
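For reference, the Base32 variant in play here is lowercase and unpadded:

```elixir
# Lowercase, unpadded Base32, as used for the cache file names above.
Base.encode32("foo", case: :lower, padding: false)
# => "mzxw6"
```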
I digress. This should work for you too now π
UPDATE: This doesn't always work, for whatever reason. Even if the files are clearly inside the container and accessible, it errors out. I don't know how to fix this anymore.
At first I thought I had to have `.bumblebee` populated on my `localhost` so the `Dockerfile` could copy the files. And that seemed to work. But now it doesn't anymore, for whatever reason.
Yes, `:local` expects something other than what is downloaded.
But neither Fly.io nor the LiveBeats Whisper example uses `exit 0` or the `COPY` command that you use:
```dockerfile
RUN mix run -e 'App.Application.load_models()' --no-start --no-halt; exit 0
COPY .bumblebee/ .bumblebee
```
Furthermore, it may download, but since `BUMBLEBEE_CACHE_DIR` is set and read inside Docker, the folder in the image should already be populated and you shouldn't need to copy anything.
Still at the same point as two weeks ago: same `mix`, same `Dockerfile`, same `serving`, same `Application` (including the ordering), but absolutely no clue why the "official" code fails. The good part is I feel less alone :)
> COPY .bumblebee/ .bumblebee

will copy from your build context (your local machine) into the current image, so doing it twice won't do anything. Docker is weird and finicky; I'm sorry you're having these issues, but I find full paths work better than relative ones.
If you do:

```dockerfile
FROM ${builder}
RUN mix run -e 'App.Application.load_models()' --no-start --no-halt; exit 0

FROM ${runner}
COPY --from=builder /app/.bumblebee/ /app/.bumblebee
```
This example is also not great because you might create very large images, which makes deploys slower. One thing we've been trying out is adding a volume and downloading the model to it on first boot:
```elixir
# Pseudocode: only download when the model isn't on the volume yet
unless File.exists?(model) do
  App.Application.load_models()
end
```
Once your app deploys, can you `fly ssh console` into it and verify whether the `/app/.bumblebee` files are there or not? If they are, then your configuration is wrong.
Thanks for the feedback @jeregrine.
Yes, doing that will copy from my machine to the current image, so it's unnecessarily redundant.
Doing everything in the `Dockerfile` was actually working for a while, but it stopped working after I changed nothing. Weird stuff.
But yes, I'm aware this isn't the ideal solution: it creates gigantic images, as you correctly stated. Having a volume is certainly the way to go, and I'm currently exploring it. But I feel like my issue will still occur even with volumes. I'm testing locally with Docker, and I can see the model files being correctly downloaded and the env variables (`BUMBLEBEE_CACHE_DIR` and `BUMBLEBEE_OFFLINE`) correctly set, and still I get an error whilst loading the models: they are not found.
For example, in `application.ex`:
```elixir
def load_models do
  # ResNet-50 -----
  {:ok, _} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
  {:ok, _} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})
end

def serving do
  dbg(System.get_env("BUMBLEBEE_CACHE_DIR"))
  dbg(System.get_env("BUMBLEBEE_OFFLINE"))

  # ResNet-50 -----
  {:ok, model_info} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

  Bumblebee.Vision.image_classification(model_info, featurizer,
    top_k: 1,
    compile: [batch_size: 10],
    defn_options: [compiler: EXLA],
    # needed to run on `Fly.io`
    preallocate_params: true
  )
end
```
where `serving/0` is used by `Nx` when the supervision tree is initiated.
If I run my `Dockerfile`, I can clearly see that the files are correctly downloaded and placed under `/app/.bumblebee`.
So, in theory, this should totally work. But it doesn't:
```
2023-11-14 15:06:33 [lib/app/application.ex:49: App.Application.serving/0]
2023-11-14 15:06:33 System.get_env("BUMBLEBEE_CACHE_DIR") #=> "/app/.bumblebee/"
2023-11-14 15:06:33
2023-11-14 15:06:33 [lib/app/application.ex:50: App.Application.serving/0]
2023-11-14 15:06:33 System.get_env("BUMBLEBEE_OFFLINE") #=> "true"
2023-11-14 15:06:33
2023-11-14 15:06:33 15:06:33.255 [info] TfrtCpuClient created.
2023-11-14 15:06:33 15:06:33.645 [notice] Application app exited: exited in: App.Application.start(:normal, [])
2023-11-14 15:06:33 ** (EXIT) an exception was raised:
2023-11-14 15:06:33 ** (MatchError) no match of right hand side value: {:error, "could not find file in local cache and outgoing traffic is disabled, url: https://huggingface.co/microsoft/resnet-50/resolve/main/preprocessor_config.json"}
2023-11-14 15:06:33 (app 0.1.0) lib/app/application.ex:54: App.Application.serving/0
2023-11-14 15:06:33 (app 0.1.0) lib/app/application.ex:16: App.Application.start/2
```
Regardless of whether the model is stored in a volume or not (I know there are ephemeral storage considerations on `fly.io`), I'm doing this on my computer in a Docker instance. I'm at a loss as to what I could be doing wrong π
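As an aside (a sketch, not the fix): matching on the result instead of asserting `{:ok, _}` would surface the underlying `{:error, reason}` in the crash log rather than a bare `MatchError`. `LoadHelper` here is a hypothetical name, not part of Bumblebee:

```elixir
# Hypothetical helper: unwrap an {:ok, value} / {:error, reason} tuple
# with a descriptive error instead of a MatchError.
defmodule LoadHelper do
  def unwrap!({:ok, value}, _label), do: value

  def unwrap!({:error, reason}, label) do
    raise "#{label} failed: #{inspect(reason)}"
  end
end

LoadHelper.unwrap!({:ok, :model_info}, "Bumblebee.load_model")
# => :model_info
```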
Could you do a `File.ls("/app/.bumblebee/") |> IO.inspect()` and see what you get?
This is exactly the problem I had (and still have) when I download within the `Dockerfile`. If I run the image with a bash command and `ls`, this is what I see (and I can't run the image due to the error above).
If I don't download but instead bind-mount a volume containing this folder, the folder is fully populated and the image runs:
```shell
docker run --rm -it -p 4000:4000 --mount src=sf,target=/app/.bumblebee/ --env-file .env-docker --name app-cont up-img

# other terminal
docker exec -it app-cont bash
```
@ndrean So, assuming you've created a volume called `models` on `fly.io` and you have this in your `fly.toml`:
```toml
[mounts]
  source = "models"
  destination = "/app/.bumblebee/"
```
At what stage do you download the models? Do you do it yourself manually? Do you run an external script that does this?
This is my question! If you create a volume, can you `ssh` into it even if no app is running (considering we are in the same region), or will simply having `[mounts]` in the `fly.toml` populate it?
> Could you do a `File.ls("/app/.bumblebee/") |> IO.inspect()` and see what you get?

I get the following:
```
2023-11-14 16:14:43 [lib/app/application.ex:49: App.Application.serving/0]
2023-11-14 16:14:43 System.get_env("BUMBLEBEE_CACHE_DIR") #=> "/app/.bumblebee/"
2023-11-14 16:14:43
2023-11-14 16:14:43 [lib/app/application.ex:50: App.Application.serving/0]
2023-11-14 16:14:43 System.get_env("BUMBLEBEE_OFFLINE") #=> "true"
2023-11-14 16:14:43
2023-11-14 16:14:43 {:ok, ["huggingface"]}
2023-11-14 16:14:43 [lib/app/application.ex:51: App.Application.serving/0]
2023-11-14 16:14:43 File.ls("/app/.bumblebee/") #=> {:ok, ["huggingface"]}
2023-11-14 16:14:43 |> IO.inspect() #=> {:ok, ["huggingface"]}
2023-11-14 16:14:43
2023-11-14 16:14:43 16:14:43.822 [info] TfrtCpuClient created.
2023-11-14 16:14:44 16:14:44.224 [notice] Application app exited: exited in: App.Application.start(:normal, [])
2023-11-14 16:14:44 ** (EXIT) an exception was raised:
2023-11-14 16:14:44 ** (MatchError) no match of right hand side value: {:error, "could not find file in local ca
```
@LuchoTurtle, run `File.ls("/app/.bumblebee/huggingface")` and compare it to your data, because I had a difference.
My image is 580MB though, so I can't test it on a free machine.
I've added the following code to mimic `cached_download/2`, and, as you can see, the filename matches and can be found inside `/app/.bumblebee/huggingface`.
```elixir
url =
  "https://huggingface.co/api/models/microsoft/resnet-50/tree/main"
  |> :erlang.md5()
  |> Base.encode32(case: :lower, padding: false)

metadata_filename = url <> ".json"
dbg(metadata_filename)
dbg(File.ls("/app/.bumblebee/huggingface") |> IO.inspect())
```
On startup, it yields...
```
2023-11-14T16:27:57.716 app[683d529c575228] mad [info] metadata_filename #=> "7p34k3zbgum6n3sspclx3dv3aq.json"
2023-11-14T16:27:57.717 app[683d529c575228] mad [info] {:ok,
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] ["45jmafnchxcbm43dsoretzry4i.json",
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] "7p34k3zbgum6n3sspclx3dv3aq.k4xsenbtguwtmuclmfdgum3enjuwosljkrbuc42govrhcudqlbde6ujc",
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] "45jmafnchxcbm43dsoretzry4i.eiztamryhfrtsnzzgjstmnrymq3tgyzzheytqmrzmm4dqnbshe3tozjsmi4tanjthera",
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] "7p34k3zbgum6n3sspclx3dv3aq.json",
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] "6scgvbvxgc6kagvthh26fzl53a.ejtgmobrgyzwcmjtgiztgmztgezdmnzqgzsdmnbzmnstom3fmnsdontfgq2wimrugfrdimtegyzdgzdfme3ggnzsgm3dsmddmftgkmbxei",
2023-11-14T16:27:57.718 app[683d529c575228] mad [info] "6scgvbvxgc6kagvthh26fzl53a.json"]}
```
> This is my question! If you create a volume, can you `ssh` into it even if no app is running (considering we are in the same region), or will simply having `[mounts]` in the `fly.toml` populate it?
I don't think you can `ssh` into it without the app successfully running. I've tried, and it kicks me out every time it attempts to restart the server (expected). As far as I can tell, `[mounts]` won't populate anything; it just points to where we want the model to be. Though I don't quite understand how I'm meant to populate the volume in an automated manner lmao.
I think you can just reference the volume by its name.
This is why I decided to try a VPS.
> as you can see, the filename matches and can be found inside `/app/.bumblebee/huggingface`.

Does it match all the files you have locally for this model? For me, no.
My suggestion is to skip the `Dockerfile` stuff. Do this: in your application's `start_link`, check if the models exist; if not, download them to a volume. It will be slow to boot once, and you'll never think about it again.
Yeah, there's no use putting more effort into a dead end. So I'm downloading the model on the first boot and then reusing it on subsequent restarts.
```elixir
def start(_type, _args) do
  # Check if the models have been downloaded
  models_folder_path = Path.join(System.get_env("BUMBLEBEE_CACHE_DIR"), "huggingface")

  if not File.exists?(models_folder_path) or File.ls!(models_folder_path) == [] do
    load_models()
  end

  children = [
    ...
end

def load_models do
  # ResNet-50 -----
  {:ok, _} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
  {:ok, _} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})
end

def serving do
  # ResNet-50 -----
  {:ok, model_info} = Bumblebee.load_model({:hf, "microsoft/resnet-50", offline: true})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50", offline: true})

  Bumblebee.Vision.image_classification(model_info, featurizer,
    top_k: 1,
    compile: [batch_size: 10],
    defn_options: [compiler: EXLA],
    # needed to run on `Fly.io`
    preallocate_params: true
  )
end
```
Instead of using `BUMBLEBEE_OFFLINE`, I'm passing the `:offline` option when loading the models, so they're fetched locally.
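A small sketch of that per-call option (the `{:hf, id, opts}` tuple shape is the one used in the snippet above; `RepoSpec` is a hypothetical name, not part of Bumblebee):

```elixir
# Hypothetical helper: build the {:hf, id, opts} repository tuple with
# offline: true, so the files must come from the local cache.
defmodule RepoSpec do
  def offline(id), do: {:hf, id, offline: true}
end

RepoSpec.offline("microsoft/resnet-50")
# => {:hf, "microsoft/resnet-50", [offline: true]}
```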
> as you can see, the filename matches and can be found inside `/app/.bumblebee/huggingface`.
>
> Does it match all the files you have locally for this model? For me, no.
The filename from `cached_download/2` matches one of them (as expected). Though I see two files are missing when compared to the local ones π
Regardless, there's no use continuing with the `Dockerfile`-only approach anymore π
So, like me, some files are missing. Also, the versions of Bumblebee and Nx are quite unstable. I decided to fork your repo to run the same code as you did, but guess what, I can't even run `mix phx.server` because:

```
function EXLA.NIF.start_log_sink/1 is undefined (module EXLA.NIF is not available)
```

I'm starting to lose my patience a bit. But maybe more importantly, the documentation isn't reliable? Take a look: https://github.com/elixir-nx/bumblebee/tree/main/examples/phoenix#tips. What is really working?
That's odd, I've never had that error happen to me. Does clearing out the deps and running `mix deps.get` again fix it?
https://elixirforum.com/t/exla-nif-start-log-sink-1-issue-works-on-ubuntu-but-not-on-macbook-m2/58162/3 says it's a Linux-related issue, but it has apparently been solved.
From the link you provided, from what I've gathered, I did follow the *Configuring Nx* chapter and it seems to work the same as before. I can't really quantify it, because I saw negligible impact after configuring it like so π€·ββοΈ
It's a shame we can't really trust their docs when some of the articles and guides we've discussed and tried to follow clearly don't work :/
Yes, I erased everything (mix.lock, deps, _build) and started again. It works now...
As for with or without EXLA: yes, there is a huge difference. A few months ago I tried a very simple neural network with Axon, just to compute a linear regression by simple gradient descent, and the difference was HUGE. Now, with what we are doing, no idea π
I understand (?) that we are more or less loading the coefficients of some kind of process that is used to define the Axon operations that build the neural network defined by the model. But what are we doing exactly? That I have absolutely no idea. But for sure, it is not rocket science π
I recalled I made a Livebook last year to test Nx and Axon. The idea was to use a very simple example: linear regression. You start with the well-known matrix formulas to compute the exact solution, whether by inverting a matrix or using the closed-form formulas. The best-fitting line through a bunch of points is, of course, the line that minimizes the total Euclidean distance between the line and the points. You compare this to a simple gradient descent, and since we are crazy, we can even use a NN! You "pompously" train your NN with your points to build its coefficients. Then you can use it: given an input x, it finds a y. If you're interested in seeing what this is about, I can paste the Livebook somewhere.
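The closed-form fit described above can be sketched in plain Elixir (no Nx), using the standard least-squares formulas for a line y = a*x + b:

```elixir
# Minimal sketch: closed-form simple linear regression.
# a = (n*Ξ£xy - Ξ£x*Ξ£y) / (n*Ξ£xΒ² - (Ξ£x)Β²), b = (Ξ£y - a*Ξ£x) / n
defmodule LinReg do
  def fit(xs, ys) do
    n = length(xs)
    sx = Enum.sum(xs)
    sy = Enum.sum(ys)
    sxy = Enum.zip(xs, ys) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    sxx = xs |> Enum.map(&(&1 * &1)) |> Enum.sum()

    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    {a, b}
  end
end

# Points lying exactly on y = 2x + 1:
LinReg.fit([1, 2, 3, 4], [3, 5, 7, 9])
# => {2.0, 1.0}
```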
Do show!
For my first NN, I did something similar with Stochastic Gradient Descent: https://github.com/LuchoTurtle/bike-sharing-patterns/blob/master/Your_first_neural_network.ipynb.
Curious to see what you've done :)
@LuchoTurtle FYI: that link is a 404... π π
Is it public? π
Should be now, thanks π
Here: https://github.com/ndrean/linear_regression_nx_axon. But this is terribly basic work...
I gave Bumblebee a try today. The idea was to provide image-captioning predictions to classify an image, so that a user can use pre-filled tags to easily filter their images.
It turns out that the predictions are... not too bad, and quite fast, at least locally.
This is supposed to be a car:
https://dwyl-imgup.s3.eu-west-3.amazonaws.com/40F36F45.webp
Testing with a new query string `pred=on` to run the model prediction. I tested three models: "facebook/deit-base-distilled-patch16-224", "microsoft/resnet-50", and "google/vit-base-patch16-224".
I don't know if anyone has tested it?
I submit my code in case any reader sees an obvious fault. It runs locally. It is based on this example. I did not try to deploy this, but here is a heads-up before I forget: you need to set up a temp dir.
I decided to run a GenServer to start the `serving` with the app to load the model, but you can start an `Nx.Serving` at the Application level as well, something like `{Nx.Serving, serving: serve(), name: UpImg.Serving}`, where the function `Application.serve` defines what is in the GenServer below, and it is started with the app.
The model (the repo id) is passed as an env var so I can very simply change it. In the API, I use `predict/1` when I upload an image from the browser and run this task in parallel to the S3 upload. It takes a `Vix.Vips.Image`, a transformation of a binary file. [EDITED]
and use it in the flow: