load_model should be fast compared to the download, since it does not compile anything, and it helps validate that you downloaded the right artifact. And I believe safetensors, which we want to make the default, loads parameters lazily, so it should be even less work.
Maybe the right idea, but this makes it kind of rough if you just want to load/upload the model on a machine that isn't set up for inference. Or at least with the times I'm seeing.
Example: https://huggingface.co/google-bert/bert-base-cased
Download, clocked by counting out loud while the progress bars were going: ~14 seconds
Total load_model execution time: 76 seconds
Tested with:
```elixir
# :timer.tc returns {microseconds, result}; take the time and convert to ms
:timer.tc(fn -> Bumblebee.load_model({:hf, "google-bert/bert-base-cased"}) end)
|> elem(0)
|> then(&(&1 / 1000))
|> IO.inspect(label: "ms")
```
No configuration done at all.
Ideally I'd love to stream the download from Hugging Face straight to an S3-compatible store, but that is outside the scope of what Bumblebee is about.
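Something like this rough sketch is what I have in mind, assuming Req and ExAws.S3 as dependencies (the bucket and key names here are made up):

```elixir
# Rough sketch: fetch one repo file from Hugging Face and stream it on to S3.
# Assumes {:req, "~> 0.4"} and {:ex_aws_s3, "~> 2.0"} are configured;
# the bucket and key names are placeholders.
url = "https://huggingface.co/google-bert/bert-base-cased/resolve/main/model.safetensors"
tmp = Path.join(System.tmp_dir!(), "model.safetensors")

# Stream the HTTP response body into a temp file instead of holding it in memory.
Req.get!(url, into: File.stream!(tmp))

# Multipart-upload the file to S3 in 5 MB chunks.
tmp
|> File.stream!([], 5 * 1024 * 1024)
|> ExAws.S3.upload("my-model-bucket", "bert-base-cased/model.safetensors")
|> ExAws.request!()
```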
You can take the files from the HF repository and put them in S3 or wherever; then, when you download them onto the local machine, use {:local, path_to_repo_dir}
(just make sure you don't copy parameter files in multiple formats, as that would be unnecessary).
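For example, if the repository files were mirrored to S3, pulling them back and loading could look roughly like this (assuming ExAws.S3; the bucket, keys, and directory are placeholders):

```elixir
# Sketch: fetch previously mirrored repo files from S3 into a local directory,
# then point Bumblebee at that directory.
dir = "/tmp/bert-base-cased"
File.mkdir_p!(dir)

for file <- ["config.json", "tokenizer.json", "model.safetensors"] do
  ExAws.S3.download_file("my-model-bucket", "bert-base-cased/#{file}", Path.join(dir, file))
  |> ExAws.request!()
end

# No HTTP call to Hugging Face happens here; everything is read from disk.
{:ok, model_info} = Bumblebee.load_model({:local, dir})
```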
In the future we may have our own serialisation format for things, but I don't think we should be exposing the download of hf/transformers files.
> 76 seconds
You'd need to use EXLA.Backend, because there are some transformations that are going to be slow otherwise.
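Concretely, that means setting EXLA as the default Nx backend before loading (this is the standard Nx call, nothing Bumblebee-specific):

```elixir
# Make EXLA the default backend so the tensor transformations done while
# loading run on a compiled backend instead of the pure-Elixir Nx.BinaryBackend.
Nx.global_default_backend(EXLA.Backend)

{:ok, model_info} = Bumblebee.load_model({:hf, "google-bert/bert-base-cased"})
```

In an application this would usually live in config.exs instead, as `config :nx, default_backend: EXLA.Backend`.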
That was a lot faster. I can make do.
Hey
I was looking to set up something where I load models from a nearby S3 bucket, or even use S3 as a pass-through cache for models. And I realized there are no public functions for triggering just the download, so when I try to download a model I also have to load it, which in some cases takes more time than the download and is entirely unnecessary if the goal is just to put it somewhere else :D
I can use private APIs to make some progress, but I'm essentially re-implementing stuff that's already there.
If most of load_model could be broken out and made available as a download_model (probably the same with the other load_X functions), then it would be fairly easy to add options for people who don't want to hassle HuggingFace too much.
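Something with a shape like this, purely hypothetical (download_model does not exist in Bumblebee today):

```elixir
# Hypothetical API sketching the feature request; this function does not
# exist in Bumblebee, and the cache paths shown are made up.
{:ok, paths} = Bumblebee.download_model({:hf, "google-bert/bert-base-cased"})
# => {:ok, ["~/.cache/bumblebee/.../config.json",
#           "~/.cache/bumblebee/.../model.safetensors"]}

# The cached files could then be copied to S3 (or any store) without ever
# building the Axon model or loading the parameters.
```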