elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)

Tied word embeddings #339

Open · jonatanklosko opened this issue 4 months ago

jonatanklosko commented 4 months ago

Many models have an option to share parameters between the input embedding layer and an output dense layer. We need a solution for that in Axon, but I'm opening this issue here, since we already have a lot of TODOs in the code pointing at this specific problem.
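
For a concrete picture, here is a minimal plain-Nx sketch (not an Axon API) of what tying means: a single kernel serves as the embedding table on the way in and, transposed, as the output projection on the way out. The shapes and names are made up for illustration.

```elixir
# Weight tying illustrated in plain Nx; shapes are hypothetical.
kernel = Nx.iota({4, 3}, type: :f32)           # {vocab_size, hidden_size}

token_ids = Nx.tensor([0, 2])
embedded = Nx.take(kernel, token_ids)          # input embedding lookup, {2, 3}

hidden = embedded                              # stand-in for the model body
logits = Nx.dot(hidden, Nx.transpose(kernel))  # tied output projection, {2, 4}
```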

The reason loading currently works is that the PyTorch .bin export includes both layers, and both keys point to the same tensor. In the case of safetensors, only one of the layers may be present, so this issue is a prerequisite for defaulting to safetensors (#255).
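
To make the distinction concrete, the two exports can be pictured as parameter maps roughly like this (the layer names and layout are assumptions for the example):

```elixir
# PyTorch .bin export: both keys are present and reference the same tensor,
# so loading either layer just works.
bin_params = %{
  "token_embedding" => %{"kernel" => kernel},
  "language_modeling_head" => %{"kernel" => kernel}
}

# safetensors export: the tied output layer may be absent entirely,
# so it has to be reconstructed from the embedding at load time.
safetensors_params = %{
  "token_embedding" => %{"kernel" => kernel}
}
```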

For additional discussion see #263.

seanmor5 commented 4 months ago

I actually think this is something we can do with the model state struct: since we can store metadata, we can also tell when two parameters are tied. It's just a matter of determining an API for tying the buffers. If we know from safetensors that they are tied on load, then it should be easier; I'm just thinking of how it would be done when declaring the model.
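
Purely as a sketch of what such metadata could look like (the actual Axon model state struct may differ), tied parameters could be recorded as path aliases, so lookups resolve to a single shared buffer:

```elixir
# Sketch only: a plain map standing in for the model state struct.
# A tied parameter is stored once; the alias table redirects lookups.
state = %{
  data: %{"token_embedding" => %{"kernel" => Nx.iota({4, 3}, type: :f32)}},
  tied: %{
    ["language_modeling_head", "kernel"] => ["token_embedding", "kernel"]
  }
}

get_param = fn state, path ->
  resolved = Map.get(state.tied, path, path)
  get_in(state.data, resolved)
end

get_param.(state, ["language_modeling_head", "kernel"])
#=> the same tensor as state.data["token_embedding"]["kernel"]
```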

jonatanklosko commented 4 months ago

> If we know from safetensors that they are tied on load then it should be easier

We will know if they are tied based on the spec attribute, as in `spec.tie_word_embeddings`.
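
So at load time the handling could be keyed off that flag, roughly along these lines (the layer names and params layout are assumptions, not the actual loading code):

```elixir
# Hedged sketch: when the spec says the embeddings are tied and the
# checkpoint omits the output layer, point it at the embedding kernel.
params =
  if spec.tie_word_embeddings do
    Map.put_new(params, "language_modeling_head", %{
      "kernel" => params["token_embedding"]["kernel"]
    })
  else
    params
  end
```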