elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0

Group all tokenizers under a single module and configure upfront #310

Closed · jonatanklosko closed this 6 months ago

jonatanklosko commented 6 months ago

Closes #143.

Instead of having one module per tokenizer (which we generated with a macro, since they were identical except for some defaults), we now have a single Bumblebee.Text.PreTrainedTokenizer module with a :type field, somewhat similar to how we have models with multiple architectures. We use the type to pick the right set of defaults.
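A rough sketch of what this looks like from the caller's side (the checkpoint name and :type value here are illustrative assumptions, not taken from this PR):

```elixir
# Loading any tokenizer now yields the single unified struct;
# the :type field selects the type-specific defaults.
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "bert-base-uncased"})

tokenizer.__struct__
# Bumblebee.Text.PreTrainedTokenizer

tokenizer.type
# :bert (assumed value for illustration)
```

This mirrors how a single model module can expose multiple architectures, rather than generating one near-identical module per tokenizer.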

Also, instead of passing options to Bumblebee.apply_tokenizer, they now need to be set on the tokenizer itself via Bumblebee.configure. I added a deprecation notice, and for serving users the change is handled transparently either way. This is primarily a small optimisation, as mentioned in https://github.com/elixir-nx/bumblebee/pull/307#discussion_r1426527482, but it also aligns tokenizers with featurizers.
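The before/after usage might be sketched as follows (option names such as :length are assumptions for illustration):

```elixir
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "bert-base-uncased"})

# Before this PR: options passed per call (now deprecated)
# inputs = Bumblebee.apply_tokenizer(tokenizer, "Hello world", length: 16)

# After this PR: options are set upfront on the tokenizer struct
tokenizer = Bumblebee.configure(tokenizer, length: 16)
inputs = Bumblebee.apply_tokenizer(tokenizer, "Hello world")
```

Setting options upfront lets the serving pipeline avoid re-validating and merging options on every call, which is the small optimisation referenced above.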