elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0

Support GPT-J #154

Open lorenzosinisi opened 1 year ago

lorenzosinisi commented 1 year ago

Any plans to add support for togethercomputer/GPT-JT (https://huggingface.co/spaces/togethercomputer/GPT-JT)?

It seems like the closest alternative to GPT-3. What do you think? I would love to help, but I don't know where to start.

josevalim commented 1 year ago

The above is an app; do you have a link to the model implementation and params on HF? :)

lorenzosinisi commented 1 year ago

Oh sorry! Yes, the model is https://huggingface.co/togethercomputer/GPT-JT-6B-v1. Does it have similarities with other models that I could try to implement, or even use directly? I have no idea where to start, but I'm willing to help.

josevalim commented 1 year ago

Ah, nice! We support GPT-2, so maybe that can be used as a building block? Or at least you can compare the Python GPT-2 implementation with our GPT-2 and then do the same to implement your own. :)
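
For context, the existing GPT-2 support can already be exercised end to end, which makes it a useful reference while porting GPT-J. A minimal sketch, assuming the generation API of a recent Bumblebee release (details may differ across versions):

```elixir
# Load the pre-trained GPT-2 model, tokenizer, and generation defaults from HF.
{:ok, model_info} = Bumblebee.load_model({:hf, "gpt2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "gpt2"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "gpt2"})

# Build a text-generation serving and run a prompt through it.
serving = Bumblebee.Text.generation(model_info, tokenizer, generation_config)
Nx.Serving.run(serving, "Elixir is")
```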

lorenzosinisi commented 1 year ago

Thanks, I will give it a try :) Any PR I can look at for reference?

BTW this model already knows some Elixir

[Screenshot, 2023-01-30: the model generating Elixir code]

❤️

jonatanklosko commented 1 year ago

The reference hf/transformers implementation of GPT-J is here. The implementation should be for the most part similar to any other text model we have, like GPT-2. From a brief look, I think we may need to adjust/extend our attention implementation to support the rotary position embedding, but it's fine to modify the current code as necessary, and we can find the best way to make it configurable later.
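
For intuition, rotary position embedding encodes position by rotating pairs of query/key features by position-dependent angles instead of adding a position vector. A minimal Nx sketch of the idea (module and function names are mine; GPT-J's actual variant interleaves even/odd pairs and rotates only the first `rotary_dim` features of each head):

```elixir
defmodule RotarySketch do
  # Precompute cos/sin tables: one angle per (position, frequency) pair,
  # with frequencies theta_i = base^(-2i/dim) as in the RoPE paper.
  def rotary_tables(seq_len, dim, base \\ 10_000) do
    inv_freq = Nx.pow(base, Nx.divide(Nx.multiply(Nx.iota({div(dim, 2)}), -2), dim))
    angles = Nx.outer(Nx.iota({seq_len}), inv_freq)
    {Nx.cos(angles), Nx.sin(angles)}
  end

  # Rotate the two halves of the last axis by the per-position angles:
  # [x1, x2] -> [x1 * cos - x2 * sin, x1 * sin + x2 * cos]
  def apply_rotary(x, cos, sin) do
    half = div(Nx.axis_size(x, -1), 2)
    x1 = Nx.slice_along_axis(x, 0, half, axis: -1)
    x2 = Nx.slice_along_axis(x, half, half, axis: -1)

    Nx.concatenate(
      [
        Nx.subtract(Nx.multiply(x1, cos), Nx.multiply(x2, sin)),
        Nx.add(Nx.multiply(x1, sin), Nx.multiply(x2, cos))
      ],
      axis: -1
    )
  end
end

# Usage: rotate a {batch, heads, seq, head_dim} query tensor.
{cos, sin} = RotarySketch.rotary_tables(128, 64)
q = Nx.iota({1, 8, 128, 64}, type: :f32)
RotarySketch.apply_rotary(q, cos, sin)
```

Since the cos/sin tables have shape `{seq_len, dim / 2}`, they broadcast over the batch and head axes, so the attention code would mainly need a change in how queries and keys are prepared.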

lorenzosinisi commented 1 year ago

Working on this, but it's going to take a while since I'm new to transformers.