lorenzosinisi opened 1 year ago
The above is an app; do you have a link to the model implementation and params from HF? :)
Oh sorry! Yes, this is the model: https://huggingface.co/togethercomputer/GPT-JT-6B-v1. Does it have similarities with other models that I could try to implement or even use directly? I would have no idea where to start, but I'm willing to help.
Ah, nice! We support GPT-2, so maybe that can be used as a building block? Or at least you can compare the Python GPT-2 implementation with our GPT-2 and then do the same to implement your own. :)
Thanks! I will give it a try :) Any MR I can look at for reference?
BTW this model already knows some Elixir
❤️
The reference hf/transformers implementation of GPT-J is here. The implementation should be, for the most part, similar to any other text model we have, like GPT-2. From a brief look, I think we may need to adjust/extend our attention implementation to support the rotary position embedding, but it's fine to modify the current code as necessary and we can find the best way to make it configurable later.
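For context on what the rotary position embedding involves, here is a minimal NumPy sketch of how RoPE rotates query/key vectors by a position-dependent angle. This is an illustrative simplification, not the hf/transformers or Bumblebee code: the function name and shapes are my own, and GPT-J actually applies the rotation only to the first `rotary_dim` dimensions of each attention head.

```python
import numpy as np

def rotary_embedding(x, base=10000):
    # x: (seq_len, dim) with dim even; each consecutive pair of
    # features is treated as a 2D point and rotated by an angle
    # that grows with the token position.
    seq_len, dim = x.shape
    # One frequency per feature pair, geometrically spaced.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    positions = np.arange(seq_len)
    angles = np.outer(positions, inv_freq)  # (seq_len, dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # Standard 2D rotation applied pairwise.
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is only rotated, the vector norms are preserved and position 0 (angle 0) is left unchanged; the attention scores then depend on the relative offset between query and key positions, which is the property the attention layer would need to support.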
Working on this, but it is gonna take a while 'cause I am new to transformers.
Any plan on adding support for togethercomputer/GPT-JT (https://huggingface.co/spaces/togethercomputer/GPT-JT)?
Seems like the closest alternative to GPT-3. What do you think? I would love to help, but I don't know where to start.