Petals is a BitTorrent-style network in which participants each host a smaller or larger part of a very large language model, such as Llama2-70B, Llama1-65B, or Bloom (support for StableBeluga2 was also added recently), and make it available at decent speed for inference or fine-tuning to other users connected to the network.
According to the project description, it lets us implement our own sampling methods, so I think the implementation should not be too much of a problem.
I think this would be quite an interesting addition to LMQL.
Petals does look very interesting, and it seems to even offer full access to the model's hidden states. This should make it compatible with LMQL's constrained decoding. I will have a closer look.
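To make the compatibility point a bit more concrete: constrained decoding mainly needs per-token scores (logits) that can be masked before the next token is chosen. Below is a minimal sketch of that masking step in plain Python. It does not use the Petals or LMQL APIs; the function names and the `allowed_ids` parameter are hypothetical and only illustrate the idea, assuming a backend that can return next-token logits.

```python
import math


def masked_softmax(logits, allowed_ids):
    """Turn logits into probabilities, giving zero mass to disallowed tokens.

    `allowed_ids` is a hypothetical set of token ids permitted by a constraint.
    """
    allowed = set(allowed_ids)
    if not allowed:
        raise ValueError("allowed_ids must not be empty")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(logits[i] for i in allowed)
    weights = [
        math.exp(score - peak) if i in allowed else 0.0
        for i, score in enumerate(logits)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def constrained_greedy_step(logits, allowed_ids):
    """Pick the most likely token id among the allowed ones."""
    probs = masked_softmax(logits, allowed_ids)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Toy vocabulary of four tokens; only ids 0 and 3 satisfy the constraint.
    example_logits = [2.0, 5.0, 1.0, 3.0]
    print(constrained_greedy_step(example_logits, {0, 3}))
```

If Petals exposes logits (or hidden states from which they can be computed) at each step, a step like this could in principle sit between the remote forward pass and token selection.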
What do you think of the idea of adding a model to support the Petals network?
petals.dev