Vincent-Stragier closed this issue 11 months ago.
The current workaround is to use Petals in CPU mode, i.e., remove all `.cuda()` calls and change `.from_pretrained(model_name)` to `.from_pretrained(model_name, torch_dtype=torch.float32)`.
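For reference, a minimal sketch of that CPU-mode setup (the model name here is an assumption; substitute whichever model you're loading):

```python
import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Llama-2-70b-chat-hf"  # assumed model; use your own

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load weights in float32 and keep everything on CPU:
# note there are no .cuda() calls anywhere.
model = AutoDistributedModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float32
)
```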
Hi @Vincent-Stragier,
Sorry for the slow fix and thanks for reporting! This issue was finally resolved in #531.
Using the Getting Started Colab configured to use Llama 2, I'm not able to do beam search decoding (only greedy decoding works).
I'm using the following code snippet for greedy:
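The snippet itself isn't preserved here; a plausible reconstruction of a greedy-decoding call (the prompt and generation parameters are assumptions, not the original code, and `tokenizer`/`model` are assumed to be loaded as in the Getting Started Colab) would be:

```python
inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]

# Greedy decoding: generate() defaults to greedy search when no
# sampling or beam-search parameters are passed.
outputs = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))

# Beam search under the same setup is what fails, e.g.:
# outputs = model.generate(inputs, max_new_tokens=16, num_beams=2)
```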
The output I got is: