-
Hi Simon, the responses from llama2 are being truncated. What is a good way for llm to handle this? See:
% llm -m l2c "give me 20 good names for avatars" --system "you are a creator"
Sure, here…
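One possible workaround, assuming the model plugin exposes a max_tokens generation option via -o (the options a plugin actually supports can be listed with `llm models list --options`; the option name and value here are assumptions):
% llm -m l2c -o max_tokens 1024 "give me 20 good names for avatars" --system "you are a creator"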
-
Hi,
Thanks very much for your work and for publishing your code. I am currently working on integrating SpinQuant into [torch/ao](https://github.com/pytorch/ao/pull/983/), and I would like to cla…
-
Thank you for your wonderful work!
Have you ever experimented with Llama2-7B as the model for C-RLFT? How was the performance? Because OpenChat-3.5-0106 is based on Mistral, performance is real…
-
Traceback (most recent call last):
File "/home/m00830934/code/LongRoPE/evolution/evaluate.py", line 110, in
main(args)
File "/home/m00830934/code/LongRoPE/evolution/evaluate.py", line 52, …
-
Hi all, thanks for this great inference framework. We enjoy the speedups it provides, but we are concerned about the high sampling variance.
Setting:
**Model**: llama2 70b model finetuned on…
-
Ollama makes it easy to run models such as llama2 locally on macOS:
https://ollama.ai/
The user runs a server on localhost, so the architecture of the plugin could likely follow the exist…
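As a rough sketch of what such a plugin would talk to (not the plugin itself), Ollama serves a simple HTTP API on localhost, port 11434 by default; the model name and prompt below are placeholders:

```python
# Minimal sketch: call a locally running Ollama server's generate endpoint.
# Assumes Ollama is running and the "llama2" model has already been pulled.
import json
import urllib.request

payload = {
    "model": "llama2",
    "prompt": "Say hello in one sentence.",
    "stream": False,  # ask for one JSON response instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```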
-
Hello,
Thank you for your very interesting work! When I run the llama2 experiment with "cluster_activate" and "random_update", there is the following error when calculating the gradient. Could you…
-
👋 Hello Neural Magic community developers,
I encountered an issue while calculating the perplexity for a locally converted Llama3-8B sparse model using the llm-compressor library. I'm referring to the spars…
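For reference, a minimal sketch of a plain Hugging Face perplexity computation (this is not the llm-compressor evaluation path; the model path, text, and context length are placeholders) that can serve as a sanity check against the sparse model:

```python
# Sketch: next-token perplexity over non-overlapping chunks of a text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/converted-llama3-8b-sparse"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

text = "\n\n".join(["example document one", "example document two"])  # placeholder corpus
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)

max_len = 2048  # evaluation context length (assumption)
nlls, n_tokens = [], 0
with torch.no_grad():
    for start in range(0, input_ids.size(1), max_len):
        chunk = input_ids[:, start : start + max_len]
        if chunk.size(1) < 2:
            continue  # nothing to predict in a 1-token chunk
        # labels == inputs: the model shifts them internally for next-token loss
        out = model(chunk, labels=chunk)
        n = chunk.size(1) - 1  # number of predicted tokens in this chunk
        nlls.append(out.loss * n)
        n_tokens += n

print(f"perplexity: {torch.exp(torch.stack(nlls).sum() / n_tokens).item():.2f}")
```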
-
I'm fine-tuning Llama3-8B on the C4 dataset (en subset) for 2000 steps using the `full_finetune_distributed` recipe. I find that the loss does not go down at all and the quantized accuracy is very low.…
-
I would like to build a chatbot with a long context. However, to avoid exceeding the model's context limit when the conversation gets too long, I want to be able to delete old messages. I would also lik…
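A minimal sketch of one way to do this, assuming an OpenAI-style list of {"role", "content"} messages and a placeholder token counter (swap in the real tokenizer for whichever model is used):

```python
# Sketch: keep system messages, drop the oldest turns until the history fits.
from typing import Dict, List

def count_tokens(text: str) -> int:
    # Placeholder: whitespace word count stands in for a real tokenizer.
    return len(text.split())

def trim_history(messages: List[Dict[str, str]], max_tokens: int) -> List[Dict[str, str]]:
    """Delete the oldest non-system messages until the total fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs: List[Dict[str, str]]) -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    while rest and total(system + rest) > max_tokens:
        rest.pop(0)  # drop the oldest user/assistant turn first
    return system + rest
```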