-
- [ ] FP8 KV-cache
- [ ] KV-cache prefix reuse
- [ ] Grammar-constrained decoding speedup
- [ ] `torch.compile`-like speedups
- [ ] Simple one-liner `pip install`
- [ ] Multi-LoRA support (LoRAX-style)
…
-
Hi,
I was really impressed by SPHINX's capabilities.
However, is it possible to do in-context learning with it?
Something similar to your example for Multimodal LLaMA2 https://alpha-vllm.github.i…
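In case it helps make the question concrete: a hypothetical sketch of how a few-shot multimodal prompt could be assembled, interleaving (image, answer) demonstrations before the query image. The `<image:…>` placeholder, the function, and the file names are illustrative assumptions, not SPHINX's actual API.

```python
def build_icl_prompt(demos, instruction, query_image):
    """Interleave (image, answer) demonstrations before the query image."""
    parts = [f"<image:{img}> {instruction} {ans}" for img, ans in demos]
    parts.append(f"<image:{query_image}> {instruction}")  # query has no answer yet
    return "\n".join(parts)

prompt = build_icl_prompt(
    demos=[("cat.jpg", "A cat on a sofa."), ("dog.jpg", "A dog in a park.")],
    instruction="Describe the image.",
    query_image="query.jpg",
)
print(prompt)
```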
-
![image](https://github.com/yangjianxin1/Firefly/assets/57835580/29229f58-1897-4c71-aa61-355f846e2946)
The above error is raised when loading YeungNLP/firefly-llama2-13b.
-
```
Traceback (most recent call last):
  File "/home/hope/work/baby-llama2-chinese/eval_hope.py", line 67, in
    model.load_state_dict(state_dict, strict=False)
  File "/home/hope/miniconda3/envs/lla…
```
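For context, `load_state_dict(strict=False)` only suppresses the key-mismatch error; the keys it would otherwise complain about can still be inspected. A torch-free sketch of that key check, with plain dicts mapping parameter names to shapes standing in for real tensors:

```python
def diff_state_dicts(model_params, checkpoint):
    """Return (missing, unexpected) keys, as load_state_dict reports them."""
    missing = sorted(set(model_params) - set(checkpoint))      # model has, file lacks
    unexpected = sorted(set(checkpoint) - set(model_params))   # file has, model lacks
    return missing, unexpected

# Toy stand-ins; a real check would compare model.state_dict() to the loaded file.
model_params = {"embed.weight": (32000, 4096), "lm_head.weight": (32000, 4096)}
checkpoint = {"embed.weight": (32000, 4096), "rotary.inv_freq": (64,)}

missing, unexpected = diff_state_dicts(model_params, checkpoint)
print("missing:", missing)        # ['lm_head.weight']
print("unexpected:", unexpected)  # ['rotary.inv_freq']
```

With `strict=False`, parameters in `missing` silently keep their initial values, which is a common source of bad eval results after loading.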
-
Wondering what a decent calibration dataset size would be for quantizing a model; I am looking at a model like LLaMA2-7B. Any help is appreciated.
-
- Use different datasets for calibration (dummy, Pile, gsm8k, triviaqa, and so on)
- Use llama2-7b with different int8 quantization types
- Use alpha in the range (0, 1)
- Use lm-evaluation-harness to accu…
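To make the calibration step above concrete, here is a hedged sketch of symmetric absmax int8 calibration: the scale is derived from the calibration samples and then reused to quantize new values. The function names are illustrative, not tied to any particular quantization toolkit.

```python
def calibrate_scale(samples):
    """Symmetric int8 scale from the calibration set's absolute maximum."""
    absmax = max(abs(x) for x in samples)
    return absmax / 127.0

def quantize(x, scale):
    """Round to the nearest int8 step and clamp to [-128, 127]."""
    return max(-128, min(127, round(x / scale)))

def dequantize(q, scale):
    return q * scale

scale = calibrate_scale([100.0, -127.0, 50.0])   # absmax 127 -> scale 1.0
print(quantize(63.4, scale))                     # 63
print(quantize(300.0, scale))                    # 127: outliers are clamped
print(dequantize(quantize(63.4, scale), scale))  # 63.0
```

The clamping line is why calibration set size and coverage matter: values larger than anything seen during calibration saturate and lose information.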
-
If practical, the LLMs might be useful for a variety of tasks:
- Quality evaluation
- Data augmentation (including back-translation for low-resource languages)
- Use as a teacher model
As a fi…
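For the teacher-model use, a minimal sketch of the usual distillation objective: the student is trained to match the teacher's soft next-token distribution by minimizing KL divergence. The probabilities here are toy values, not real model outputs.

```python
import math

def kl_divergence(teacher, student):
    """KL(teacher || student) over aligned token probabilities."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)

teacher = [0.7, 0.2, 0.1]   # soft labels from the teacher model
student = [0.5, 0.3, 0.2]   # current student predictions
print(kl_divergence(teacher, student))  # small positive value; 0 only if they match
```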
-
Trying to use perf_analyzer as follows to deploy LLaMA2-13B with Triton:
```
python scripts/launch_triton_server.py --world_size 2 --model_repo triton_model_repo
perf_analyzer -m ensemble -i grpc --shape…
```
-
I fine-tuned llama2 on the full dataset, ran gradient ascent on forget05, and then evaluated the unlearned model on forget05. Surprisingly, when I looked at the eval_log_forget.json file, all I could se…
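For reference, gradient ascent on the forget split is typically just the ordinary update with the gradient direction flipped, so a step that would normally reduce the loss instead increases it. A scalar toy sketch, assuming plain SGD (not the actual unlearning codebase):

```python
def sgd_step(w, grad, lr, ascend=False):
    """One SGD update; with ascend=True the step follows the gradient uphill."""
    return w + lr * grad if ascend else w - lr * grad

# Toy loss(w) = (w - 3)^2 stands in for the LM loss on the forget set.
w = 1.0
grad = 2 * (w - 3)                          # -4.0
print(sgd_step(w, grad, 0.1))               # 1.4: descent moves toward the minimum
print(sgd_step(w, grad, 0.1, ascend=True))  # 0.6: ascent moves away, i.e. "forgetting"
```

Because ascent deliberately degrades the model on the forget set, very poor (or degenerate) generations in the forget-set eval log are the expected outcome rather than a bug in the eval itself.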
-
Hi, I'm getting the error below when trying to start the vidur simulator on Ubuntu 20.04 in a Python 3.10 venv; I also tested with mamba.
INFO 07-09 16:17:21 config.py:21] trace_request_length_generator_deco…