-
### Is your feature request related to a problem?
There are more models in [LMSYS Chatbot Arena](https://huggingface.co/spaces/lmsys/chatbot-arena) / [HuggingChat](https://huggingf…
-
Thanks to the project team for the model; it is excellent, and I would therefore like to fine-tune it further for my own downstream use.
I ran into two problems while using it.
1> Model loading: the top of https://huggingface.co/FlagAlpha/Atom-7B-Chat mentions Atom-7B-32k-Chat. Does the model itself already support 32K? Can it simply be loaded as-is, without extra changes to files or parameters, and used at a 32k context length? (A quick way to check is sketched after these questions.)
2>…
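One quick way to check this (a minimal sketch on my side, assuming the checkpoint ships a standard Llama-style `transformers` config) is to inspect `max_position_embeddings`:
```
from transformers import AutoConfig

# Load only the config; trust_remote_code in case the repo ships custom code.
config = AutoConfig.from_pretrained("FlagAlpha/Atom-7B-Chat", trust_remote_code=True)

# If the checkpoint natively supports 32k, this should report 32768;
# a smaller value would mean extra RoPE-scaling parameters are needed.
print(config.max_position_embeddings)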
-
As of now, there is no way to modify RoPE Frequency Base and RoPE Frequency Scale.
We would need to edit `rope.cu` to support parameters for frequency and scale: https://github.com/turboderp/exlla…
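For reference, here is a minimal sketch (in Python rather than CUDA, with hypothetical parameter names `freq_base` / `freq_scale`) of where such parameters enter a rotary-embedding computation:
```
import torch

def rope_angles(head_dim, positions, freq_base=10000.0, freq_scale=1.0):
    """Rotation angles for rotary position embeddings.

    freq_base  -- base of the inverse-frequency series (10000 in the RoPE paper).
    freq_scale -- linear scaling applied to positions, as in linear RoPE scaling.
    """
    # Inverse frequencies: base^(-2i/d) for each channel pair.
    inv_freq = 1.0 / (freq_base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Scaling positions down (freq_scale < 1) stretches the effective context.
    scaled_pos = positions.float() * freq_scale
    # One angle per (position, frequency) pair.
    return torch.outer(scaled_pos, inv_freq)

# e.g. freq_scale=0.25 maps 4x-longer sequences into the trained position range
angles = rope_angles(128, torch.arange(4096), freq_base=10000.0, freq_scale=0.25)
cos, sin = angles.cos(), angles.sin()  # applied to query/key channel pairs downstream
```
With `freq_scale < 1`, positions are compressed into the trained range, which is how linear RoPE scaling extends the usable context.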
-
There's a bug in attack_manager.py:
```
if self.conv_template.name == 'llama-2':
    self.conv_template.messages = []
    self.conv_template.append_message(self.conv_template.roles…
```
-
When I try to run the following command,
> `accelerate launch --num_processes=4 big_model_quantized_probing.py scripts/configs/probe_quantized_codellama-34b-4bit-unfreeze.yaml`
I got the followi…
-
We've recently broken logging and tracing to disk when run via `podman compose up`.
We are NOT writing to the shared directory which is accessible by both the container AND the host.
Below is a snipp…
-
### Feature request
Flash Attention 2 is a library that provides attention operation kernels for faster and more memory efficient inference and training: https://github.com/Dao-AILab/flash-attentio…
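For context, recent `transformers` releases expose this kind of integration through an `attn_implementation` argument; here is a minimal sketch (the model id is a placeholder, and it assumes the `flash-attn` package and a supported GPU):
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any supported architecture

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                 # FA2 kernels need fp16/bf16
    attn_implementation="flash_attention_2",   # requires the flash-attn package
    device_map="auto",
)

inputs = tokenizer("def fib(n):", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```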
-
### 🐛 Describe the bug
From the README, it's not very clear how to download different flavors/sizes of the models from HF, unless someone goes to the next section and finds the inventory list https://gi…
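As a workaround, here is a minimal sketch using `huggingface_hub` (the repo id below is a placeholder; substitute the flavor/size from the inventory list):
```
from huggingface_hub import snapshot_download

# Placeholder repo id -- substitute the flavor/size from the inventory list.
local_dir = snapshot_download(
    repo_id="meta-llama/CodeLlama-7b-hf",
    # Optionally restrict to one weight format to avoid duplicate downloads.
    allow_patterns=["*.json", "*.safetensors", "tokenizer*"],
)
print(local_dir)  # local path containing the downloaded files
```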
-
For me, no model is able to load anymore with versions higher than 2.x.
-
In the GMC e2e tests, some cases fail because of response timeouts:
tgi2.2.0 + meta-llama/CodeLlama-7b-hf on Xeon times out in the codegen test
tgi2.2.0 + meta-llama/CodeLlama-7b-hf on Gaudi is ok
s…