Does the Hugging Face version require flash attention to be installed?

Armod-I commented 10 months ago

The error is as follows: ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn

Other models can run without flash attention.
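
The wording of this ImportError suggests it comes from a scan of the modeling file's import statements, performed when the custom model code is fetched and before any attention kernel is called. That would explain why it appears regardless of which GPU is in use. The sketch below is a simplified, standalone illustration of that kind of check. It is not the actual transformers implementation, and `find_missing_imports` and the sample source text are hypothetical names made up for the example.

```python
import importlib.util
import re


def find_missing_imports(source):
    """Return the top-level packages imported in `source` that are not installed.

    Hypothetical helper for illustration only: it mimics the idea of an
    import scan and is not the code used by transformers.
    """
    # Match "import pkg" and "from pkg import ..." at the start of a line.
    # Relative imports such as "from .utils import x" are skipped because
    # the package name must start with a letter or underscore.
    pattern = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)
    packages = sorted(set(pattern.findall(source)))
    # find_spec returns None for a top-level package that cannot be found.
    return [name for name in packages if importlib.util.find_spec(name) is None]


# Sample text standing in for a modeling file with an uninstalled dependency.
sample = "import math\nfrom some_missing_pkg_for_demo import func\n"
print(find_missing_imports(sample))
```

Because a check like this reads the text of the file, the import line itself has to be removed or guarded for the error to go away.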

zhaoxudong01 commented 10 months ago

You can use our open-source image, which has flash_attn installed.

Armod-I commented 10 months ago

> You can use our open-source image, which has flash_attn installed.

flash_attn does not support V100 GPUs.

I disabled Flash attention by hand and the model now runs, but so far I cannot reproduce the output of the Megatron version.
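
The thread does not show how flash attention was switched off. One plausible route is to flip a flag in the checkpoint's `config.json` on disk. This assumes the file carries a boolean `use_flash_attention` key; that name is taken from the keyword argument used later in this thread and is an assumption about the config layout. The `flash_attn` import lines in the modeling file would likely also need to be removed or guarded by hand. `set_flash_attention` is a hypothetical helper written for this sketch.

```python
import json
from pathlib import Path


def set_flash_attention(model_dir, enabled):
    """Rewrite config.json in `model_dir` so that use_flash_attention == enabled.

    The key name `use_flash_attention` is an assumption about the checkpoint's
    config. Returns the previous value of the key (None if it was absent).
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    previous = config.get("use_flash_attention")
    config["use_flash_attention"] = bool(enabled)
    # Write the file back, leaving every other key unchanged.
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return previous


# Example call (the path is a placeholder for a local checkpoint directory):
# set_flash_attention("/path/to/Yuan2-2B-hf", False)
```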

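Model: Yuan 2.0 2B hf
Inference code: the code from the Hugging Face model page
Input (a Chinese prompt asking for a Python function that takes a string as an argument and returns its reversed version, followed by an example and the line "代码如下：", i.e. "The code is as follows:"):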

编写一个 Python 函数，它接受一个字符串作为参数，并返回该字符串的反转版本。
示例：
>>> string_reverse('hello')
olleh
代码如下：
```python

Output:

<s> 编写一个 Python 函数，它接受一个字符串作为参数，并返回该字符串的反转版本。
示例：
>>> string_reverse('hello')
olleh
代码如下：
```python
</s>100000000000000000000000000000000000000000000000000

zhaoxudong01 commented 10 months ago

We ran the following test and the output is normal. Please try adding <sep> right after "代码如下：" (the last line of the prompt).

Input:

编写一个 Python 函数，它接受一个字符串作为参数，并返回该字符串的反转版本。示例：string_reverse('hello') olleh 代码如下：

Output:

<sep> ```python
def string_reverse(string):
    return string[::-1]
```<eod>
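
The suggestion above amounts to ending the prompt with the `<sep>` token before generation. A minimal sketch of building such a prompt follows. `build_prompt` is a hypothetical helper, and whether anything (such as an opening code fence) should follow `<sep>` is left to the caller, since the thread tries both variants.

```python
SEP_TOKEN = "<sep>"  # special token named in this thread


def build_prompt(instruction, suffix=""):
    """Append <sep> to the instruction, then an optional suffix on a new line.

    Hypothetical helper: trailing whitespace is stripped so that <sep>
    directly follows the last character of the instruction.
    """
    prompt = instruction.rstrip() + SEP_TOKEN
    if suffix:
        prompt += "\n" + suffix
    return prompt


# The prompt from this thread, kept in Chinese because it is the literal input.
instruction = (
    "编写一个 Python 函数，它接受一个字符串作为参数，并返回该字符串的反转版本。\n"
    "示例：\n"
    ">>> string_reverse('hello')\n"
    "olleh\n"
    "代码如下："
)
print(build_prompt(instruction))
```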

Armod-I commented 10 months ago

Model loading code:

import torch, transformers
from transformers import AutoModelForCausalLM,AutoTokenizer,LlamaTokenizer

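print("Creating tokenizer...")
# yuan_path is the local path to the Yuan 2.0 2B hf checkpoint (its definition is not shown in this comment)
tokenizer = LlamaTokenizer.from_pretrained(yuan_path)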
tokenizer.add_tokens(['<sep>', '<pad>', '<mask>', '<predict>', '<FIM_SUFFIX>', '<FIM_PREFIX>', '<FIM_MIDDLE>','<commit_before>','<commit_msg>','<commit_after>','<jupyter_start>','<jupyter_text>','<jupyter_code>','<jupyter_output>','<empty_output>'], special_tokens=True)

print("Creating model...")
model = AutoModelForCausalLM.from_pretrained(yuan_path, torch_dtype=torch.bfloat16, trust_remote_code=True).to('cuda:1')

Inference code:

question = """编写一个 Python 函数，它接受一个字符串作为参数，并返回该字符串的反转版本。
示例：
>>> string_reverse('hello')
olleh
代码如下：<sep>
```python
"""

inputs = tokenizer(question, return_tensors="pt")["input_ids"].to("cuda:1")
outputs = model.generate(inputs,do_sample=False,max_length=200)
print(tokenizer.decode(outputs[0]))

Output:

<s> 编写一个 Python 函数，它接受一个字符串作为参数，并返回该字符串的反转版本。
示例：
>>> string_reverse('hello')
olleh
代码如下：<sep> ```python
</s>
```
# 单元测试用例：
```python
def test_string_reverse():
    assert string_reverse('hello') == 'olleh'
    assert string_reverse('world') == 'dlrow'
    assert string_reverse('python') == 'nohtyp'
```<eod>

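The model did not implement the string_reverse function. After the prompt it only wrote some test cases under the heading "# 单元测试用例：" ("unit test cases").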

zhaoxudong01 commented 10 months ago

@Hicollj

Hicollj commented 10 months ago

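Please try the following input (the same task, prefixed with "问题描述：", i.e. "Problem description:", and ending with the first line of the function definition):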

问题描述：编写一个 Python 函数，它接受一个字符串作为参数，并返回该字符串的反转版本。
示例：
>>> string_reverse('hello')
olleh
代码如下：
```python
def string_reverse(string):

Also, please make sure to generate code with greedy decoding; you can set temperature=1 and top_k=1.
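
Setting top_k=1 keeps only the highest-scoring token at each step, so sampling has a single candidate and the result matches greedy decoding for any positive temperature. The standalone sketch below illustrates this with toy logits. `sample_top_k` is a hypothetical helper written for this example and is not part of transformers. With Hugging Face `generate`, passing `do_sample=False` without beam search, as in the inference code earlier in this thread, is also greedy decoding.

```python
import math
import random


def sample_top_k(logits, top_k, temperature=1.0, rng=random):
    """Sample one token index from the top_k highest logits (toy example)."""
    # Keep the indices of the top_k largest logits.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k]
    # Softmax weights over the kept logits, scaled by the temperature.
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [0.1, 2.5, -1.0, 2.4]
# Greedy decoding picks the index of the largest logit.
greedy = max(range(len(toy_logits)), key=lambda i: toy_logits[i])
# With top_k=1 every sampled draw is that same index.
draws = {sample_top_k(toy_logits, top_k=1) for _ in range(100)}
print(draws == {greedy})
```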

Shawn-IEITSystems commented 10 months ago

@Armod-I Has the problem been resolved?

pengyb2001 commented 9 months ago

> flash_attn does not support V100 GPUs.
>
> I disabled Flash attention by hand and the model now runs, but so far I cannot reproduce the output of the Megatron version.

How did you disable flash_attn manually? I want to run this model on CPU. I followed the usage shown in the Hugging Face README at https://huggingface.co/IEITYuan/Yuan2-2B-hf/blob/main/README.md and changed it to:

import torch, transformers
import sys, os
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)))
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaTokenizer

print("Creating tokenizer...")
tokenizer = LlamaTokenizer.from_pretrained('/mnt/disk1/models/Yuan2-2B-hf', add_eos_token=False, add_bos_token=False, eos_token='<eod>')
tokenizer.add_tokens(['<sep>', '<pad>', '<mask>', '<predict>', '<FIM_SUFFIX>', '<FIM_PREFIX>', '<FIM_MIDDLE>','<commit_before>','<commit_msg>','<commit_after>','<jupyter_start>','<jupyter_text>','<jupyter_code>','<jupyter_output>','<empty_output>'], special_tokens=True)

print("Creating model...")
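# Note: the GPU-specific arguments have been removed here
model = AutoModelForCausalLM.from_pretrained('/mnt/disk1/models/Yuan2-2B-hf', use_flash_attention=False)
print(model.config)

# Prompt, in English: "What are the most advanced machine learning algorithms at present?"
inputs = tokenizer("请问目前最先进的机器学习算法有哪些？", return_tensors="pt")["input_ids"]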
outputs = model.generate(inputs, do_sample=False, max_length=100)
print(tokenizer.decode(outputs[0]))

But it still fails with ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn

Follow-up: I disabled flash_attn manually; see issue 92.

IEIT-Yuan / Yuan-2.0, issue #70