apple2333cream opened 11 months ago
Same question here.
Here is my code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
import re
import os
import glob
import time

torch.manual_seed(1234)

model_path = "/home/wzp/.cache/modelscope/hub/qwen/Qwen-Audio"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Enable bf16 precision; recommended on A100, H100, RTX3060, RTX3070 and similar GPUs to save VRAM
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="auto", trust_remote_code=True, bf16=True).eval()
# Enable fp16 precision; recommended on V100, P100, T4 and similar GPUs to save VRAM
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="auto", trust_remote_code=True, fp16=True).eval()
# Run inference on CPU; requires about 32 GB of RAM
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="cpu", trust_remote_code=True).eval()
# Default: run inference on GPU; requires about 24 GB of VRAM
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True, bf16=True).eval()

# Optionally specify generation hyperparameters such as max length and top_p
# (not needed with transformers 4.32.0 and above)
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-Audio", trust_remote_code=True)

audio_url = "/home/wzp/project/yolov8/modelscope/output.wav"
sp_prompt = "<|startoftranscription|><|cn|><|transcribe|><|cn|><|notimestamps|><|wo_itn|>"
query = f"{audio_url}{sp_prompt}"
audio_info = tokenizer.process_audio(query)
inputs = tokenizer(query, return_tensors='pt', audio_info=audio_info)
inputs = inputs.to(model.device)
pred = model.generate(**inputs, audio_info=audio_info)
response = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False, audio_info=audio_info)
print(response)
```
This is the terminal output:
```
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████| 9/9 [00:10<00:00, 1.17s/it]
/home/wzp/project/yolov8/modelscope/output.wav<|startoftranscription|><|cn|><|transcribe|><|cn|><|notimestamps|><|wo_itn|><|notimestamps|><|itn|>Hello, please you need to to handle what business.<|endoftext|>
```
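If only the plain transcription text is wanted, the task tags can be stripped from the decoded string. A minimal sketch, assuming the tags always follow the `<|...|>` pattern seen in the output above:

```python
import re

# Remove Qwen-Audio task tags such as <|startoftranscription|> from the decoded output
text_only = re.sub(r"<\|[^|>]+\|>", "", response)
# Drop the audio path that the prompt echoes back at the start
text_only = text_only.replace(audio_url, "").strip()
print(text_only)  # -> Hello, please you need to to handle what business.
```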
For Chinese the language tag is zh; check the config parameters in tokenization_qwen.py.
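For reference, the corrected prompt with the `<|zh|>` tag in place of `<|cn|>` (token names as defined in tokenization_qwen.py) would look like this:

```python
# Chinese ASR prompt: the language tag is <|zh|>, not <|cn|>
sp_prompt = "<|startoftranscription|><|zh|><|transcribe|><|zh|><|notimestamps|><|wo_itn|>"
query = f"{audio_url}{sp_prompt}"
```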
Thanks, solved. But on my local audio test set, Qwen-Audio's accuracy is 4 percentage points lower than the Qwen-Audio-Chat model's.
OP, how can audio-to-text be done with qwen2-audio? Thanks.
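For reference, Qwen2-Audio is supported natively in recent transformers releases, so no trust_remote_code is needed. A minimal audio-to-text sketch following the usage pattern from the Qwen/Qwen2-Audio-7B-Instruct model card; the wav path and prompt text are placeholders, untested here:

```python
import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-Audio-7B-Instruct")
model = Qwen2AudioForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-Audio-7B-Instruct", device_map="auto"
).eval()

# Placeholder local file; load at the feature extractor's expected sampling rate (16 kHz)
wav_path = "/path/to/output.wav"
audio, _ = librosa.load(wav_path, sr=processor.feature_extractor.sampling_rate)

conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": wav_path},
        {"type": "text", "text": "Transcribe this audio."},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)

inputs = processor(text=text, audios=[audio], return_tensors="pt", padding=True).to(model.device)
generate_ids = model.generate(**inputs, max_new_tokens=256)
generate_ids = generate_ids[:, inputs.input_ids.size(1):]  # keep only the newly generated tokens
response = processor.batch_decode(generate_ids, skip_special_tokens=True)[0]
print(response)
```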