Open zfarrell13 opened 11 months ago
We do not test the inference on the Mac device. We strongly suggest moving the training and inference to the modern GPUs. In our A100 machines, a response takes only a few seconds.
I'm encountering a similar issue on my M1 Max with 32GB of RAM.
The screenshot illustrates that ~12GB of swap space is being utilized, despite there being available free RAM. The execution time for the readme.md
example is as follows:
python3 test.py 27.56s user 1372.35s system 19% cpu 2:00:14.36 total
That's 2 hours!?!
A few questions I have:
Code I used:
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
DEVICE = "mps"
torch.manual_seed(1234)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-Audio-Chat", device_map=DEVICE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio-Chat", device_map=DEVICE, trust_remote_code=True).eval()
audio_url = "assets/audio/1272-128104-0000.flac"
sp_prompt = ""
query = f"<audio>{audio_url}</audio>{sp_prompt}"
audio_info = tokenizer.process_audio(query)
inputs = tokenizer(query, return_tensors='pt', audio_info=audio_info)
with torch.no_grad():
inputs = inputs.to(model.device)
pred = model.generate(**inputs, audio_info=audio_info)
response = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False, audio_info=audio_info)
print(response)
Performance of Pytorch for M1 mps backend is very poor. Nvidia cards are very suitable for usage.
I wrote a script to make an inference on an audio file, and it took 30 minutes to get a response:
script:
` import time from transformers import AutoModelForCausalLM, AutoTokenizer
Initialize model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-Audio-Chat", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio-Chat", device_map="cpu", trust_remote_code=True).eval()
Set up audio file
audio_file = "file.mp3"
Define helper validation function
def validate_response(text): if ',' not in text: print("Error - response did not contain comma delimited list") return False return True
Time full execution
start = time.time()
Attempt audio analysis
try: query = tokenizer.from_list_format([ {'audio': audio_file}, {'text': 'Describe this audio with 5 comma-separated adjectives'}, ]) response, history = model.chat(tokenizer, query=query, history=None)
except Exception as e: print(f"Error: {e}") valid = False
end = time.time() elapsed = end - start
Print total time
print(f"Elapsed time: {elapsed} seconds")
Check if the response was valid
if not valid: print("Issues with audio analysis")`
result:
`python caption_audio.py audio_start_id: 155164, audio_end_id: 155165, audio_pad_id: 151851. Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention Loading checkpoint shards: 100%|██████████████| 9/9 [00:00<00:00, 17.97it/s] <|im_start|>system You are a helpful assistant.<|im_end|> <|im_start|>user Audio 1: Describe this audio with 5 comma-separated adjectives<|im_end|> <|im_start|>assistant
Funky, groovy, energetic, hip-hop, dance Elapsed time: 1812.089405298233 seconds`
Also inconsistent results running the same script a subsequent time:
second result:
`python caption_audio.py audio_start_id: 155164, audio_end_id: 155165, audio_pad_id: 151851. Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention Loading checkpoint shards: 100%|██████████████| 9/9 [00:00<00:00, 9.99it/s] <|im_start|>system You are a helpful assistant.<|im_end|> <|im_start|>user Audio 1: Describe this audio with 5 comma-separated adjectives<|im_end|> <|im_start|>assistant
This is a funky, groovy, upbeat, electronic, electro, electronic beats, electronic music, electronic dance, electronic instrumental, electronic background, electronic soundtrack, electronic soundtrack, electro hip hop, electro hip hop beats, electro hip hop instrumental, electro rap, electro rap beats, electro rap instrumental, electronic music for tv, electronic music for radio, electronic music for advertising, electronic music for games, electronic music for movies, electronic music for youtube, electronic music for corporate use, electronic music for commercial use, electronic music for business, electronic music for presentations, electronic music for websites, electronic music for apps, electronic music for software, electronic music for advertising, electronic music for marketing, electronic music for product presentation, electronic music for corporate presentations, electronic music for business videos, electronic music for product videos, electronic music for commercials, electronic music for radio, electronic music for tv, electronic music for films, electronic music for viral marketing, electronic music for web advertisements, electronic music for youtube videos, electronic music for social media, electronic music for apps, electronic music for mobile, electronic music for corporate, electronic music for background, electronic music for fashion, electronic music for lifestyle, electronic music for beauty, electronic music for food, electronic music for travel, electronic music for health, electronic music for fitness, electronic music for meditation, electronic music for relaxation, electronic music for studying, electronic music for working, electronic music for sleeping, electronic music for dancing, electronic music for fun, electronic music for playing, electronic music for singing, electronic music for podcast, electronic music for video games, electronic music for background music, electronic music for club, electronic music for rapping, electronic music for beat-making, electronic music for producing, electronic music for composing, electronic music for singing, electronic music for playing, electronic music for dancing, electronic music for fun, electronic music for playing, electronic music for singing, electronic music for producing, electronic music for composing, electronic music for radio, electronic music for tv, electronic music for films, electronic music for viral marketing, electronic music for web advertisements, electronic music for youtube videos, electronic music for social media, electronic music for apps, electronic music for mobile, electronic music for corporate, electronic music for background, electronic music for fashion, electronic music for lifestyle, electronic music for beauty, electronic music for food, electronic music for travel, electronic music for health, electronic music for fitness, electronic music for meditation, electronic music for relaxation, electronic music for studying, electronic music for working, electronic music for sleeping, electronic music for Elapsed time: 60488.493457078934 seconds`
any tips to make this run more efficiently?