MrXnneHang / Auto_Caption_Generated_Offline

This is an offline audio-to-subtitle converter based on FunASR, converting WAV to SRT. Its Chinese-language performance is better than Whisper's.

GPU memory still overflows even after adding device="cpu" in time_stamp.py #2

Open TimyWimey opened 5 months ago

TimyWimey commented 5 months ago
Even after adding this parameter, it still runs out of GPU memory...
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    param_dict={'use_timestamp': True},
    device="cpu"
)
Full error output:
Traceback (most recent call last):
  File "D:\Tool\GenSrt\run_srt.py", line 40, in <module>
    main(wav_name)
  File "D:\Tool\GenSrt\run_srt.py", line 15, in main
    write_long_txt(wav_name=wav_name,cut_line=500000) ##./tmp/.txt
  File "D:\Tool\GenSrt\time_stamp.py", line 13, in write_long_txt
    rec_result = inference_pipeline(audio_in=f'./raw_audio/{wav_name}.wav')
  File "D:\Tool\GenSrt\env\lib\site-packages\modelscope\pipelines\audio\asr_inference_pipeline.py", line 258, in __call__
    output = self.forward(output, **kwargs)
  File "D:\Tool\GenSrt\env\lib\site-packages\modelscope\pipelines\audio\asr_inference_pipeline.py", line 511, in forward
    inputs['asr_result'] = self.run_inference(self.cmd, **kwargs)
  File "D:\Tool\GenSrt\env\lib\site-packages\modelscope\pipelines\audio\asr_inference_pipeline.py", line 586, in run_inference
    asr_result = self.funasr_infer_modelscope(cmd['name_and_type'],
  File "D:\Tool\GenSrt\env\lib\site-packages\funasr\bin\asr_inference_launch.py", line 681, in _forward
    results = speech2text(**batch)
  File "D:\Tool\GenSrt\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Tool\GenSrt\env\lib\site-packages\funasr\bin\asr_infer.py", line 434, in __call__
    enc, enc_len = self.asr_model.encode(**batch, ind=decoding_ind)
  File "D:\Tool\GenSrt\env\lib\site-packages\funasr\models\e2e_asr_paraformer.py", line 325, in encode
    encoder_out, encoder_out_lens, _ = self.encoder(feats, feats_lengths)
  File "D:\Tool\GenSrt\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Tool\GenSrt\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Tool\GenSrt\env\lib\site-packages\funasr\models\encoder\sanm_encoder.py", line 373, in forward
    encoder_outs = self.encoders(xs_pad, masks)
  File "D:\Tool\GenSrt\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Tool\GenSrt\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Tool\GenSrt\env\lib\site-packages\funasr\modules\repeat.py", line 32, in forward
    args = m(*args)
  File "D:\Tool\GenSrt\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Tool\GenSrt\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Tool\GenSrt\env\lib\site-packages\funasr\models\encoder\sanm_encoder.py", line 101, in forward
    self.self_attn(x, mask, mask_shfit_chunk=mask_shfit_chunk, mask_att_chunk_encoder=mask_att_chunk_encoder)
  File "D:\Tool\GenSrt\env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Tool\GenSrt\env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Tool\GenSrt\env\lib\site-packages\funasr\modules\attention.py", line 456, in forward
    att_outs = self.forward_attention(v_h, scores, mask, mask_att_chunk_encoder)
  File "D:\Tool\GenSrt\env\lib\site-packages\funasr\modules\attention.py", line 424, in forward_attention
    self.attn = torch.softmax(scores, dim=-1).masked_fill(
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
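
The error message's suggestion to pass CUDA_LAUNCH_BLOCKING=1 can be applied by setting the environment variable before PyTorch initializes CUDA; a minimal sketch (a generic PyTorch debugging step, not specific to this project):

import os
# Make CUDA kernel launches synchronous so the failing call is reported
# at the correct place in the stack trace.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
# ...then import torch / modelscope and run the pipeline as before.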
TimyWimey commented 5 months ago

It is still using the GPU rather than the CPU. As far as I can tell, the FunASR project doesn't support GPU-based offline Chinese transcription yet, so how is the GPU being invoked here? Is there any way to limit the GPU memory usage so it doesn't run out?
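
A minimal sketch of one possible workaround (an assumption, not the project's documented fix): hiding all GPUs from CUDA before torch/modelscope is imported prevents any CUDA context from being created, so inference necessarily stays on the CPU regardless of how the pipeline chooses its device. The maintainer's linked discussion below may offer a more targeted fix.

import os
# Must be set before importing torch; with no visible devices, CUDA is never initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    param_dict={'use_timestamp': True},
    device='cpu',
)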

LauraGPT commented 5 months ago

Solution: https://github.com/alibaba-damo-academy/FunASR/discussions/1319