PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
Apache License 2.0
301 stars 117 forks

KeyError: 'result' #526

Open Tortoise17 opened 4 months ago

Tortoise17 commented 4 months ago

I am using your audio2caption example.

I am facing two errors. First, it accepts only short WAV files: at first it said the input must be less than 300 seconds, then for a 180-second input it advised that the input be less than 50 seconds. Is there a specific reason for that?

The second, more serious error occurs when I run the workflow with a 49-second WAV file:

/source/envs/paddle/lib/python3.10/site-packages/paddlespeech/cli/asr/infer.py:335 in postprocess
    return self._outputs["result"]

KeyError: 'result'

Please advise.
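
For reference, here is a minimal sketch of how a long WAV file could be checked and split into pieces that stay under the reported 50-second limit before being passed to the pipeline. It uses only Python's standard `wave` module. The helper names `wav_duration_seconds` and `split_wav` and the `max_seconds` parameter are hypothetical and not part of PaddleMIX or PaddleSpeech. It only addresses the length limit and is not confirmed to avoid the KeyError above.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def split_wav(path, out_prefix, max_seconds=50.0):
    """Split a WAV file into consecutive chunks of at most max_seconds each.

    Hypothetical helper: writes out_prefix_000.wav, out_prefix_001.wav, ...
    and returns the list of written paths.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(max_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected automatically when the file closes.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each resulting chunk could then be passed to the audio2caption task separately. Pass a smaller `max_seconds` to leave a margin, since the limit is reported as "less than 50 seconds".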

Tortoise17 commented 4 months ago

@westfish

result = task(audio=audio_file, prompt=prompt)['prompt']
[2024-05-10 16:38:33,674] [   ERROR] - (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 1, 0, 4898] and the shape of Y = [1, 1223, 1223]. Received [4898] in X is not equal to [1223] in Y at i:3.
  [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/phi/kernels/funcs/common_shape.h:86)