Open Tortoise17 opened 4 months ago
@westfish
result = task(audio=audio_file, prompt=prompt)['prompt']
[2024-05-10 16:38:33,674] [ ERROR] - (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 1, 0, 4898] and the shape of Y = [1, 1223, 1223]. Received [4898] in X is not equal to [1223] in Y at i:3.
[Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/phi/kernels/funcs/common_shape.h:86)
I am using your audio2caption example.
I am facing two errors. The first is that it only accepts short WAV files: at first it said the input must be less than 300 seconds, then for a 180-second input it said the input must be less than 50 seconds. Is there a specific reason for that?
The second, more serious error occurs when I run the workflow with a 49-second WAV file.
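As a possible workaround for the duration limit, a long recording could be split into pieces shorter than 50 seconds before each piece is passed to the task. The following is only a minimal sketch using Python's standard `wave` module. The function name `split_wav`, the 50-second default, and the output file naming are assumptions made for illustration and are not part of the example being discussed.

```python
import wave


def split_wav(path, max_seconds=50.0, prefix="chunk"):
    """Split a WAV file into consecutive pieces shorter than max_seconds.

    Hypothetical helper: the name, default limit, and file naming are
    illustrative assumptions, not an API of any library.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Use one frame fewer than the limit so each piece stays strictly below it.
        frames_per_chunk = max(1, int(max_seconds * rate) - 1)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width, and rate; the frame count in
                # the header is corrected automatically when the file closes.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each returned path could then be used as `audio_file` in turn. Whether splitting at arbitrary points gives acceptable captions is untested here.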
/source/envs/paddle/lib/python3.10/site-packages/paddlespeech/cli/asr/infer.py:335 in postprocess return self._outputs["result"]
KeyError: 'result'
Please advise.