howarlii closed this issue 4 years ago
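Could you fill in the relevant information following the issue template? Please also attach the contents of your API config file, leaving out the API credentials themselves.
Make sure you have read the readme, and that you have searched for and read the issues related to your situation. Otherwise the issue will be treated as a duplicate and closed immediately.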
Describe the problem
For some videos, errors of the following form appear at the "Use Auditok to detect speech regions" step: Application provided invalid, non monotonically increasing dts to muxer in stream 0: 12024 >= 12024
The program nevertheless keeps running.
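This only happens with some videos, and only when using the Xunfei or Baidu API. The API config is the template from the readme, with no other settings changed.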
With the Baidu API, a "request qps too much" error also appears later on (this account's QPS quota is only 2).
Reproduce the problem
Run autosub -sapi baidu -i .\1.mp4 -sconf .\baidu_config.json -S zh-CN
Video link: Extraction code: qpce
PS G:\Class> autosub -sapi baidu -i .\1.mp4 -sconf .\baidu_config.json -S zh-CN
Translation destination language not provided. Only performing speech recognition.
Speech language is the same as the destination language. Only performing speech recognition.
Convert source file to "C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav" to detect audio regions.
C:\Program Files\ffmpeg-20200315\bin\ffmpeg.exe -hide_banner -y -i ".\1.mp4" -vn -ac 1 -ar 48000 "C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav"
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '.\1.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.66.102
Duration: 00:52:00.80, start: 0.000000, bitrate: 607 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720, 541 kb/s, 19.45 fps, 25 tbr, 12800 tbn, 50 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 61 kb/s (default)
Metadata:
handler_name : SoundHandler
Stream mapping:
Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
ISFT : Lavf58.41.100
Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s (default)
Metadata:
handler_name : SoundHandler
encoder : Lavc58.75.100 pcm_s16le
size= 292266kB time=00:52:00.80 bitrate= 767.2kbits/s speed=1.85e+03x
video:0kB audio:292266kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000026%
Use ffprobe to check conversion result.
C:\Program Files\ffmpeg-20200315\bin\ffprobe.exe C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav -show_format -pretty -loglevel quiet
[FORMAT]
filename=C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=0:51:57.504000
size=285.416090 Mibyte
bit_rate=768 Kbit/s
probe_score=99
TAG:encoder=Lavf58.41.100
[/FORMAT]
Conversion complete.
Use Auditok to detect speech regions.
"C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav" has been deleted.
Converting speech regions to short-term fragments.
Converting: N/A% | | ETA: --:--:--[s16le @ 000001b9bff18f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 45 >= 45
[s16le @ 0000022c23bf8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 28525 >= 28525
Converting: 0% | | ETA: 0:02:25[s16le @ 000002d1e9c58f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 16154 >= 16154
[s16le @ 000002d1e9c58f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 21616 >= 21616
Converting: 2% |# | ETA: 0:01:02[s16le @ 000001c9d3b48f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 85 >= 85
Converting: 4% |### | ETA: 0:00:39[s16le @ 0000026d42808f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 5709 >= 5709
Converting: 5% |#### | ETA: 0:00:37[s16le @ 0000015433428f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 12235 >= 12235
Converting: 10% |######## | ETA: 0:00:26[s16le @ 00000263b4618f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 43899 >= 43899
Converting: 11% |########## | ETA: 0:00:24[s16le @ 0000020877e58f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 18384 >= 18384
Converting: 16% |############## | ETA: 0:00:20[s16le @ 0000018919f38f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 11944 >= 11944
Converting: 23% |################### | ETA: 0:00:17[s16le @ 000001c78eb28f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 44853 >= 44853
Converting: 38% |################################ | ETA: 0:00:12[s16le @ 00000213a2498f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 2221 >= 2221
[s16le @ 00000213a2498f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 12802 >= 12802
Converting: 40% |################################## | ETA: 0:00:12[s16le @ 000001777e0a8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 18816 >= 18816
Converting: 41% |################################### | ETA: 0:00:11[s16le @ 00000221c0918f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 24080 >= 24080
Converting: 50% |########################################## | ETA: 0:00:09[s16le @ 0000022921328f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 28064 >= 28064
[s16le @ 000001322f328f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 864 >= 864
[s16le @ 0000025de6138f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 74325 >= 74325
Converting: 52% |############################################ | ETA: 0:00:09[s16le @ 00000260e8b78f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 10712 >= 10712
Converting: 55% |############################################### | ETA: 0:00:08[s16le @ 00000282f5f78f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1224 >= 1224
[s16le @ 000002c8b3688f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 76083 >= 76083
Converting: 57% |################################################# | ETA: 0:00:08[s16le @ 000001c223b98f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 7669 >= 7669
Converting: 58% |################################################# | ETA: 0:00:08[s16le @ 0000019e79578f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 23392 >= 23392
Converting: 59% |################################################## | ETA: 0:00:07[s16le @ 000001204fef8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 10466 >= 10466
Converting: 61% |#################################################### | ETA: 0:00:07[s16le @ 000001a5f98f8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 22317 >= 22317
Converting: 61% |#################################################### | ETA: 0:00:07[s16le @ 0000022510a28f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 23200 >= 23200
Converting: 62% |#################################################### | ETA: 0:00:07[s16le @ 000001bd57958f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1760 >= 1760
Converting: 71% |############################################################ | ETA: 0:00:05[s16le @ 00000225f0178f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 40720 >= 40720
Converting: 81% |##################################################################### | ETA: 0:00:03[s16le @ 00000207742b8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 37216 >= 37216
Converting: 92% |############################################################################## | ETA: 0:00:01[s16le @ 0000022da1eb8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 46928 >= 46928
[s16le @ 0000022da1eb8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 52048 >= 52048
[s16le @ 00000194e5618f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1648 >= 1648
Converting: 98% |################################################################################### | ETA: 0:00:00[s16le @ 0000017335108f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 20904 >= 20904
[s16le @ 000002ba23428f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 20861 >= 20861
[s16le @ 000001bf74e08f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 8349 >= 8349
[s16le @ 000001bf74e08f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 13811 >= 13811
Converting: 100% |#####################################################################################| Time: 0:00:18
Sending short-term fragments to Baidu ASR API and getting result.
Get the token online.
Speech-to-Text: 100% |#################################################################################| Time: 0:01:16
Receive something unexpected:
{
"err_msg": "request qps too much",
"err_no": 3304,
"sn": "358315413981585810428"
}
Error: Speech-to-text failed.
All works done.
Environment (please provide all of the following):
First, everything after the line "xxx has been deleted." is an ffmpeg problem.
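Second, about "non monotonically increasing dts":
The cause I found is that the video and audio streams in the source file have different durations. To avoid the error, you can convert the file to audio yourself beforehand. https://blog.csdn.net/quantum7/article/details/82714601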
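As a rough sketch of the "convert the audio beforehand" suggestion, the Python below builds the same kind of ffmpeg command that appears in the log above (-vn, -ac 1, -ar 48000). The function names build_extract_command and extract_audio are hypothetical helpers made up for this example, not part of autosub, and it assumes ffmpeg is on PATH. Whether this removes the dts warning for a given file is not guaranteed.

```python
import subprocess


def build_extract_command(src, dst, ffmpeg="ffmpeg", sample_rate=48000):
    """Build an ffmpeg command that drops video and writes mono WAV audio.

    Hypothetical helper; the flags mirror the command shown in the log above.
    """
    return [
        ffmpeg, "-hide_banner", "-y",
        "-i", src,
        "-vn",                      # discard the video stream
        "-ac", "1",                 # downmix to mono
        "-ar", str(sample_rate),    # resample to the given rate
        dst,
    ]


def extract_audio(src, dst, **kwargs):
    """Run the conversion; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_extract_command(src, dst, **kwargs), check=True)
```

Calling extract_audio("1.mp4", "1.wav") and then passing the resulting WAV to autosub with -i in place of the video would be one way to try this.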
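Third, Baidu's QPS is nominally 2 but in practice only 1 is allowed. So as long as the config file does not contain "disable_qps_limit": true,
requests are not sent concurrently (concurrency stays at 1) and the error does not occur. I have tested this. Relevant code: https://github.com/BingLingGroup/autosub/blob/dev/autosub/cmdline_utils.py#L358-L361
The readme describes how to use this option:
https://github.com/BingLingGroup/autosub/blob/dev/docs/README.zh-Hans.md#百度语音识别配置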
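To illustrate what keeping to 1 request per second means in general, here is a minimal Python sketch of a client-side throttle plus a check for the QPS error seen in the log above (err_no 3304). This is not autosub's implementation; Throttle and is_qps_error are hypothetical names, and the clock and sleep parameters exist only to make the sketch testable.

```python
import time

QPS_ERR_NO = 3304  # err_no reported with "request qps too much" in the log above


def is_qps_error(response):
    """Return True if a decoded JSON response carries the QPS error code."""
    return response.get("err_no") == QPS_ERR_NO


class Throttle:
    """Space out calls so that at most `qps` of them start per second."""

    def __init__(self, qps=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / qps
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        delay = self.next_allowed - self.clock()
        if delay > 0:
            self.sleep(delay)
        self.next_allowed = max(self.clock(), self.next_allowed) + self.interval
```

A caller would invoke throttle.wait() before each request and could back off and retry when is_qps_error(result) is true.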
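Also, in my own testing Xunfei seemed more accurate than Baidu, so you could consider using Xunfei instead. Xunfei allows 500 free recognition requests per day.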
There has been no reply for a long time, so I am closing this for now. You can still reply after it is closed.
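When using the Xunfei or Baidu API, the following error appears:
In addition, with the Baidu API a "user request QPS limit exceeded" error sometimes appears.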