BingLingGroup / autosub

Command-line utility to transcribe/translate from video/audio/subtitles to subtitles
GNU General Public License v2.0

Application provided invalid #104

Closed howarlii closed 4 years ago

howarlii commented 4 years ago

When using the Xunfei (iFlytek) or Baidu API, the following errors appear:

Converting speech regions to short-term fragments.
Converting: N/A% |                                                                                     | ETA:  --:--:--[s16le @ 00000117220a0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 9963 >= 9963
[s16le @ 000001ecd5ba0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 11229 >= 11229
[s16le @ 0000024523be0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 23578 >= 23578
Converting: N/A% |                                                                                     | ETA:  --:--:--[s16le @ 00000186b99c0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 57437 >= 57437
[s16le @ 000001c34cbc0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 981 >= 981
[s16le @ 000001c34cbc0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 11562 >= 11562
[s16le @ 000001c34cbc0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 81394 >= 81394
[s16le @ 0000020ec0c40900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 5272 >= 5272
[s16le @ 0000020ec0c40900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 22339 >= 22339
[s16le @ 000001ceff780900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 14928 >= 14928
[s16le @ 00000166f2460900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 6904 >= 6904
[s16le @ 00000166f2460900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 12024 >= 12024
Converting:   8% |#######                                                                              | ETA:   0:00:17[s16le @ 000001e2aa2d0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 84269 >= 84269
Converting:   9% |########                                                                             | ETA:   0:00:16[s16le @ 000001fa3f2b0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 51869 >= 51869
Converting:  14% |############                                                                         | ETA:   0:00:12[s16le @ 0000014acc730900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 16621 >= 16621
Converting:  16% |##############                                                                       | ETA:   0:00:11[s16le @ 0000023eca920900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 8312 >= 8312
Converting:  24% |####################                                                                 | ETA:   0:00:09[s16le @ 000002a46e0b0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 89488 >= 89488
[s16le @ 00000222ce830900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 3088 >= 3088
Converting:  35% |##############################                                                       | ETA:   0:00:07[s16le @ 0000021bccde0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 8032 >= 8032
[s16le @ 000001ada8ec0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 32141 >= 32141
Converting:  93% |###############################################################################      | ETA:   0:00:00[s16le @ 000001e9bdea0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 2248 >= 2248
[s16le @ 0000029f758a0900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 98248 >= 98248
Converting:  97% |##################################################################################   | ETA:   0:00:00[s16le @ 000001cb9e670900] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 41522 >= 41522
Converting: 100% |#####################################################################################| Time:  0:00:09

Sending short-term fragments to Xun Fei Yun WebSocket API and getting result.

Also, with the Baidu API it sometimes reports "user request QPS limit exceeded".

BingLingGroup commented 4 years ago

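Could you fill in the relevant details following the issue template? Please also attach the contents of your API config file, with the API credentials removed.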

howarlii commented 4 years ago

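Make sure you have read the README and have searched for and read the issues related to your situation. Otherwise this issue will be treated as a duplicate and closed immediately.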

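Describe the problem: For some videos, at the "Use Auditok to detect speech regions" step, errors such as "Application provided invalid, non monotonically increasing dts to muxer in stream 0: 12024 >= 12024" appear, but the program still keeps running. This only happens with some videos when using the Xunfei or Baidu API. The API config is the template from the README, and no other settings were changed.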

When using the Baidu API, a "request qps too much" error also appears later (this account's QPS limit is only 2).

Reproduce the problem: run autosub -sapi baidu -i .\1.mp4 -sconf .\baidu_config.json -S zh-CN

Video link: (extraction code: qpce)

PS G:\Class> autosub -sapi baidu -i .\1.mp4 -sconf .\baidu_config.json -S zh-CN
Translation destination language not provided. Only performing speech recognition.
Speech language is the same as the destination language. Only performing speech recognition.

Convert source file to "C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav" to detect audio regions.
C:\Program Files\ffmpeg-20200315\bin\ffmpeg.exe -hide_banner -y -i ".\1.mp4" -vn -ac 1 -ar 48000 "C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav"
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '.\1.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf57.66.102
  Duration: 00:52:00.80, start: 0.000000, bitrate: 607 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720, 541 kb/s, 19.45 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 61 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Stream mapping:
  Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    ISFT            : Lavf58.41.100
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      encoder         : Lavc58.75.100 pcm_s16le
size=  292266kB time=00:52:00.80 bitrate= 767.2kbits/s speed=1.85e+03x
video:0kB audio:292266kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000026%

Use ffprobe to check conversion result.
C:\Program Files\ffmpeg-20200315\bin\ffprobe.exe C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav -show_format -pretty -loglevel quiet
[FORMAT]
filename=C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=0:51:57.504000
size=285.416090 Mibyte
bit_rate=768 Kbit/s
probe_score=99
TAG:encoder=Lavf58.41.100
[/FORMAT]

Conversion complete.
Use Auditok to detect speech regions.

"C:\Users\HowarLi\AppData\Local\Temp\tmpz12vz0dj.wav" has been deleted.

Converting speech regions to short-term fragments.
Converting: N/A% |                                                                                     | ETA:  --:--:--[s16le @ 000001b9bff18f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 45 >= 45
[s16le @ 0000022c23bf8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 28525 >= 28525
Converting:   0% |                                                                                     | ETA:   0:02:25[s16le @ 000002d1e9c58f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 16154 >= 16154
[s16le @ 000002d1e9c58f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 21616 >= 21616
Converting:   2% |#                                                                                    | ETA:   0:01:02[s16le @ 000001c9d3b48f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 85 >= 85
Converting:   4% |###                                                                                  | ETA:   0:00:39[s16le @ 0000026d42808f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 5709 >= 5709
Converting:   5% |####                                                                                 | ETA:   0:00:37[s16le @ 0000015433428f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 12235 >= 12235
Converting:  10% |########                                                                             | ETA:   0:00:26[s16le @ 00000263b4618f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 43899 >= 43899
Converting:  11% |##########                                                                           | ETA:   0:00:24[s16le @ 0000020877e58f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 18384 >= 18384
Converting:  16% |##############                                                                       | ETA:   0:00:20[s16le @ 0000018919f38f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 11944 >= 11944
Converting:  23% |###################                                                                  | ETA:   0:00:17[s16le @ 000001c78eb28f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 44853 >= 44853
Converting:  38% |################################                                                     | ETA:   0:00:12[s16le @ 00000213a2498f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 2221 >= 2221
[s16le @ 00000213a2498f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 12802 >= 12802
Converting:  40% |##################################                                                   | ETA:   0:00:12[s16le @ 000001777e0a8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 18816 >= 18816
Converting:  41% |###################################                                                  | ETA:   0:00:11[s16le @ 00000221c0918f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 24080 >= 24080
Converting:  50% |##########################################                                           | ETA:   0:00:09[s16le @ 0000022921328f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 28064 >= 28064
[s16le @ 000001322f328f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 864 >= 864
[s16le @ 0000025de6138f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 74325 >= 74325
Converting:  52% |############################################                                         | ETA:   0:00:09[s16le @ 00000260e8b78f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 10712 >= 10712
Converting:  55% |###############################################                                      | ETA:   0:00:08[s16le @ 00000282f5f78f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1224 >= 1224
[s16le @ 000002c8b3688f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 76083 >= 76083
Converting:  57% |#################################################                                    | ETA:   0:00:08[s16le @ 000001c223b98f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 7669 >= 7669
Converting:  58% |#################################################                                    | ETA:   0:00:08[s16le @ 0000019e79578f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 23392 >= 23392
Converting:  59% |##################################################                                   | ETA:   0:00:07[s16le @ 000001204fef8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 10466 >= 10466
Converting:  61% |####################################################                                 | ETA:   0:00:07[s16le @ 000001a5f98f8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 22317 >= 22317
Converting:  61% |####################################################                                 | ETA:   0:00:07[s16le @ 0000022510a28f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 23200 >= 23200
Converting:  62% |####################################################                                 | ETA:   0:00:07[s16le @ 000001bd57958f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1760 >= 1760
Converting:  71% |############################################################                         | ETA:   0:00:05[s16le @ 00000225f0178f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 40720 >= 40720
Converting:  81% |#####################################################################                | ETA:   0:00:03[s16le @ 00000207742b8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 37216 >= 37216
Converting:  92% |##############################################################################       | ETA:   0:00:01[s16le @ 0000022da1eb8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 46928 >= 46928
[s16le @ 0000022da1eb8f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 52048 >= 52048
[s16le @ 00000194e5618f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 1648 >= 1648
Converting:  98% |###################################################################################  | ETA:   0:00:00[s16le @ 0000017335108f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 20904 >= 20904
[s16le @ 000002ba23428f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 20861 >= 20861
[s16le @ 000001bf74e08f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 8349 >= 8349
[s16le @ 000001bf74e08f80] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 13811 >= 13811
Converting: 100% |#####################################################################################| Time:  0:00:18

Sending short-term fragments to Baidu ASR API and getting result.
Get the token online.
Speech-to-Text: 100% |#################################################################################| Time:  0:01:16
Receive something unexpected:
{
    "err_msg": "request qps too much",
    "err_no": 3304,
    "sn": "358315413981585810428"
}
Error: Speech-to-text failed.
All works done.

Environment (please provide the complete information below):

BingLingGroup commented 4 years ago

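First, everything after the line "xxx has been deleted." is an ffmpeg issue.

Second, from what I found, the "non monotonically increasing dts" message is caused by the source video's duration not matching its audio's duration. To avoid the error, you can convert the audio beforehand: https://blog.csdn.net/quantum7/article/details/82714601

Third, Baidu's QPS is nominally listed as 2, but in practice only 1 is allowed. So as long as the config file does not contain "disable_qps_limit": true, requests are not sent concurrently (concurrency stays at 1) and this error should not occur; I have tested this. Relevant code: https://github.com/BingLingGroup/autosub/blob/dev/autosub/cmdline_utils.py#L358-L361 The usage of this option is documented in the README: https://github.com/BingLingGroup/autosub/blob/dev/docs/README.zh-Hans.md#百度语音识别配置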
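
The third point is about keeping the request rate under Baidu's effective limit of about one request per second. According to the maintainer, autosub does this by not running requests concurrently. The Python sketch below illustrates a related but different technique, an explicit minimum interval of 1/QPS seconds between request starts. It is a hypothetical illustration: MinIntervalThrottle and recognize_all are invented names, not autosub's code, and the send callable stands in for whatever actually calls the speech API.

```python
import threading
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class MinIntervalThrottle:
    """Hypothetical helper: lets at most `qps` request starts through per second.

    This is NOT autosub's implementation; autosub reportedly relies on
    running requests sequentially (concurrency 1) instead.
    """

    def __init__(self, qps: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if qps <= 0:
            raise ValueError("qps must be positive")
        self.interval = 1.0 / qps          # minimum gap between request starts
        self._clock = clock                # injectable for testing
        self._sleep = sleep
        self._lock = threading.Lock()      # safe to share between threads
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request slot opens, then reserve it."""
        with self._lock:
            now = self._clock()
            if self._next_allowed is not None and now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.interval


def recognize_all(fragments: Iterable[T], send: Callable[[T], R],
                  throttle: MinIntervalThrottle) -> List[R]:
    """Send fragments one by one, never exceeding the throttle's rate."""
    results: List[R] = []
    for fragment in fragments:
        throttle.wait()
        results.append(send(fragment))
    return results
```

Setting qps=2.0 would match the nominal limit, but per the maintainer's testing only about 1 request per second is actually accepted, which is why a value of 1.0 would be the safer choice for avoiding the "request qps too much" (err_no 3304) response.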

BingLingGroup commented 4 years ago

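Also, in my own testing Xunfei was more accurate than Baidu, so you could consider using Xunfei instead. Xunfei allows 500 free recognition requests per day.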

BingLingGroup commented 4 years ago

No reply for a long time, so I'm closing this for now. You can still reply after it is closed.