alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0
918 stars, 248 forks

Support webrtc pr #94

Closed milochen0418 closed 3 years ago

milochen0418 commented 3 years ago

This PR adds WebRTC support. A demo is available here:
https://www.youtube.com/watch?v=1Iu1JK21cUE

nshmyrev commented 3 years ago

Looks good, thanks a lot! I'll do some cosmetic adjustments later myself!

wizVR-zhangjun commented 2 years ago

Why do I get no link when I run it? [screenshot]

milochen0418 commented 2 years ago

http://0.0.0.0:2700 is the link to open in the browser. It is shown in your screenshot too.

wizVR-zhangjun commented 2 years ago

My computer runs Windows. If I use 0.0.0.0:2700, it reports an error.
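
For context: `0.0.0.0` is a wildcard bind address meaning "listen on all interfaces". It is not generally a destination a browser can connect to, and on Windows connecting to it typically fails. A minimal sketch of turning the address printed by the server into one a local browser can open, assuming the browser runs on the same machine as the server (the helper name `browsable_url` is hypothetical and not part of vosk-server):

```python
import ipaddress


def browsable_url(host: str, port: int, scheme: str = "http") -> str:
    """Map a wildcard bind address to a loopback URL a local browser can open."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. "localhost" or a DNS name): use it unchanged.
        return f"{scheme}://{host}:{port}"
    if addr.is_unspecified:
        # 0.0.0.0 and :: only mean "all interfaces" on the listening side.
        host = "127.0.0.1" if addr.version == 4 else "[::1]"
    elif addr.version == 6:
        # IPv6 literals must be bracketed inside a URL.
        host = f"[{host}]"
    return f"{scheme}://{host}:{port}"


print(browsable_url("0.0.0.0", 2700))
```

When the browser runs on a different machine, the server's LAN address would be needed instead of the loopback address.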

wizVR-zhangjun commented 2 years ago

[screenshot] The result is no different.

milochen0418 commented 2 years ago

Try on a MacBook first. Please also record a video so that people can see what happens on your side.

wizVR-zhangjun commented 2 years ago

https://user-images.githubusercontent.com/60952586/147543019-0b1f5e66-ccfc-4e47-a6a4-df1e7a86ed20.mp4
https://user-images.githubusercontent.com/60952586/147543038-6caaec04-8c5d-4172-8c58-ef83ec5f4cca.mp4

milochen0418 commented 2 years ago

Hi, I cannot hear any voice in your recorded video. Can you try speaking "one, two, three, four, five" in English?

wizVR-zhangjun commented 2 years ago

No results have been printed

milochen0418 commented 2 years ago

Does the browser ask you for audio (microphone) permission when you open this URL?

wizVR-zhangjun commented 2 years ago

Yes.

gooran commented 2 years ago

Hello, I have the same problem. The server is up, but nothing comes back from the server! What is the default model configuration for the server? Is this code set for the 48 kHz model? I have used a 16 kHz model.
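
On the sample-rate question: browsers commonly send WebRTC audio as Opus, which is signalled as `opus/48000/2`, so the audio either has to be resampled to the model's rate or the recognizer has to be told the incoming rate. The thread does not say which approach the server takes. The following is only a rough sketch of the first option, using naive block averaging in place of a proper low-pass resampler; the function name `downmix_and_decimate` and the interleaved 16-bit PCM input layout are assumptions made for illustration:

```python
from array import array


def downmix_and_decimate(pcm: bytes, channels: int = 2, factor: int = 3) -> bytes:
    """Convert interleaved 16-bit PCM to mono and reduce the rate by `factor`.

    With factor=3, 48 kHz input becomes 16 kHz output. Averaging each block
    is a crude anti-aliasing filter; a real resampler would do better.
    """
    samples = array("h")
    samples.frombytes(pcm)
    # Average the channels of each frame to get one mono sample per frame.
    frames = len(samples) // channels
    mono = [
        sum(samples[i * channels:(i + 1) * channels]) // channels
        for i in range(frames)
    ]
    # Average each complete run of `factor` mono samples into one output sample.
    out = array("h", (
        sum(mono[i:i + factor]) // factor
        for i in range(0, len(mono) - factor + 1, factor)
    ))
    return out.tobytes()


# Six stereo frames at 48 kHz become two mono samples at 16 kHz.
stereo = array("h", [100, 200] * 3 + [-300, -300] * 3).tobytes()
print(list(array("h", downmix_and_decimate(stereo))))
```

In practice a library resampler would be preferable; the sketch only shows the arithmetic relating a 48 kHz stereo stream to a 16 kHz mono one.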

The server Log:

$ python asr_server_webrtc.py 
LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=10 max-active=3000 lattice-beam=2
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:6:7:8:9:10
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
LOG (VoskAPI:Collapse():nnet-utils.cc:1488) Added 1 components, removed 2
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.0419731 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from model/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:278) Loading HCLG from model/graph/HCLG.fst
LOG (VoskAPI:ReadDataFiles():model.cc:293) Loading words from model/graph/words.txt
LOG (VoskAPI:ReadDataFiles():model.cc:302) Loading winfo model/graph/phones/word_boundary.int
LOG (VoskAPI:ReadDataFiles():model.cc:309) Loading subtract G.fst model from model/rescore/G.fst
LOG (VoskAPI:ReadDataFiles():model.cc:311) Loading CARPA model from model/rescore/G.carpa
======== Running on http://0.0.0.0:2700 ========
(Press CTRL+C to quit)

Console output:

client.js:91 Opened data channel

wizVR-zhangjun commented 2 years ago

Quoted from gooran's comment above (same text and server log). Console output:

client.js:40 v=0
o=- 241238279213095452 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE 0 1
a=extmap-allow-mixed
a=msid-semantic: WMS OTcVdE60KOTXRFD7RaZxfHuAT0lda2Xg0qBs
m=audio 50096 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 172.27.18.128
a=rtcp:9 IN IP4 0.0.0.0
a=candidate:288637366 1 udp 2122260223 172.27.18.128 50096 typ host generation 0 network-id 1 network-cost 10
a=candidate:1605877062 1 tcp 1518280447 172.27.18.128 9 typ host tcptype active generation 0 network-id 1 network-cost 10
a=ice-ufrag:U6fj
a=ice-pwd:SSuGIkR6EQkc6bD14sMLHXJ6
a=ice-options:trickle
a=fingerprint:sha-256 B5:4E:35:2D:9F:3E:A0:27:60:6C:AC:97:43:57:E4:85:0D:21:EC:A3:AE:8E:59:00:B6:8B:83:9A:23:8F:C0:B9
a=setup:actpass
a=mid:0
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:6 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=sendrecv
a=msid:OTcVdE60KOTXRFD7RaZxfHuAT0lda2Xg0qBs 79e57e00-8d2b-4c6e-b2fd-262592b53a2b
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3896719896 cname:rJkOILUJimJ1bnPx
a=ssrc:3896719896 msid:OTcVdE60KOTXRFD7RaZxfHuAT0lda2Xg0qBs 79e57e00-8d2b-4c6e-b2fd-262592b53a2b
a=ssrc:3896719896 mslabel:OTcVdE60KOTXRFD7RaZxfHuAT0lda2Xg0qBs
a=ssrc:3896719896 label:79e57e00-8d2b-4c6e-b2fd-262592b53a2b
m=application 53759 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 172.27.18.128
a=candidate:288637366 1 udp 2122260223 172.27.18.128 53759 typ host generation 0 network-id 1 network-cost 10
a=candidate:1605877062 1 tcp 1518280447 172.27.18.128 9 typ host tcptype active generation 0 network-id 1 network-cost 10
a=ice-ufrag:U6fj
a=ice-pwd:SSuGIkR6EQkc6bD14sMLHXJ6
a=ice-options:trickle
a=fingerprint:sha-256 B5:4E:35:2D:9F:3E:A0:27:60:6C:AC:97:43:57:E4:85:0D:21:EC:A3:AE:8E:59:00:B6:8B:83:9A:23:8F:C0:B9
a=setup:actpass
a=mid:1
a=sctp-port:5000
a=max-message-size:262144

client.js:54 v=0
o=- 3850213044 3850213044 IN IP4 0.0.0.0
s=-
t=0 0
a=group:BUNDLE 0 1
a=msid-semantic:WMS *
m=audio 35185 UDP/TLS/RTP/SAVPF 111 0 8
c=IN IP4 172.27.18.128
a=recvonly
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=mid:0
a=msid:d1089e35-44b3-406f-aba7-76309701bda9 5856d4fd-34e5-4bf7-b04d-c31d072c1e68
a=rtcp:9 IN IP4 0.0.0.0
a=rtcp-mux
a=ssrc:4000364635 cname:d1402f68-8b4b-4194-88fa-e2cac4a1bcb2
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=candidate:9a3a4496edc5abd2080000c2e9f0eae7 1 udp 2130706431 172.27.18.128 35185 typ host
a=candidate:0ffe8b44cf5f93e6c5bc4264430b7f41 1 udp 1694498815 46.209.83.38 35185 typ srflx raddr 172.27.18.128 rport 35185
a=end-of-candidates
a=ice-ufrag:eAQT
a=ice-pwd:vzatxOyDGCC9UrQ2MH3vmY
a=fingerprint:sha-256 98:2F:98:E2:3B:97:35:EF:D7:01:8F:02:7C:55:35:AD:66:8A:37:16:BE:AB:05:55:05:96:B7:EC:BD:87:25:1D
a=setup:active
m=application 35185 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 172.27.18.128
a=mid:1
a=sctp-port:5000
a=max-message-size:65536
a=candidate:9a3a4496edc5abd2080000c2e9f0eae7 1 udp 2130706431 172.27.18.128 35185 typ host
a=candidate:0ffe8b44cf5f93e6c5bc4264430b7f41 1 udp 1694498815 46.209.83.38 35185 typ srflx raddr 172.27.18.128 rport 35185
a=end-of-candidates
a=ice-ufrag:eAQT
a=ice-pwd:vzatxOyDGCC9UrQ2MH3vmY
a=fingerprint:sha-256 98:2F:98:E2:3B:97:35:EF:D7:01:8F:02:7C:55:35:AD:66:8A:37:16:BE:AB:05:55:05:96:B7:EC:BD:87:25:1D
a=setup:active

client.js:91 Opened data channel
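
The SDP dumps above list the codecs the browser offered and the ones the server accepted. A small sketch for pulling the `a=rtpmap` entries out of such a dump, so the codec names and clock rates are easy to read off; `parse_rtpmap` is a hypothetical helper, not part of client.js or the server:

```python
import re

# a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)(?:/(\d+))?\s*$")


def parse_rtpmap(sdp: str) -> dict:
    """Return {payload_type: (codec, clock_rate_hz, channels)} for an SDP text."""
    codecs = {}
    for line in sdp.splitlines():
        m = RTPMAP.match(line.strip())
        if m:
            pt, name, rate, ch = m.groups()
            # The channel count is optional in SDP and defaults to 1.
            codecs[int(pt)] = (name, int(rate), int(ch) if ch else 1)
    return codecs


# Audio section excerpt copied from the answer printed at client.js:54.
answer = """\
m=audio 35185 UDP/TLS/RTP/SAVPF 111 0 8
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""
for pt, (name, rate, ch) in parse_rtpmap(answer).items():
    print(pt, name, rate, ch)
```

In the answer shown in this thread, payload type 111 is listed first on the `m=audio` line, which suggests Opus with a 48000 Hz RTP clock is the codec in use.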

My system is Windows. You can take a look at this issue: https://github.com/alphacep/vosk-server/issues/160

nshmyrev commented 2 years ago

@gooran There was a bug; it should be fixed now.