Probably you are using a chain model and are missing the attribute frame-subsampling-factor: 3 under the decoder section of the YAML config file.
Yes, I am using a chain model, but the frame-subsampling-factor option is already in place. Attached is my YAML file.
use-nnet2: True
decoder:
    use-threaded-decoder: True
    nnet-mode: 3
    model: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/final.mdl
    word-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/words.txt
    fst: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/HCLG.fst
    mfcc-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/mfcc.conf
    ivector-extraction-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/ivector_extractor.conf
    max-active: 10000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 1.0
    do-endpointing: true
    endpoint-silence-phones: "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.01
    chunk-length-in-secs: 0.25
    frame-subsampling-factor: 3
    num-nbest: 10
    #lm-fst: test/models/english/librispeech_nnet_a_online/G.fst
    #big-lm-const-arpa: test/models/english/librispeech_nnet_a_online/G.carpa
    phone-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/phones.txt
    #word-boundary-file: test/models/english/librispeech_nnet_a_online/word_boundary.int
    #do-phone-alignment: true
out-dir: tmp/urdu
use-vad: False
silence-timeout: 60
post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'
logging:
    version: 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]
And the client command is this:
python kaldigstserver/client.py -r 32000 c2a.wav
where the sample wave file is sampled at 16 kHz.
I have tweaked frame-subsampling-factor, and oddly it has no effect on latency.
Can you give some numbers -- the actual difference in decoding time that you are seeing?
I assume you understand that the -r 32000 option in client.py means that the audio is sent to the server at this byte rate. If the wav indeed uses 16 kHz 16-bit encoding, then decoding cannot complete faster than real time, because the audio is sent to the server at a rate that simulates real-time recording from the mic.
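For concreteness, here is the arithmetic behind that, assuming 16 kHz, 16-bit, mono PCM audio (which is what the wav above appears to be):

# Byte rate of 16 kHz, 16-bit (2 bytes/sample), mono PCM audio
sample_rate = 16000                           # samples per second
bytes_per_sample = 2                          # 16-bit samples
byte_rate = sample_rate * bytes_per_sample    # 32000 bytes per second

# "-r 32000" therefore streams the file in real time, while
# "-r 256000" streams it 256000 / 32000 = 8x faster than real time.
print(byte_rate, 256000 // byte_rate)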
Yes, I understand the byte rate, and I also experimented with -r 256000, which should send the whole audio within the first second (the intuition is to imitate the client for online2-tcp-nnet3-decode-faster, which feeds the whole audio and then half-shuts-down the socket connection, as in the sketch below). It doesn't affect accuracy and improves efficiency a bit.
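For reference, a minimal sketch of that "send everything, then half-close" pattern (not the actual client used above). It assumes the tcp decoder listens on localhost:5050 and expects raw 16 kHz 16-bit mono samples; adjust host, port, and audio format to your setup.

import socket
import sys

def decode_file(raw_path, host="localhost", port=5050):
    """Send one raw-audio file to the tcp decoder and return its text output."""
    with socket.create_connection((host, port)) as sock:
        with open(raw_path, "rb") as f:
            sock.sendall(f.read())        # push the whole utterance at once
        sock.shutdown(socket.SHUT_WR)     # half-close: signal that no more audio will come
        chunks = []
        while True:
            data = sock.recv(4096)        # read the recognized text until the server closes
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")

if __name__ == "__main__":
    print(decode_file(sys.argv[1]))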
Try changing to traceback-period-in-secs: 0.25.
Tried it, but there was no effect. However, averaging over multiple experiments gives a difference of ~1 second in latency between -r 256000 and the tcp decoder.
I think the latency increases in the gstreamer case because of the server-worker-decoder architecture, and the communication is slower than with the online2-tcp-nnet3-decode-faster server. If that is so, this issue can be closed.
Hi,
I just experimented with online decoding using online2-tcp-nnet3-decode-faster, which I had previously been doing with kaldinnet2onlinedecoder (through kaldi-gstreamer-server). I saw about 3 times faster decoding with online2-tcp-nnet3-decode-faster. I went through the code of both decoders and realized that they work in largely the same way. Can you please advise why the latter is faster? Is it my mistake or something else?
PS: parameters (like beam, lattice-beam, and max-active) were kept identical for both decoders.
Best Regards,
Umar