cmusphinx / pocketsphinx

A small speech recognizer

Timestamp drifting. #352

Open Elastino opened 1 year ago

Elastino commented 1 year ago

Hello, David Huggins-Daines

Because of how the endpointer's timestamp is calculated, the results of ps_endpointer_speech_start() and ps_endpointer_speech_end() can easily be misinterpreted.

Currently, the timestamp is advanced with the following code in ps_endpoint.c:273.

ep->timestamp += (double)nsamp / ps_endpointer_sample_rate(ep);

This conversion means that the time is based on the audio clock only, not the system clock. Hence, it always drifts relative to system time whenever the audio clock is not derived from the same master clock as the system clock (as it would be with, e.g., a single clock source feeding all peripherals). I confirmed this behavior on my setup.

A client of PocketSphinx can still correlate these timestamps to the system clock, because the client itself pumps the audio samples into PocketSphinx. However, this leads to a small time gap between Sent and Processed in data.maxFrameIndex.
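To put a rough number on the drift, here is a minimal sketch. It is not PocketSphinx code: `audio_clock_elapsed` and `drift_seconds` are hypothetical helpers, and the 16 kHz nominal rate and 50 ppm clock error in the note below are assumed values, not measurements from my setup.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed time as an audio-clock timestamp would report it: samples
 * delivered divided by the NOMINAL sample rate. This mirrors the
 * accumulation quoted above but is not PocketSphinx code. */
static double
audio_clock_elapsed(uint64_t nsamp_total, double nominal_rate)
{
    assert(nominal_rate > 0.0);
    return (double)nsamp_total / nominal_rate;
}

/* Drift (audio clock minus system clock) after `wall_seconds` of real
 * time, assuming the audio hardware really runs `ppm_error` parts per
 * million faster (positive) or slower (negative) than nominal. */
static double
drift_seconds(double wall_seconds, double nominal_rate, double ppm_error)
{
    double actual_rate = nominal_rate * (1.0 + ppm_error * 1e-6);
    uint64_t nsamp_total = (uint64_t)(wall_seconds * actual_rate);

    return audio_clock_elapsed(nsamp_total, nominal_rate) - wall_seconds;
}
```

Under those assumptions, `drift_seconds(3600.0, 16000.0, 50.0)` comes out at roughly 0.18 seconds of drift per hour, and it keeps growing for as long as the stream runs.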

Proposal 1

Let PocketSphinx accept an external timestamp from the caller (e.g. by passing UpTicks() or a similar value from the system), and then return results as system-wide timestamps.
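As a rough illustration of this first idea, the sketch below uses a hypothetical stand-in type, `sys_endpointer_t`, rather than the real endpointer. The `is_speech` flag stands in for the internal VAD decision, and `system_now` is whatever the caller's system clock reports for the frame. None of these names exist in PocketSphinx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an endpointer that is told the system time. */
typedef struct {
    int in_speech;       /* nonzero while inside a speech segment */
    double speech_start; /* system-clock time of the latest speech start */
    double speech_end;   /* system-clock time of the latest speech end */
} sys_endpointer_t;

/* Feed one frame. `is_speech` stands in for the VAD decision, and
 * `system_now` is the caller-supplied system time for this frame. */
static void
sys_endpointer_process(sys_endpointer_t *ep, int is_speech, double system_now)
{
    assert(ep != NULL);
    if (is_speech && !ep->in_speech) {
        /* Transition into speech: remember the system time. */
        ep->in_speech = 1;
        ep->speech_start = system_now;
    } else if (!is_speech && ep->in_speech) {
        /* Transition out of speech: remember the system time. */
        ep->in_speech = 0;
        ep->speech_end = system_now;
    }
}
```

With something along these lines, the start and end times are already in the system clock domain, so the client would not need to do any conversion.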

Proposal 2

Currently, ep->timestamp is not exposed, which makes it difficult to correlate the audio clock with the system clock. (It is still possible if the client casts ep to ps_endpointer_s, but that relies on undocumented behavior.) Instead, return the current timestamp through a function such as ps_endpointer_now(ps_endpointer_t *ep).
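To show how a client might use such an accessor, here is a minimal sketch. `mock_endpointer_t` and `mock_endpointer_now` are hypothetical stand-ins, since the real structure is opaque and the accessor does not exist today. `audio_to_system` is likewise an assumed client-side helper, and it assumes the audio "now" and the system "now" are sampled at the same moment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock; the real ps_endpointer_t is opaque to clients. */
typedef struct {
    double timestamp; /* audio-clock seconds, advanced as nsamp / sample_rate */
} mock_endpointer_t;

/* Shape of the proposed accessor: return the current audio-clock time. */
static double
mock_endpointer_now(const mock_endpointer_t *ep)
{
    assert(ep != NULL);
    return ep->timestamp;
}

/* Map an audio-clock event time (e.g. a speech start) to the system
 * clock, given the audio-clock "now" and the system time read at the
 * same moment. The event lies (audio_now - event_audio) seconds in the
 * past, so subtract that age from the current system time. */
static double
audio_to_system(double event_audio, double audio_now, double system_now)
{
    return system_now - (audio_now - event_audio);
}
```

Because the mapping only looks at the short interval between the event and "now", the accumulated long-term drift drops out of the result.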

Please let me know your thoughts on this.

Thank you.

Best regards, CJ Lee