AaronZhangL / unimrcp

Automatically exported from code.google.com/p/unimrcp
Apache License 2.0
0 stars 0 forks source link

GVP Rtp data loss #122

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,
I am developing a recognizer plugin for our firms recognizer engine. Currently 
it seems to work OK with the umc and unimrcpclient applications.
Using umc I can synchronously run 30 parallel requests.(I am doing this test by 
sending "run recog" one after the other quickly)
Now I am trying to integrate with GVP Voice portal.
I am using unimrcp 1.0 on windows server 2008 R2 64 bit. I build my libaries in 
32 bit mode.

With GVP everything seems OK most of the time but sometimes (like 1 out of 
10-20 requests)
Rtp data is absent in the incoming audio. I checked the wireshark traces but 
all the data is there. (sample data and wireshark trace are attached)
I observe this loss when there is only one active recognition request.

When I tried with demorecog I was only able to observe a similar issue once.

I was thinking if I was blocking the context that Rtp is running. But there is 
obsolutely no waiting in the portion of code that acccepts the Rtp data from 
the engine.
No mutex locking, no new memory allocations. I do not even send the Rtp data to 
voice activity detection in the same thread,
I analyse audio data in another thread after reading it. (But the mechanism is 
totally mutex-free and thread safe - one reader thread, one writer thread, 
circular buffer mechanism)...
I used a performance analyzer to get the timings of each thread in my plugin 
but it seems everything is OK as far as the timings go.

I am dealing with this issue for days an nearly out of ideas.
What I have tried up until now:

I checked the differences of the demorecog and my project :
    the differences are:
    a- demo recog do voice activity detection in the demo_recog_stream_write function but I do not.
    b- demo recog log the incoming audio data to file in the demo_recog_stream_write function but I do not.
    c- I do write the incoming audio data to a buffer which is to be read from another thread.
    d- I ignore the sent frame if the following statement is false.
        (frame->type & MEDIA_FRAME_TYPE_AUDIO) == MEDIA_FRAME_TYPE_AUDIO
        As far as I remember the previous version of demorecog plugin was also ignoring these frames but in the current code it is 
        writing these frames too. I tried both but see no difference in the logged audio (the speech was the same anyway)

I tried incresing the priority of the thread that demo_recog_stream_write 
CALLBACK is made in my own plugin.

I tried increasing max_count from 5 to 15 in the function rtp_rx_process which 
resides in mpf_rtp_stream.c

I tried logging the audio data as soon as I get it in the Rtp thread.

I tried using several levels of jitter_buffer up to 350 msec.

I should also note that when I try over 30-40 requests in parallel with umc I 
get similar losses but with GVP this happens even with a single request.

Observing the wireshark traces I saw that the delta is very high time to time, 
with a little google search I learned this is said to be expected when IVR have 
a 
VAD(voice activity detector) of its own and  prefers not to send silence data 
to MRCP server. In GVP documentaton they say this is the only behavior. So I 
could not try 
and get the GVP send the data in continuous mode to check if my problem was due 
to this.
Jitter level is also seem acceptable.

It would be so kind of you to give me some kind of hint or direct me in the 
right way for this problem.
I tried to write everything I remember here but if there is anything missing I 
can try to prvide. 

Note: In the trace file you can find the attached Rtp flow at time 
17.42.10412023

Regards

Original issue reported on code.google.com by nazillie...@gmail.com on 4 May 2011 at 2:57

Attachments:

GoogleCodeExporter commented 9 years ago
I have miswritten rtp packet time. You can check the rtp packet at time 
17:42:09.912046

I have also attached the unimrcpserver log.

Regards

Original comment by nazillie...@gmail.com on 5 May 2011 at 1:42

Attachments:

GoogleCodeExporter commented 9 years ago
Hi,

I went ahead and defined the ENABLE_JB_TRACE, ENABLE_RTP_PACKET_TRACE 
preprocessor tags in the mpf project and and also added a few logs to some 
error conditions which were not logged.
Redo my tests until an error condition occured(actually two errors in this test)
I've attached the resultant log file(rtplog107.zip).

There are several "JB DISCARD not aligned" errors.
Especially the ones causing me heaches are probably the ones around line 
no:11400 there are 14 consequent "JB DISCARD not aligned" errors which amounts 
to 280 milliseconds of packets which are discarded(around the same amount of 
loss in the audio logs).
I've attached the file cinyuan.zip which contains the audio log extracted from 
the wireshark RTP trace(cinyuan.raw) and my plugin's audio log.

I am still working on the issue. Any advice is welcome.
Regards

Original comment by mert.buy...@sestek.com on 6 May 2011 at 9:13

Attachments:

GoogleCodeExporter commented 9 years ago
Hi,

Checking the Rtp traces:

1- RTP time= 82864 ssrc=81f4f107 pt=  8  ts=1984012384 seq= 8218 size=160
2- RTP time= 82864 ssrc=81f4f107 pt=  8  ts=1984012544 seq= 8219 size=160
3- RTP time=341890 ssrc=81f4f107 pt=  8  ts=1984012808 seq= 8220 size=160 
(IGNORED)
4- RTP time=341890 ssrc=81f4f107 pt=  8  ts=1984012968 seq= 8221 size=160 
(IGNORED)

and unimrcp codes (mpf_jitter_buffer.c):

-------------------
static APR_INLINE jb_result_t 
mpf_jitter_buffer_write_prepare(mpf_jitter_buffer_t *jb, apr_uint32_t ts, 
apr_uint32_t *write_ts)
{
    if(jb->write_sync) {
        jb->write_ts_offset = ts - jb->write_ts;
        jb->write_sync = 0;
    }

    *write_ts = ts - jb->write_ts_offset + jb->playout_delay_ts;
    if(*write_ts % jb->frame_ts != 0) {
        /* not frame alligned */
        return JB_DISCARD_NOT_ALLIGNED;
    }
    return JB_OK;
}
-------------------

As far as I understand from the codes the first RTP packet's timestamp is taken 
into account and any RTP packet header other
than this first packet is expected to have its ts(timestamp) value as in the 
form of 
(first_ts + (arbitrarynumber * frame_size)) 
here the frame_size is 160 and arbitrary number is monotonically increasing.

So looking at the above Rtp values if we take into account 2nd and 3rd
1984012808 - 1984012544 = 264
which is not divisible by 160 so the packet is ignored.
Also for 2nd and 4th packets
1984012968 - 1984012544 = 424
so it is also ignored.

I've also looked at the rfc of RTP from (http://www.ietf.org/rfc/rfc3550.txt). 
It states:
-------------------
If RTP packets are generated periodically, the nominal sampling
instant as determined from the sampling clock is to be used, not a
reading of the system clock.  As an example, for fixed-rate audio
the timestamp clock would likely increment by one for each
sampling period.  If an audio application reads blocks covering
160 sampling periods from the input device, the timestamp would be
increased by 160 for each such block, regardless of whether the
block is transmitted in a packet or dropped as silent.
-------------------

So I came to the conclusion that the RTP timestamps that GVP sends are at a 
fault here. 
Since GVP(IVR) side has its own VAD system they seem to be dropping the 
silences and just sending the 
speech data. This seems to be called silence suppression, implemented in the 
wrong way.

Also searching GVP documentation I found that there is no way to close the VAD 
for recognition modes:
-------------------
RTP Send Mode Specifies the output mode for outgoing RTP
streams.
Notes:
• Continuous mode applies only for G.711
mulaw, G.711 alaw, AMR, and AMR-WB
audio codecs.
• Continuous mode does not apply in the
following scenarios:
? When a bridge transfer is in progress
? When RTP data is sent to ASR speech
engines.
? When the RTP data contains video.
• continous—Audio
silence is sent when there
is no data to send.
• vad—RTP transmissions
stop when there is no data
to send.
Default value: vad
-------------------

If I understand correctly this is a GVP side bug. But since we have already 
sold our mrcp recognition 
server solution to some guys who are using GVP I need to implement a solution 
to this case even if it is only a workaround.
So I am still waiting for any kind of offers. At worst I will have to rewrite 
the jitter side of unimrcp. I thought of just not ignoring the 
wrong timestamped packets, but it seems like it would break the rest of unimrcp 
implementation of jitter buffer because it 
seems very standard conformant and does some other checks based on timestamp 
info.

Please feel free to correct me if anything is wrong.

Btw. All the messages above are mine. I just logged in with my other account 
without noticing.
Note: I feel so lonely. Sending messages all to myself :)

Original comment by nazillie...@gmail.com on 6 May 2011 at 12:32

GoogleCodeExporter commented 9 years ago

Hi,
I've made the following 2-lines change as a workaround and the problem seems to 
be gone. I will also try to 
port when the tests are finished.

static APR_INLINE jb_result_t 
mpf_jitter_buffer_write_prepare(mpf_jitter_buffer_t *jb, apr_uint32_t ts, 
apr_uint32_t *write_ts)
{
    if(jb->write_sync) {
        jb->write_ts_offset = ts - jb->write_ts;
        jb->write_sync = 0;
    }

    *write_ts = ts - jb->write_ts_offset + jb->playout_delay_ts;
    if(*write_ts % jb->frame_ts != 0) {
        /* not frame alligned */
        JB_TRACE("RTP frame not aligned\n");

        *write_ts -= *write_ts % jb->frame_ts;

        //return JB_DISCARD_NOT_ALLIGNED;
    }

    return JB_OK;
}

just changed the timestamp of the newcoming RTP packet to the last previous 
acceptable timestamp value.
I am assuming that since the cause of the problem is the late packets having 
wrong timestamps the 
previous packet that should have the correct timestamp has not arrived(and will 
not arrive). 
Before this I had also tried not ignoring the packets leaving the timestamps 
intact. When I tried 
this change I did not observe any major problems. I just felt that letting 
wrong timestamps through may 
lead to unforeseen problems.

Also one thing I was not sure in the beginning was whether I should ignore the 
packets not satisfying the condition
(frame->type & MEDIA_FRAME_TYPE_AUDIO) == MEDIA_FRAME_TYPE_AUDIO
These check seems to indicate that no valid data could have been read from the 
network for the current time interval.
And the content of its frame are all 0's. Not ignoring this I got some 0-filled 
frames in my audio data.
So my conclusion is that: I shall really ignore the aforementioned frames. 
As I mentiond in one of the above posts they are not ignored in demorecog. If I 
am not missing an important detail,
this shall be a bug in the demerecog project.

Regards
Mert Büyüktuncay

Original comment by nazillie...@gmail.com on 9 May 2011 at 12:59

GoogleCodeExporter commented 9 years ago
By the way the GVP versions I am using are:

voice portal 8.1
mcp 8.1.310.12

Original comment by nazillie...@gmail.com on 9 May 2011 at 1:47

GoogleCodeExporter commented 9 years ago
The tests are finished and the workaround really worked.
The issue can be closed...

Original comment by nazillie...@gmail.com on 9 May 2011 at 2:37

GoogleCodeExporter commented 9 years ago
I removed the unnecessary info from the .pcap file from my first post. (updated 
the attached file)

Original comment by nazillie...@gmail.com on 16 May 2011 at 6:32

Attachments:

GoogleCodeExporter commented 9 years ago
Hi Mert,

Thanks for the detailed analyzes. BTW, I encountered and addressed a similar 
but not exactly the same issue a few months ago in r1802. Your approach to 
adjust the timestamps instead of discarding the RTP packets looks good. Since 
the suggested modification (comment-4) has helped you workaround the problem 
and it might be helpful for others too, I'll apply the patch to the trunk.

Original comment by achalo...@gmail.com on 19 Jul 2011 at 11:39

GoogleCodeExporter commented 9 years ago
Done in r1808.

Original comment by achalo...@gmail.com on 19 Jul 2011 at 11:45

GoogleCodeExporter commented 9 years ago
Hi Arsen,
Since I was working on this case alone I could not have anybody check my 
thoughts and code so your verification of the issue was important for me.
Thanks for your response. 

Btw. the system has been deployed for over 2 months and no other problems has 
been reported.

Original comment by nazillie...@gmail.com on 27 Jul 2011 at 10:20

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
This change also fixes an odd problem we discovered with DTMF packet handling 
in mpf_jitter_buffer.c. The symptom is that every other DTMF digit is ignored. 
The reason is because every other DTMF RTP packet is not frame aligned and is 
dropped. Since this change forces frame alignment of every packet it allows the 
DTMF packet to be accepted.

Our setup: we're using an 80ms audio frame size (set in mpf_codec_descriptor.h 
and with the Holly 'audiopacketsize' parameter). One DTMF press and release 
appears to take up 20ms (160 samples) in the RTP stream according to the 
timestamps. (The DTMF source actually sends 10 packets, 6 for DTMF-press, 4 for 
DTMF-release, and all with the same timestamp(!)).

Original comment by eric.wou...@gmail.com on 23 Jan 2012 at 5:20