calejost / unimrcp

Automatically exported from code.google.com/p/unimrcp
Apache License 2.0
0 stars 0 forks source link

Allow DTMF events to be sent from unimrcp client #31

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I am attaching the patched mpf_rtp_stream.c file which contains added code
to allow unimrcpclient to send DTMF events. Now  the client has to Set bit
mask MEDIA_FRAME_TYPE_EVENT and fill event_frame member of the mpf_frame_t
structure.

Original issue reported on code.google.com by chaitany...@gmail.com on 9 Jun 2009 at 12:46

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks, I'll try to review this hopefully later today.

Original comment by achalo...@gmail.com on 9 Jun 2009 at 1:38

GoogleCodeExporter commented 8 years ago
Hi Chaitanya,
I reviewed the patch, just wanted to discuss the remaining issues.

1. Sending procedure and use of RTP header fields
One out-of-band DTMF event SHOULD not be sent in one RTP packet.
Suppose you want to send DTMF "1" and the duration of this event is 100msec (800
samples in case of 8kHz) at timestamp = 2400.
RTP stream should look like

seqnum timestamp marker duration end-of-event
1        2400     1       160        0
2        2400     0       320        0
3        2400     0       480        0
4        2400     0       640        0
5        2400     0       800        1
6        2400     0       800        1
7        2400     0       800        1

Please note, RTP marker bit MUST be set on the first packet of the event, while 
the
last packet MUST have end-of-event bit set and the last packet SHOULD be
retransmitted. Sequence number MUST be increased, but RTP timestamp MUST be the 
same
http://tools.ietf.org/html/rfc4733#section-2.2

2. Also consider that audio stream and named-events can be sent simultaneously.
That's why I defined bit masks for them. However this is very rarely used and 
usually
either audio stream or named-event is transmitted at a time.

3. What if answer from server contains no "telephone-event" in the codec list, 
but
user is trying to send named-events. What should we do in this case? Drop those
events or use some default payload type for them.

Original comment by achalo...@gmail.com on 10 Jun 2009 at 6:27

GoogleCodeExporter commented 8 years ago
Hi Arsen,
Thanks for the detailed code review. Shows how much I missed while coding. But
actually I did that to just get something working first and then improve one by 
one.
For point one, how do you suggest we should proceed?Should I maintain a buffer 
in
transmit?
I will look into point 2.
For point 3, I think first of all a client should not send a named event if it 
did
not get a "telephone-event" in the codec list during SDP negotiation, as the 
whole
purpose of negotiation is to identify the capabilities of the server and the 
client.
In case the client sends it, then I guess the server should drop it, or maybe we
could drop it as well.What do you say?

Original comment by chaitany...@gmail.com on 10 Jun 2009 at 7:48

GoogleCodeExporter commented 8 years ago
Clearly it's not possible to implement all the required stuff in a day and I 
see no
problem over there, at least we get started.
I'll try to look into this more closely later.
Thanks.

Original comment by achalo...@gmail.com on 10 Jun 2009 at 8:18

GoogleCodeExporter commented 8 years ago
Hi Chaitanya,
My thoughts regarding point 1 and also the user/application level interface.
The typical use case is: user wants to generate/send a DTMF event with the 
default or
specified duration.
The other use case is: user wants to start DTMF event transmission, but doesn't 
know
the duration of the event by that time. Suppose, user pushes a button, and the 
event
transmission should be started and continued till he releases that button. I 
believe
you are interested mostly in the first case, thus we can implement only that 
case
now, but implementation and interface should assume both cases.

Anyway, the event, which generation/transmission is in progress, should be 
stored in
transmitter and continuously processed.

Original comment by achalo...@gmail.com on 11 Jun 2009 at 7:32

GoogleCodeExporter commented 8 years ago
Hi Arsen,
As you said rightly, I am now looking at the first use case only.So for that 
can't we
use something like :

        frame->type |= MEDIA_FRAME_TYPE_EVENT;
        char string[5];
    sprintf( string, "%c", fr->subclass );
    if(strcmp(string,"#")==0)
        frame->event_frame.event_id=11;
    else if(strcmp(string,"*")==0)
        frame->event_frame.event_id=10;
    else
        frame->event_frame.event_id=atoi(string);
    frame->event_frame.event_id=atoi(string);
    frame->event_frame.volume=10;
    frame->event_frame.edge=1;
    frame->event_frame.duration=1000;

I have been sort of using this in one of my applications and it seems to be 
working well.
Should we put this in a user level function which users can call?If so where do 
you
think is the right place to add this function?
Thanks.
-Chaitanya

Original comment by chaitany...@gmail.com on 4 Aug 2009 at 9:34

GoogleCodeExporter commented 8 years ago
Hi,
thanks for developing DTMF support which is essential in IVR applications. I am
interested in server-side DTMF, though, let me suggest better character <=> 
event_id
translation:

        frame->type |= MEDIA_FRAME_TYPE_EVENT;
        if ((fr->subclass >= '0') && (fr->subclass <= '9'))
                frame->event_frame.event_id = fr->subclass - '0';
        else if (fr->subclass == '*')
                frame->event_frame.event_id = 10;
        else if (fr->subclass == '#')
                frame->event_frame.event_id = 11;
        else if ((fr->subclass >= 'A') && (fr->subclass <= 'D'))
                frame->event_frame.event_id = 12 + fr->subclass - 'A';
        else
                frame->event_frame.event_id = 255; /* Invalid DTMF event */
        /* ... */

Cheers
- Vali

Original comment by tomas.valenta@speechtech.cz on 4 Aug 2009 at 12:28

GoogleCodeExporter commented 8 years ago
Hi,

Chaitanya, in this case I'll try to apply what we have for now. I mean the
modifications you made so far.

Vali, I agree with you. Just wonder if you could provide two functions, which 
simply
implement character <=> event_id translations. I'll be happy to apply them.

Thank you both for your input.

Original comment by achalo...@gmail.com on 4 Aug 2009 at 1:26

GoogleCodeExporter commented 8 years ago
Hi Arsen,
no problem. I am happy I can help a bit.

apr_uint32_t dtmf_char2event_id(const char dtmf_char)
{
    if ((dtmf_char >= '0') && (dtmf_char <= '9'))
        return dtmf_char - '0';
    else if (dtmf_char == '*')
        return 10;
    else if (dtmf_char == '#')
        return 11;
    else if ((dtmf_char >= 'A') && (dtmf_char <= 'D'))
        return 12 + dtmf_char - 'A';
    else
        return 255; /* Invalid DTMF event */
}

char event_id2dtmf_char(const apr_uint32_t event_id)
{
    if (event_id <= 9)
        return '0' + (char)event_id;
    else if (event_id == 10)
        return '*';
    else if (event_id == 11)
        return '#';
    else if (event_id <= 15)
        return 'A' + (char)event_id;
    else
        return 0; /* Not a DTMF event */
}

Let me also note something to plugin DTMF processing. I can't wait receiving 
DTMF
digits :-) because I have already implemented processing in my plugin. Now it 
looks
like DTMF digits will be received as named events in audio stream's 
write_frame. IMO
this is not very happy solution because named events contain needless much
information such as volume, duration etc. And worse, plugin implementer 
receives the
same digit each frame throughout its duration. I find this behaviour 
unnecessarily
low-level, but the final implementation is, of course, up to you.

Maybe I should have written the last note to another thread.
Cheers
- Vali

Original comment by tomas.valenta@speechtech.cz on 4 Aug 2009 at 3:40

GoogleCodeExporter commented 8 years ago
Aw, sorry for mistake. This should be right:

char event_id2dtmf_char(const apr_uint32_t event_id)
{
    if (event_id <= 9)
        return '0' + (char)event_id;
    else if (event_id == 10)
        return '*';
    else if (event_id == 11)
        return '#';
    else if (event_id <= 15)
        return 'A' + (char)event_id - 12;
    else
        return 0; /* Not a DTMF event */
}

Original comment by tomas.valenta@speechtech.cz on 4 Aug 2009 at 3:48

GoogleCodeExporter commented 8 years ago
Chaitanya, Vali thanks for the suggestions and patches. Both are applied to the 
trunk
r1089 and r1090.

Original comment by achalo...@gmail.com on 17 Aug 2009 at 7:54

GoogleCodeExporter commented 8 years ago
Just an update to clarify current status of this issue.

Sending and receiving procedures must be further enhanced/implemented.

Original comment by achalo...@gmail.com on 18 Oct 2009 at 1:33

GoogleCodeExporter commented 8 years ago
Watching the discussions I can see that I am not the only one that would like 
UniMRCP
to support DTMF. Having read carefully and understood the current state I think 
I
would be able to write a client class, say mpf_dtmf_generator, generating
frames/packets according to RFC4733 (or audio tones alternatively). I am more
interested in server-side though. If the server supported delivering named 
events to
the engine channel's stream_write(), then I could create mpf_dtmf_detector.

Now it is possible to send either event or audio (not both at a time) from the
client, right? My idea is that the generator would have a function
apt_bool_t stream_read(generator, frame)
returning TRUE if the frame was generated by the detector (sending digits from 
queue)
or FALSE if there are no digits in queue.

So is this the way we should go? Are you interested?

Original comment by tomas.valenta@speechtech.cz on 18 Oct 2009 at 4:47

GoogleCodeExporter commented 8 years ago
Hi Vali,

I just welcome your initiative and this would be perfect way to go.
Moreover, if you start implementing mpf_generator and mpf_detector entities, 
I'll try
to go along with you and implement missing functionality in the core.

Original comment by achalo...@gmail.com on 19 Oct 2009 at 9:55

GoogleCodeExporter commented 8 years ago
Hi everybody,

I have created the top-level classes for dealing with DTMF. Actually I should 
say
they are a draft. They work fine with in-band digits, but not with out-of-band 
ones,
since it is not supported in MPF yet. See the attached headers for docs. The 
idea is
as follows:

in demo_recog_application.c:

#include "mpf_dtmf_generator.h"

static apt_bool_t recog_app_stream_open(mpf_audio_stream_t *stream, mpf_codec_t 
*codec)
{
    dtmf = mpf_dtmf_generator_create(stream, pool);
}

static apt_bool_t recog_app_stream_read(mpf_audio_stream_t *stream, mpf_frame_t 
*frame)
{
    if (mpf_dtmf_generator_sending(dtmf)) {
        if (mpf_dtmf_generator_put_frame(dtmf, frame))
            return TRUE;
        /* Inter-digit silence */
        frame->type |= MEDIA_FRAME_TYPE_AUDIO;
        memset(frame->codec_frame.buffer, 0, frame->codec_frame.size);
        return TRUE;
    }
    /* Stream other audio */
}

handle_recognition_in_progress(...)
{
    mpf_dtmf_generator_enqueue(dtmf, "0123456789*#ABCD");
}

And in demo_recog_engine.c:

#include "mpf_dtmf_detector.h"

static apt_bool_t demo_recog_channel_open(mrcp_engine_channel_t *channel)
{
    dtmf = mpf_dtmf_detector_create(channel->termination->audio_stream, channel->pool);
}

static apt_bool_t demo_recog_stream_write(mpf_audio_stream_t *stream, const
mpf_frame_t *frame)
{
    mpf_dtmf_detector_get_frame(dtmf, frame);
    char c = mpf_dtmf_detector_digit_get(dtmf);
    if (c) printf("Got digit: %c\n", c);
}

As I said, in-band DTMF works fine, but with out-of-band DTMF there are 
following
problems:
 - On generator side:
   - It is not possible to set RTP marker bit.
   - RTP timestamp should not grow for single event.
   - How to get telephone-event's ptime?
   - mpf_audio_stream_t::rx_event_descriptor should be populated to get
telephone-event's frequency.
 - On detector side:
   - Receive the events at all.
   - Get RTP timestamp for the event.

As I said, this is just a draft of the concept, so please read the sources in
attachment and leave comments. In the ZIP archive attached, there is also a 
pcap with
out-of-band packtes with the problems described above.

Cheers
- Vali

Original comment by tomas.valenta@speechtech.cz on 25 Oct 2009 at 11:06

Attachments:

GoogleCodeExporter commented 8 years ago
Hi Vali,

I should confirm the draft looks VERY promising. The implementation and outlined
problems (missing functionality) give more confidence over the concepts, 
because we
seem to be on the same track. 

 - On generator side:
   - It is not possible to set RTP marker bit.
I'll add marker flag to the mpf_frame_t. That marker should allow to pass along
significant events and these events shouldn't be limited only with DTMFs.
For instance, START_OF_NAMED_EVENT, END_OF_UTTERANCE ...

   - RTP timestamp should not grow for single event.
It's clear.

   - How to get telephone-event's ptime?
By default, ptime of rx_descriptor should be used. Probably it'd be reasonable 
to
make ptime (updates for events) configurable. I think, users may want/need to 
not
only configure period of updates, but also to be able to enable/disable the 
updates
at all. Should be considered for future ...

   - mpf_audio_stream_t::rx_event_descriptor should be populated to get
telephone-event's frequency.
I'll look into this later.

 - On detector side:
   - Receive the events at all.
OK
   - Get RTP timestamp for the event.
Not only timestamp, but marker would be required
See long-lasting events
http://tools.ietf.org/html/rfc4733#section-2.5.1.3
Anyway, we can support it later, just info to consider, but both timestamp and 
marker
should be passed along

So you've given it a perfect start and as I promised, I'll try to go along and
implement outlined above functionality. Most probably I'll dedicate a couple of 
days
to this issue during the next week. So please wait a bit.
Thanks!

Original comment by achalo...@gmail.com on 26 Oct 2009 at 4:14

GoogleCodeExporter commented 8 years ago
 - On generator side:
   - It is not possible to set RTP marker bit.
I'll add marker flag to the mpf_frame_t. That marker should allow to pass along
significant events and these events shouldn't be limited only with DTMFs.
For instance, START_OF_NAMED_EVENT, END_OF_UTTERANCE ...

Now I do not fully understand. I have not heard of e.g. END_OF_UTTERANCE
telephone-event. You plan those events to be MPF internal events 
indicating/changing
named-event's state? If not, I should not call the classes dtmf_generator and
dtmf_detector. Well, there should be another layer actually, because almost only
DTMFs can be sent in-band as well.

   - RTP timestamp should not grow for single event.
It's clear.

   - How to get telephone-event's ptime?
By default, ptime of rx_descriptor should be used. Probably it'd be reasonable 
to
make ptime (updates for events) configurable. I think, users may want/need to 
not
only configure period of updates, but also to be able to enable/disable the 
updates
at all. Should be considered for future ...

Clear. But honestly I cannot recall any non-DTMF (maybe MF) events used in IVR 
apps.
Packet with event should be used every ptime. How can I get rx_descriptor's 
ptime?

   - mpf_audio_stream_t::rx_event_descriptor should be populated to get
telephone-event's frequency.
I'll look into this later.

 - On detector side:
   - Receive the events at all.
OK
   - Get RTP timestamp for the event.
Not only timestamp, but marker would be required
See long-lasting events
http://tools.ietf.org/html/rfc4733#section-2.5.1.3
Anyway, we can support it later, just info to consider, but both timestamp and 
marker
should be passed along

Clear, I do not plan to care about marker in detector, because the marked 
packet may
get lost. But anyway I agree it is a must.

Thank you for ideas and support.
- Vali

Original comment by tomas.valenta@speechtech.cz on 26 Oct 2009 at 10:01

GoogleCodeExporter commented 8 years ago
Below is clarification on marker, more to come later.

As you know, marker bit in RTP header is used not only in case of named events.
Marker also indicates start of a new talkspurt. Typically RTP stream consists of
voice activity and inactivity segments. Currently I implicitly detect these 
segments
in rtp_transmit. However, I think it makes sense to add generic marker in 
mpf_frame,
which will allow user apps to explicitly indicate start-of-talkspurt (or
start-of-utterance, start-of-event) and the same way end-of-talkspurt (or
end-of-utterance, end-of-event), when possible. There can be more flags/markers 
in
the future. If marker isn't specified everything should work as is. So this 
marker
should serve well for named events, and can be used for other purposes as well.

On the receiver side you definitely shouldn't rely on marker, because it indeed 
may
be lost. However, presence of that marker may help a lot in analyzes. Actually,
receiver part isn't as trivial as it may seen, but everything seems to be 
achievable.

Original comment by achalo...@gmail.com on 27 Oct 2009 at 10:53

GoogleCodeExporter commented 8 years ago
See http://groups.google.com/group/unimrcp/browse_thread/thread/8e773f260c56c34e

Original comment by achalo...@gmail.com on 9 Nov 2009 at 7:27

GoogleCodeExporter commented 8 years ago

Original comment by achalo...@gmail.com on 27 Apr 2010 at 12:01