Closed GoogleCodeExporter closed 8 years ago
Thanks, I'll try to review this hopefully later today.
Original comment by achalo...@gmail.com
on 9 Jun 2009 at 1:38
Hi Chaitanya,
I reviewed the patch, just wanted to discuss the remaining issues.
1. Sending procedure and use of RTP header fields
One out-of-band DTMF event SHOULD not be sent in one RTP packet.
Suppose you want to send DTMF "1" and the duration of this event is 100msec (800
samples in case of 8kHz) at timestamp = 2400.
RTP stream should look like
seqnum timestamp marker duration end-of-event
1 2400 1 160 0
2 2400 0 320 0
3 2400 0 480 0
4 2400 0 640 0
5 2400 0 800 1
6 2400 0 800 1
7 2400 0 800 1
Please note, RTP marker bit MUST be set on the first packet of the event, while
the
last packet MUST have end-of-event bit set and the last packet SHOULD be
retransmitted. Sequence number MUST be increased, but RTP timestamp MUST be the
same
http://tools.ietf.org/html/rfc4733#section-2.2
2. Also consider that audio stream and named-events can be sent simultaneously.
That's why I defined bit masks for them. However this is very rarely used and
usually
either audio stream or named-event is transmitted at a time.
3. What if answer from server contains no "telephone-event" in the codec list,
but
user is trying to send named-events. What should we do in this case? Drop those
events or use some default payload type for them.
Original comment by achalo...@gmail.com
on 10 Jun 2009 at 6:27
Hi Arsen,
Thanks for the detailed code review. Shows how much I missed while coding. But
actually I did that to just get something working first and then improve one by
one.
For point one, how do you suggest we should proceed?Should I maintain a buffer
in
transmit?
I will look into point 2.
For point 3, I think first of all a client should not send a named event if it
did
not get a "telephone-event" in the codec list during SDP negotiation, as the
whole
purpose of negotiation is to identify the capabilities of the server and the
client.
In case the client sends it, then I guess the server should drop it, or maybe we
could drop it as well.What do you say?
Original comment by chaitany...@gmail.com
on 10 Jun 2009 at 7:48
Clearly it's not possible to implement all the required stuff in a day and I
see no
problem over there, at least we get started.
I'll try to look into this more closely later.
Thanks.
Original comment by achalo...@gmail.com
on 10 Jun 2009 at 8:18
Hi Chaitanya,
My thoughts regarding point 1 and also the user/application level interface.
The typical use case is: user wants to generate/send a DTMF event with the
default or
specified duration.
The other use case is: user wants to start DTMF event transmission, but doesn't
know
the duration of the event by that time. Suppose, user pushes a button, and the
event
transmission should be started and continued till he releases that button. I
believe
you are interested mostly in the first case, thus we can implement only that
case
now, but implementation and interface should assume both cases.
Anyway, the event, which generation/transmission is in progress, should be
stored in
transmitter and continuously processed.
Original comment by achalo...@gmail.com
on 11 Jun 2009 at 7:32
Hi Arsen,
As you said rightly, I am now looking at the first use case only.So for that
can't we
use something like :
frame->type |= MEDIA_FRAME_TYPE_EVENT;
char string[5];
sprintf( string, "%c", fr->subclass );
if(strcmp(string,"#")==0)
frame->event_frame.event_id=11;
else if(strcmp(string,"*")==0)
frame->event_frame.event_id=10;
else
frame->event_frame.event_id=atoi(string);
frame->event_frame.event_id=atoi(string);
frame->event_frame.volume=10;
frame->event_frame.edge=1;
frame->event_frame.duration=1000;
I have been sort of using this in one of my applications and it seems to be
working well.
Should we put this in a user level function which users can call?If so where do
you
think is the right place to add this function?
Thanks.
-Chaitanya
Original comment by chaitany...@gmail.com
on 4 Aug 2009 at 9:34
Hi,
thanks for developing DTMF support which is essential in IVR applications. I am
interested in server-side DTMF, though, let me suggest better character <=>
event_id
translation:
frame->type |= MEDIA_FRAME_TYPE_EVENT;
if ((fr->subclass >= '0') && (fr->subclass <= '9'))
frame->event_frame.event_id = fr->subclass - '0';
else if (fr->subclass == '*')
frame->event_frame.event_id = 10;
else if (fr->subclass == '#')
frame->event_frame.event_id = 11;
else if ((fr->subclass >= 'A') && (fr->subclass <= 'D'))
frame->event_frame.event_id = 12 + fr->subclass - 'A';
else
frame->event_frame.event_id = 255; /* Invalid DTMF event */
/* ... */
Cheers
- Vali
Original comment by tomas.valenta@speechtech.cz
on 4 Aug 2009 at 12:28
Hi,
Chaitanya, in this case I'll try to apply what we have for now. I mean the
modifications you made so far.
Vali, I agree with you. Just wonder if you could provide two functions, which
simply
implement character <=> event_id translations. I'll be happy to apply them.
Thank you both for your input.
Original comment by achalo...@gmail.com
on 4 Aug 2009 at 1:26
Hi Arsen,
no problem. I am happy I can help a bit.
apr_uint32_t dtmf_char2event_id(const char dtmf_char)
{
if ((dtmf_char >= '0') && (dtmf_char <= '9'))
return dtmf_char - '0';
else if (dtmf_char == '*')
return 10;
else if (dtmf_char == '#')
return 11;
else if ((dtmf_char >= 'A') && (dtmf_char <= 'D'))
return 12 + dtmf_char - 'A';
else
return 255; /* Invalid DTMF event */
}
char event_id2dtmf_char(const apr_uint32_t event_id)
{
if (event_id <= 9)
return '0' + (char)event_id;
else if (event_id == 10)
return '*';
else if (event_id == 11)
return '#';
else if (event_id <= 15)
return 'A' + (char)event_id;
else
return 0; /* Not a DTMF event */
}
Let me also note something to plugin DTMF processing. I can't wait receiving
DTMF
digits :-) because I have already implemented processing in my plugin. Now it
looks
like DTMF digits will be received as named events in audio stream's
write_frame. IMO
this is not very happy solution because named events contain needless much
information such as volume, duration etc. And worse, plugin implementer
receives the
same digit each frame throughout its duration. I find this behaviour
unnecessarily
low-level, but the final implementation is, of course, up to you.
Maybe I should have written the last note to another thread.
Cheers
- Vali
Original comment by tomas.valenta@speechtech.cz
on 4 Aug 2009 at 3:40
Aw, sorry for mistake. This should be right:
char event_id2dtmf_char(const apr_uint32_t event_id)
{
if (event_id <= 9)
return '0' + (char)event_id;
else if (event_id == 10)
return '*';
else if (event_id == 11)
return '#';
else if (event_id <= 15)
return 'A' + (char)event_id - 12;
else
return 0; /* Not a DTMF event */
}
Original comment by tomas.valenta@speechtech.cz
on 4 Aug 2009 at 3:48
Chaitanya, Vali thanks for the suggestions and patches. Both are applied to the
trunk
r1089 and r1090.
Original comment by achalo...@gmail.com
on 17 Aug 2009 at 7:54
Just an update to clarify current status of this issue.
Sending and receiving procedures must be further enhanced/implemented.
Original comment by achalo...@gmail.com
on 18 Oct 2009 at 1:33
Watching the discussions I can see that I am not the only one that would like
UniMRCP
to support DTMF. Having read carefully and understood the current state I think
I
would be able to write a client class, say mpf_dtmf_generator, generating
frames/packets according to RFC4733 (or audio tones alternatively). I am more
interested in server-side though. If the server supported delivering named
events to
the engine channel's stream_write(), then I could create mpf_dtmf_detector.
Now it is possible to send either event or audio (not both at a time) from the
client, right? My idea is that the generator would have a function
apt_bool_t stream_read(generator, frame)
returning TRUE if the frame was generated by the detector (sending digits from
queue)
or FALSE if there are no digits in queue.
So is this the way we should go? Are you interested?
Original comment by tomas.valenta@speechtech.cz
on 18 Oct 2009 at 4:47
Hi Vali,
I just welcome your initiative and this would be perfect way to go.
Moreover, if you start implementing mpf_generator and mpf_detector entities,
I'll try
to go along with you and implement missing functionality in the core.
Original comment by achalo...@gmail.com
on 19 Oct 2009 at 9:55
Hi everybody,
I have created the top-level classes for dealing with DTMF. Actually I should
say
they are a draft. They work fine with in-band digits, but not with out-of-band
ones,
since it is not supported in MPF yet. See the attached headers for docs. The
idea is
as follows:
in demo_recog_application.c:
#include "mpf_dtmf_generator.h"
static apt_bool_t recog_app_stream_open(mpf_audio_stream_t *stream, mpf_codec_t
*codec)
{
dtmf = mpf_dtmf_generator_create(stream, pool);
}
static apt_bool_t recog_app_stream_read(mpf_audio_stream_t *stream, mpf_frame_t
*frame)
{
if (mpf_dtmf_generator_sending(dtmf)) {
if (mpf_dtmf_generator_put_frame(dtmf, frame))
return TRUE;
/* Inter-digit silence */
frame->type |= MEDIA_FRAME_TYPE_AUDIO;
memset(frame->codec_frame.buffer, 0, frame->codec_frame.size);
return TRUE;
}
/* Stream other audio */
}
handle_recognition_in_progress(...)
{
mpf_dtmf_generator_enqueue(dtmf, "0123456789*#ABCD");
}
And in demo_recog_engine.c:
#include "mpf_dtmf_detector.h"
static apt_bool_t demo_recog_channel_open(mrcp_engine_channel_t *channel)
{
dtmf = mpf_dtmf_detector_create(channel->termination->audio_stream, channel->pool);
}
static apt_bool_t demo_recog_stream_write(mpf_audio_stream_t *stream, const
mpf_frame_t *frame)
{
mpf_dtmf_detector_get_frame(dtmf, frame);
char c = mpf_dtmf_detector_digit_get(dtmf);
if (c) printf("Got digit: %c\n", c);
}
As I said, in-band DTMF works fine, but with out-of-band DTMF there are
following
problems:
- On generator side:
- It is not possible to set RTP marker bit.
- RTP timestamp should not grow for single event.
- How to get telephone-event's ptime?
- mpf_audio_stream_t::rx_event_descriptor should be populated to get
telephone-event's frequency.
- On detector side:
- Receive the events at all.
- Get RTP timestamp for the event.
As I said, this is just a draft of the concept, so please read the sources in
attachment and leave comments. In the ZIP archive attached, there is also a
pcap with
out-of-band packtes with the problems described above.
Cheers
- Vali
Original comment by tomas.valenta@speechtech.cz
on 25 Oct 2009 at 11:06
Attachments:
Hi Vali,
I should confirm the draft looks VERY promising. The implementation and outlined
problems (missing functionality) give more confidence over the concepts,
because we
seem to be on the same track.
- On generator side:
- It is not possible to set RTP marker bit.
I'll add marker flag to the mpf_frame_t. That marker should allow to pass along
significant events and these events shouldn't be limited only with DTMFs.
For instance, START_OF_NAMED_EVENT, END_OF_UTTERANCE ...
- RTP timestamp should not grow for single event.
It's clear.
- How to get telephone-event's ptime?
By default, ptime of rx_descriptor should be used. Probably it'd be reasonable
to
make ptime (updates for events) configurable. I think, users may want/need to
not
only configure period of updates, but also to be able to enable/disable the
updates
at all. Should be considered for future ...
- mpf_audio_stream_t::rx_event_descriptor should be populated to get
telephone-event's frequency.
I'll look into this later.
- On detector side:
- Receive the events at all.
OK
- Get RTP timestamp for the event.
Not only timestamp, but marker would be required
See long-lasting events
http://tools.ietf.org/html/rfc4733#section-2.5.1.3
Anyway, we can support it later, just info to consider, but both timestamp and
marker
should be passed along
So you've given it a perfect start and as I promised, I'll try to go along and
implement outlined above functionality. Most probably I'll dedicate a couple of
days
to this issue during the next week. So please wait a bit.
Thanks!
Original comment by achalo...@gmail.com
on 26 Oct 2009 at 4:14
- On generator side:
- It is not possible to set RTP marker bit.
I'll add marker flag to the mpf_frame_t. That marker should allow to pass along
significant events and these events shouldn't be limited only with DTMFs.
For instance, START_OF_NAMED_EVENT, END_OF_UTTERANCE ...
Now I do not fully understand. I have not heard of e.g. END_OF_UTTERANCE
telephone-event. You plan those events to be MPF internal events
indicating/changing
named-event's state? If not, I should not call the classes dtmf_generator and
dtmf_detector. Well, there should be another layer actually, because almost only
DTMFs can be sent in-band as well.
- RTP timestamp should not grow for single event.
It's clear.
- How to get telephone-event's ptime?
By default, ptime of rx_descriptor should be used. Probably it'd be reasonable
to
make ptime (updates for events) configurable. I think, users may want/need to
not
only configure period of updates, but also to be able to enable/disable the
updates
at all. Should be considered for future ...
Clear. But honestly I cannot recall any non-DTMF (maybe MF) events used in IVR
apps.
Packet with event should be used every ptime. How can I get rx_descriptor's
ptime?
- mpf_audio_stream_t::rx_event_descriptor should be populated to get
telephone-event's frequency.
I'll look into this later.
- On detector side:
- Receive the events at all.
OK
- Get RTP timestamp for the event.
Not only timestamp, but marker would be required
See long-lasting events
http://tools.ietf.org/html/rfc4733#section-2.5.1.3
Anyway, we can support it later, just info to consider, but both timestamp and
marker
should be passed along
Clear, I do not plan to care about marker in detector, because the marked
packet may
get lost. But anyway I agree it is a must.
Thank you for ideas and support.
- Vali
Original comment by tomas.valenta@speechtech.cz
on 26 Oct 2009 at 10:01
Below is clarification on marker, more to come later.
As you know, marker bit in RTP header is used not only in case of named events.
Marker also indicates start of a new talkspurt. Typically RTP stream consists of
voice activity and inactivity segments. Currently I implicitly detect these
segments
in rtp_transmit. However, I think it makes sense to add generic marker in
mpf_frame,
which will allow user apps to explicitly indicate start-of-talkspurt (or
start-of-utterance, start-of-event) and the same way end-of-talkspurt (or
end-of-utterance, end-of-event), when possible. There can be more flags/markers
in
the future. If marker isn't specified everything should work as is. So this
marker
should serve well for named events, and can be used for other purposes as well.
On the receiver side you definitely shouldn't rely on marker, because it indeed
may
be lost. However, presence of that marker may help a lot in analyzes. Actually,
receiver part isn't as trivial as it may seen, but everything seems to be
achievable.
Original comment by achalo...@gmail.com
on 27 Oct 2009 at 10:53
See http://groups.google.com/group/unimrcp/browse_thread/thread/8e773f260c56c34e
Original comment by achalo...@gmail.com
on 9 Nov 2009 at 7:27
Original comment by achalo...@gmail.com
on 27 Apr 2010 at 12:01
Original issue reported on code.google.com by
chaitany...@gmail.com
on 9 Jun 2009 at 12:46Attachments: