RubySpeech is a library for constructing and parsing Text to Speech (TTS) and Automatic Speech Recognition (ASR) documents such as SSML, GRXML and NLSML. Such documents can be constructed to be processed by TTS and ASR engines, parsed as the result from such, or used in the implementation of such engines.
brew install pcre
sudo apt-get install libpcre3 libpcre3-dev
sudo yum install pcre-devel
gem install ruby_speech
RubySpeech provides a DSL for constructing SSML documents like so:
require 'ruby_speech'
speak = RubySpeech::SSML.draw do
voice gender: :male, name: 'fred' do
string "Hi, I'm Fred. The time is currently "
say_as interpret_as: 'date', format: 'dmy' do
"01/02/1960"
end
end
end
speak.to_s
becomes:
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
<voice gender="male" name="fred">
Hi, I'm Fred. The time is currently <say-as format="dmy" interpret-as="date">01/02/1960</say-as>
</voice>
</speak>
Once your Speak
is fully prepared and you're ready to send it off for processing, you must call to_doc
on it to add the XML header:
<?xml version="1.0"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
<voice gender="male" name="fred">
Hi, I'm Fred. The time is currently <say-as format="dmy" interpret-as="date">01/02/1960</say-as>
</voice>
</speak>
You may also then need to call to_s
.
Construct a GRXML (SRGS) document like this:
require 'ruby_speech'
grammy = RubySpeech::GRXML.draw mode: :dtmf, root: 'pin' do
rule id: 'digit' do
one_of do
('0'..'9').map { |d| item { d } }
end
end
rule id: 'pin', scope: 'public' do
one_of do
item do
item repeat: '4' do
ruleref uri: '#digit'
end
"#"
end
item do
"* 9"
end
end
end
end
grammy.to_s
which becomes
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" mode="dtmf" root="pin">
<rule id="digit">
<one-of>
<item>0</item>
<item>1</item>
<item>2</item>
<item>3</item>
<item>4</item>
<item>5</item>
<item>6</item>
<item>7</item>
<item>8</item>
<item>9</item>
</one-of>
</rule>
<rule id="pin" scope="public">
<one-of>
<item><item repeat="4"><ruleref uri="#digit"/></item>#</item>
<item>* 9</item>
</one-of>
</rule>
</grammar>
There are some grammars pre-defined which are available from the RubySpeech::GRXML::Builtins
module like so:
require 'ruby_speech'
RubySpeech::GRXML::Builtins.currency
which yields
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" mode="dtmf" root="currency">
<rule id="currency" scope="public">
<item repeat="0-">
<ruleref uri="#digit"/>
</item>
<item>*</item>
<item repeat="2">
<ruleref uri="#digit"/>
</item>
</rule>
<rule id="digit">
<one-of>
<item>0</item>
<item>1</item>
<item>2</item>
<item>3</item>
<item>4</item>
<item>5</item>
<item>6</item>
<item>7</item>
<item>8</item>
<item>9</item>
</one-of>
</rule>
</grammar>
These grammars come from the VoiceXML specification, and can be used as indicated there (including parameterisation). They can be used just like any you would manually create, and there's nothing special about them except that they are already defined for you. A full list of available grammars can be found in the API documentation.
These grammars are also available via URI like so:
require 'ruby_speech'
RubySpeech::GRXML.from_uri('builtin:dtmf/boolean?y=3;n=4')
It is possible to match some arbitrary input against a GRXML grammar, like so:
require 'ruby_speech'
>> grammar = RubySpeech::GRXML.draw mode: :dtmf, root: 'pin' do
rule id: 'digit' do
one_of do
('0'..'9').map { |d| item { d } }
end
end
rule id: 'pin', scope: 'public' do
one_of do
item do
item repeat: '4' do
ruleref uri: '#digit'
end
"#"
end
item do
"* 9"
end
end
end
end
matcher = RubySpeech::GRXML::Matcher.new grammar
>> matcher.match '*9'
=> #<RubySpeech::GRXML::Match:0x00000100ae5d98
@mode = :dtmf,
@confidence = 1,
@utterance = "*9",
@interpretation = "*9"
>
>> matcher.match '1234#'
=> #<RubySpeech::GRXML::Match:0x00000100b7e020
@mode = :dtmf,
@confidence = 1,
@utterance = "1234#",
@interpretation = "1234#"
>
>> matcher.match '5678#'
=> #<RubySpeech::GRXML::Match:0x00000101218688
@mode = :dtmf,
@confidence = 1,
@utterance = "5678#",
@interpretation = "5678#"
>
>> matcher.match '1111#'
=> #<RubySpeech::GRXML::Match:0x000001012f69d8
@mode = :dtmf,
@confidence = 1,
@utterance = "1111#",
@interpretation = "1111#"
>
>> matcher.match '111'
=> #<RubySpeech::GRXML::NoMatch:0x00000101371660>
Natural Language Semantics Markup Language is the format used by many Speech Recognition engines and natural language processors to add semantic information to human language. RubySpeech is capable of generating and parsing such documents.
It is possible to generate an NLSML document like so:
require 'ruby_speech'
nlsml = RubySpeech::NLSML.draw grammar: 'http://flight' do
interpretation confidence: 0.6 do
input "I want to go to Pittsburgh", mode: :voice
instance do
airline do
to_city 'Pittsburgh'
end
end
end
interpretation confidence: 0.4 do
input "I want to go to Stockholm"
instance do
airline do
to_city "Stockholm"
end
end
end
end
nlsml.to_s
becomes:
<?xml version="1.0"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2" grammar="http://flight">
<interpretation confidence="0.6">
<input mode="voice">I want to go to Pittsburgh</input>
<instance>
<airline>
<to_city>Pittsburgh</to_city>
</airline>
</instance>
</interpretation>
<interpretation confidence="0.4">
<input>I want to go to Stockholm</input>
<instance>
<airline>
<to_city>Stockholm</to_city>
</airline>
</instance>
</interpretation>
</result>
It's also possible to parse an NLSML document and extract useful information from it. Taking the above example, one may do:
document = RubySpeech.parse nlsml.to_s
document.match? # => true
document.interpretations # => [
{
confidence: 0.6,
input: { mode: :voice, content: 'I want to go to Pittsburgh' },
instance: { airline: { to_city: 'Pittsburgh' } }
},
{
confidence: 0.4,
input: { content: 'I want to go to Stockholm' },
instance: { airline: { to_city: 'Stockholm' } }
}
]
document.best_interpretation # => {
confidence: 0.6,
input: { mode: :voice, content: 'I want to go to Pittsburgh' },
instance: { airline: { to_city: 'Pittsburgh' } }
}
Check out the YARD documentation for more
<voice/>
<prosody/>
<emphasis/>
<say-as/>
<break/>
<audio/>
<p/>
and <s/>
<phoneme/>
<sub/>
<mark/>
<desc/>
<item/>
<one-of/>
<rule/>
<ruleref/>
<tag/>
<token/>
<lexicon/>
<meta/>
and <metadata/>
<meta/>
and <metadata/>
<example/>
<lexicon/>
Copyright (c) 2013 Ben Langfeld. MIT licence (see LICENSE for details).