reckart closed this issue 9 years ago
Issue 156 has been merged into this issue.
Original issue reported on code.google.com by torsten.zesch
on 2013-06-11 13:56:07
New "api" module?
Wikipedia says:
----
Phonology is often distinguished from phonetics. While phonetics concerns the physical
production, acoustic transmission and perception of the sounds of speech,[1][2] phonology
describes the way sounds function within a given language or across languages to encode
meaning. For many linguists, phonetics belongs to descriptive linguistics, and phonology
to theoretical linguistics, although establishing the phonological system of a language
is necessarily an application of theoretical principles to analysis of phonetic evidence.
Note that this distinction was not always made, particularly before the development
of the modern concept of phoneme in the mid 20th century. Some subfields of modern
phonology have a crossover with phonetics in descriptive disciplines such as psycholinguistics
and speech perception, resulting in specific areas like articulatory phonology or laboratory
phonology.
----
What should the new module be called: .phonetics, .phonology, or .speech? Probably
"api.phonetics", then?
Original issue reported on code.google.com by richard.eckart
on 2013-06-11 14:00:55
"+1" for api.phonetics
Original issue reported on code.google.com by torsten.zesch
on 2013-06-11 14:08:59
Why not call it api.speech?
It would be more generic and would more accurately cover the current intended use.
Judith
Original issue reported on code.google.com by eckle.kohler
on 2013-06-11 14:10:39
Some more links:
http://en.wikipedia.org/wiki/International_Phonetic_Alphabet
"For a guide to pronouncing IPA transcriptions of English words, see IPA chart for
English dialects.
The general principle of the IPA is to provide one letter for each distinctive sound
(speech segment) although this practice is not followed if the sound itself is complex."
see also
https://en.wikipedia.org/wiki/Phonetic_transcription
The current intended use is about transcriptions, if I understood correctly.
Should this be reflected in any way?
Original issue reported on code.google.com by eckle.kohler
on 2013-06-11 14:26:36
When I was thinking about "speech" I was looking for something that encompasses both,
phonetics and phonology or maybe something that was reasonably fuzzy, because I'm not
very acquainted with either discipline. I was also having doubts, though, because "speech"
may just be too broad. Like, one may expect types for actually working with audio signals
or something like that.
The two additional links appear to me to support the "api.phonetics" naming.
Original issue reported on code.google.com by richard.eckart
on 2013-06-11 16:06:31
What makes me feel uncomfortable with a separate api.phonetics is that it looks like
DKPro Core does some kind of phonetic analysis - as the other api packages correspond
to linguistic analysis levels as well.
Is adding an existing transcription looked up in some resource (e.g. in any lexical
resource) as an annotation already a phonetic analysis?
Original issue reported on code.google.com by eckle.kohler
on 2013-06-11 19:22:00
This argument should also apply to "api.sound" then ;)
Looking up the pronunciation in a dictionary is only one way in which this could be
done.
Soundex/Metaphone is quite similar to stemming (simple rule-based).
More complex annotators, e.g. for annotating French pronunciation which also depends
on context, are easily imaginable.
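
To illustrate why Soundex resembles a simple rule-based stemmer, here is a minimal sketch of the classic American Soundex rules in plain Java. The class name `SimpleSoundex` is hypothetical and only for illustration; it is not DKPro Core code. It keeps the first letter, maps the remaining consonants to digit classes, collapses repeated codes, and pads the result to four characters.

```java
import java.util.Locale;

// Hypothetical illustration class, not part of DKPro Core or Apache Commons Codec.
final class SimpleSoundex {
    // Digit class for each letter A..Z; '0' marks letters without a code (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a basic Latin letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                lastCode = code;
                continue;
            }
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate two letters with the same code
            }
            if (code != '0' && code != lastCode) {
                out.append(code);
            }
            lastCode = code; // a vowel resets lastCode, so a repeated code is emitted again
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() > 0 && out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Robert"));
        System.out.println(encode("Rupert"));
    }
}
```

Words that sound alike, such as "Robert" and "Rupert", receive the same code, much as a stemmer maps inflected forms to one stem. For real use, a tested library implementation is preferable to this sketch.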
Original issue reported on code.google.com by torsten.zesch
on 2013-06-11 19:27:50
I'd consider looking up transcriptions in a lexical resource as much a phonetic analysis
as looking up named entities in a gazetteer. Neither is a sophisticated approach,
but both serve the purpose (I guess).
In both cases, it would be possible to use a generic component, say a "DictionaryAnnotator",
but in the first case it would create a "Pronunciation" annotation and in the second
case a "NamedEntity" annotation.
If there is a more sophisticated and readily available analysis tool, we might consider
integrating that.
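
The generic-component idea can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class `DictionaryAnnotatorSketch`, its nested `Annotation` type, and the sample dictionary entries are all hypothetical and do not reflect actual DKPro Core or UIMA types. The same lookup logic yields "Pronunciation" or "NamedEntity" annotations depending only on the configured type name and dictionary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; names and structure are assumptions, not DKPro Core API.
final class DictionaryAnnotatorSketch {

    // Minimal stand-in for an annotation: a type label, a character span, and a value.
    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        final String value;

        Annotation(String type, int begin, int end, String value) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.value = value;
        }
    }

    private final String type;
    private final Map<String, String> dictionary;

    DictionaryAnnotatorSketch(String type, Map<String, String> dictionary) {
        this.type = type;
        this.dictionary = dictionary;
    }

    // Looks up each space-separated token (lower-cased) and annotates the hits.
    List<Annotation> annotate(String text) {
        List<Annotation> result = new ArrayList<>();
        int pos = 0;
        for (String token : text.split(" ")) {
            String value = dictionary.get(token.toLowerCase(Locale.ROOT));
            if (value != null) {
                result.add(new Annotation(type, pos, pos + token.length(), value));
            }
            pos += token.length() + 1; // skip the token and the following space
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> pronunciations = new HashMap<>();
        pronunciations.put("cat", "K AE1 T"); // illustrative entry only
        Map<String, String> gazetteer = new HashMap<>();
        gazetteer.put("berlin", "LOCATION"); // illustrative entry only

        String text = "The cat visited Berlin";
        for (Annotation a : new DictionaryAnnotatorSketch("Pronunciation", pronunciations).annotate(text)) {
            System.out.println(a.type + " [" + a.begin + "," + a.end + "] " + a.value);
        }
        for (Annotation a : new DictionaryAnnotatorSketch("NamedEntity", gazetteer).annotate(text)) {
            System.out.println(a.type + " [" + a.begin + "," + a.end + "] " + a.value);
        }
    }
}
```

The whitespace tokenization is deliberately naive; a real component would work on existing token annotations instead.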
Original issue reported on code.google.com by richard.eckart
on 2013-06-11 19:28:15
That should have read "I'd consider looking up transcriptions in a lexical resource
as much a phonetic analysis as I would consider looking up named entities in a gazetteer
as named entity identification."
Original issue reported on code.google.com by richard.eckart
on 2013-06-11 19:29:24
>> I'd consider looking up transcriptions in a lexical resource as much a phonetic
analysis as I would consider looking up named entities in a gazetteer as named entity
identification.
I agree - one could also think of components in other analysis levels that work similarly,
e.g. lemmatizers.
I was just not sure how to consider this ...
Original issue reported on code.google.com by eckle.kohler
on 2013-06-11 19:36:24
There are a number of phonetic encoders in the Apache Commons Codec package:
http://commons.apache.org/proper/commons-codec/javadocs/api-release/org/apache/commons/codec/StringEncoder.html
Original issue reported on code.google.com by richard.eckart
on 2013-06-11 19:38:04
"+1" for api.phonetics
Original issue reported on code.google.com by eckle.kohler
on 2013-06-11 19:43:46
>> There are a number of phonetic encoders in the Apache Commons Codec package:
>> http://commons.apache.org/proper/commons-codec/javadocs/api-release/org/apache/commons/codec/StringEncoder.html
Yes, they are already used in "DKPro Similarity Sound" for comparing two terms based
on their pronunciation.
I think this should move to a new DKPro Core module that wraps the Apache Commons Codec
code and writes the corresponding annotation.
DKPro Similarity will then use the new type / conversion code from Core.
Original issue reported on code.google.com by torsten.zesch
on 2013-06-11 19:49:18
(No text was entered with this change)
Original issue reported on code.google.com by richard.eckart
on 2013-06-24 22:46:26
We have the API module and the commonscodec module now. Anything left to do here?
Original issue reported on code.google.com by richard.eckart
on 2013-08-11 17:27:38
I think we are done here for now.
Original issue reported on code.google.com by torsten.zesch
on 2013-08-11 17:40:24
Original issue reported on code.google.com by torsten.zesch
on 2013-06-11 13:46:51