GoogleCodeExporter closed 9 years ago
Issue 156 has been merged into this issue.
Original comment by torsten....@gmail.com
on 11 Jun 2013 at 1:56
New "api" module?
Wikipedia says:
----
Phonology is often distinguished from phonetics. While phonetics concerns the
physical production, acoustic transmission and perception of the sounds of
speech,[1][2] phonology describes the way sounds function within a given
language or across languages to encode meaning. For many linguists, phonetics
belongs to descriptive linguistics, and phonology to theoretical linguistics,
although establishing the phonological system of a language is necessarily an
application of theoretical principles to analysis of phonetic evidence. Note
that this distinction was not always made, particularly before the development
of the modern concept of phoneme in the mid 20th century. Some subfields of
modern phonology have a crossover with phonetics in descriptive disciplines
such as psycholinguistics and speech perception, resulting in specific areas
like articulatory phonology or laboratory phonology.
----
What should the new module be called? .phonetics, .phonology, .speech? Probably
"api.phonetics" then?
Original comment by richard.eckart
on 11 Jun 2013 at 2:00
"+1" for api.phonetics
Original comment by torsten....@gmail.com
on 11 Jun 2013 at 2:08
Why not call it api.speech?
It would be more generic and would cover the current intended use more
accurately.
Judith
Original comment by eckle.kohler
on 11 Jun 2013 at 2:10
Some more links:
http://en.wikipedia.org/wiki/International_Phonetic_Alphabet
"For a guide to pronouncing IPA transcriptions of English words, see IPA chart
for English dialects.
The general principle of the IPA is to provide one letter for each distinctive
sound (speech segment) although this practice is not followed if the sound
itself is complex."
see also
https://en.wikipedia.org/wiki/Phonetic_transcription
The current intended use is about transcriptions, if I understood correctly.
Should this be reflected in any way?
Original comment by eckle.kohler
on 11 Jun 2013 at 2:26
When I was thinking about "speech", I was looking for something that encompasses
both phonetics and phonology, or maybe something reasonably fuzzy, because I'm
not very familiar with either discipline. I also had doubts, though, because
"speech" may just be too broad: one might expect types for actually working
with audio signals or something like that.
The two additional links appear to me to support the "api.phonetics" naming.
Original comment by richard.eckart
on 11 Jun 2013 at 4:06
What makes me feel uncomfortable with a separate api.phonetics is that it looks
like DKPro Core does some kind of phonetic analysis - as the other api packages
correspond to linguistic analysis levels as well.
Is adding an existing transcription looked up in some resource (e.g. in any
lexical resource) as an annotation already a phonetic analysis?
Original comment by eckle.kohler
on 11 Jun 2013 at 7:22
This argument should also apply to "api.sound" then ;)
Looking up the pronunciation in a dictionary is only one way in which this
could be done.
Soundex/Metaphone is quite similar to stemming (simple rule-based).
More complex annotators, e.g. for annotating French pronunciation which also
depends on context, are easily imaginable.
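To illustrate the comparison with stemming, here is a minimal sketch of a simplified American Soundex in plain Java. It is not the Commons Codec implementation; the class name SoundexSketch and its code table layout are made up for this example, and only ASCII letters are handled.

```java
import java.util.Locale;

// Hypothetical sketch of a simplified American Soundex; not the Commons Codec class.
final class SoundexSketch {

    // One code per letter 'A'..'Z':
    //   '1'..'6' = consonant classes, '0' = vowel-like (A, E, I, O, U, Y),
    //   '-' = H and W, which are skipped without separating equal codes.
    private static final String CODES = "0123012-02245501262301-202";

    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char previousCode = '0';
        for (char raw : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (raw < 'A' || raw > 'Z') {
                continue; // ignore anything that is not an ASCII letter
            }
            char code = CODES.charAt(raw - 'A');
            if (out.length() == 0) {
                out.append(raw);     // the first letter is kept as-is
                previousCode = code; // but its code still suppresses a direct repeat
                continue;
            }
            if (code == '-') {
                continue; // H/W: transparent, previousCode stays unchanged
            }
            if (code != '0' && code != previousCode) {
                out.append(code);
            }
            previousCode = code; // a vowel resets the duplicate check
            if (out.length() == 4) {
                break;
            }
        }
        if (out.length() == 0) {
            return ""; // no letters in the input
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

Like a rule-based stemmer, this needs no context and no lexical resource: SoundexSketch.encode("Robert") and SoundexSketch.encode("Rupert") both map to the same four-character code, which is what makes the encoding usable for sound-based comparison.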
Original comment by torsten....@gmail.com
on 11 Jun 2013 at 7:27
I'd consider looking up transcriptions in a lexical resource as much a
phonetic analysis as looking up named entities in a gazetteer. Neither is a
sophisticated approach, but both serve the purpose (I guess).
In both cases, it would be possible to use a generic component, say a
"DictionaryAnnotator", but in the first case it would create a "Pronunciation"
annotation and in the second case a "NamedEntity" annotation.
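A rough sketch of that idea in plain Java, assuming a dictionary keyed by lowercased token. The class DictionaryLookupSketch and its Annotation type are hypothetical stand-ins and not DKPro Core's actual component or type system; the dictionary entries in the usage note are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a generic lookup component; not a DKPro Core class.
final class DictionaryLookupSketch {

    /** Stand-in for an annotation: a type name, the index of the covered token, and a value. */
    static final class Annotation {
        final String type;
        final int tokenIndex;
        final String value;

        Annotation(String type, int tokenIndex, String value) {
            this.type = type;
            this.tokenIndex = tokenIndex;
            this.value = value;
        }
    }

    /**
     * Looks up every token (lowercased) in the dictionary and emits one
     * annotation of the requested type per hit. The lookup logic is the same
     * for every annotation type; only the type name and the resource differ.
     */
    static List<Annotation> annotate(List<String> tokens,
                                     Map<String, String> dictionary,
                                     String type) {
        List<Annotation> result = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String value = dictionary.get(tokens.get(i).toLowerCase(Locale.ROOT));
            if (value != null) {
                result.add(new Annotation(type, i, value));
            }
        }
        return result;
    }
}
```

Calling annotate(tokens, pronunciations, "Pronunciation") with a map such as {"tomato" -> "T AH0 M EY1 T OW2"} and annotate(tokens, gazetteer, "NamedEntity") with {"darmstadt" -> "LOCATION"} runs the same code path and differs only in the resource and the resulting annotation type.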
If there is a more sophisticated and readily available analysis tool, it might
be worth considering integrating it.
Original comment by richard.eckart
on 11 Jun 2013 at 7:28
That should have read "I'd consider looking up transcriptions in a lexical
resource as much a phonetic analysis as I would consider looking up named
entities in a gazetteer as named entity identification."
Original comment by richard.eckart
on 11 Jun 2013 at 7:29
>> I'd consider looking up transcriptions in a lexical resource as much a
>> phonetic analysis as I would consider looking up named entities in a gazetteer
>> as named entity identification.
I agree - one could also think of components in other analysis levels that work
similarly, e.g. lemmatizers.
I was just not sure how to consider this ...
Original comment by eckle.kohler
on 11 Jun 2013 at 7:36
There are a number of phonetic encoders in the Apache Commons Codec package:
http://commons.apache.org/proper/commons-codec/javadocs/api-release/org/apache/commons/codec/StringEncoder.html
Original comment by richard.eckart
on 11 Jun 2013 at 7:38
"+1" for api.phonetics
Original comment by eckle.kohler
on 11 Jun 2013 at 7:43
>> There are a number of phonetic encoders in the Apache Commons Codec package:
>> http://commons.apache.org/proper/commons-codec/javadocs/api-release/org/apache/commons/codec/StringEncoder.html
Yes, they are already used in "DKPro Similarity Sound" for comparing two terms
based on their pronunciation.
I think this should move to a new DKPro Core module that wraps the Apache
Commons code and writes the corresponding annotation.
DKPro Similarity will then use the new type / conversion code from Core.
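What such a wrapper might look like, as a self-contained sketch. To keep it runnable without extra dependencies, the Encoder interface below is a local stand-in that only mirrors the String-to-String shape of Commons Codec's StringEncoder; PhoneticAnnotatorSketch and PhoneticTranscription are hypothetical names, not the types that ended up in DKPro Core, and the letter-run tokenization is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of an encoder-wrapping annotator; not a DKPro Core class.
final class PhoneticAnnotatorSketch {

    /** Local stand-in for a phonetic encoder (assumption: one String in, one String out). */
    interface Encoder {
        String encode(String source);
    }

    /** Stand-in annotation carrying character offsets and the encoded form. */
    static final class PhoneticTranscription {
        final int begin;
        final int end;
        final String transcription;

        PhoneticTranscription(int begin, int end, String transcription) {
            this.begin = begin;
            this.end = end;
            this.transcription = transcription;
        }
    }

    // Simplified tokenization: every run of letters counts as one token.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    /** Encodes every word in the text and records the result with its offsets. */
    static List<PhoneticTranscription> annotate(String text, Encoder encoder) {
        List<PhoneticTranscription> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            result.add(new PhoneticTranscription(
                    matcher.start(), matcher.end(), encoder.encode(matcher.group())));
        }
        return result;
    }
}
```

Keeping the encoder behind a small interface means the annotator does not need to know which algorithm (Soundex, Metaphone, and so on) produces the transcription; a consumer such as a similarity measure would then only compare the transcription values of two annotations.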
Original comment by torsten....@gmail.com
on 11 Jun 2013 at 7:49
Original comment by richard.eckart
on 24 Jun 2013 at 10:46
We have the API module and the commonscodec module now. Anything left to do
here?
Original comment by richard.eckart
on 11 Aug 2013 at 5:27
I think we are done here for now.
Original comment by torsten....@gmail.com
on 11 Aug 2013 at 5:40
Original issue reported on code.google.com by
torsten....@gmail.com
on 11 Jun 2013 at 1:46