Closed GoogleCodeExporter closed 9 years ago
When Automatic Capitalization is turned on, Adapt It is designed to handle *regular* case "correspondences" between the source and target languages. As explained in Issue #20, since Adapt It is designed to adapt between *similar* languages, it only tries to duplicate the case of the source text in the target text. This works for many languages (especially those that have new orthographies), but the Automatic Capitalization feature will not work consistently between two languages such as German and English, which have long-established, different capitalization rules that do not map one-to-one onto each other. When Automatic Capitalization is ON, Adapt It attempts to maintain its KB with lower case forms, and changes the target forms to upper case when the source language uses upper case. This is why you are seeing both 'vater' and 'Vater' in the KB when Automatic Capitalization is turned on. When translating from English to German, you should turn off Automatic Capitalization and simply allow Adapt It to associate source and target words even if they have different capitalization in different contexts. No harm is done. It doesn't matter to Adapt It that its KB gets populated with some words that come in pairs, one with an upper case spelling and one with a lower case spelling.
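The behaviour described above can be illustrated with a small toy model. This is only a sketch of one possible reading of that description, not Adapt It's actual code: the class `ToyKB` and the helpers `lower_first` and `upper_first` are hypothetical names made up for this example, and the assumption is that entries are lower-cased only when the source word starts with an upper case letter.

```python
from collections import defaultdict


def lower_first(word):
    """Return the word with only its first character lower-cased."""
    return word[:1].lower() + word[1:]


def upper_first(word):
    """Return the word with only its first character upper-cased."""
    return word[:1].upper() + word[1:]


class ToyKB:
    """Hypothetical toy knowledge base: source form -> list of target forms."""

    def __init__(self, auto_caps):
        self.auto_caps = auto_caps
        self.entries = defaultdict(list)

    def store(self, source, target):
        # Assumption: with auto-capitalization ON, an upper case source
        # causes both forms to be stored with a lower case first letter.
        if self.auto_caps and source[:1].isupper():
            source, target = lower_first(source), lower_first(target)
        if target not in self.entries[source]:
            self.entries[source].append(target)

    def lookup(self, source):
        # With auto-capitalization ON, an upper case source is looked up
        # under its lower case key and the targets are re-capitalized.
        if self.auto_caps and source[:1].isupper():
            found = self.entries.get(lower_first(source), [])
            return [upper_first(t) for t in found]
        return list(self.entries.get(source, []))


# English -> German with Automatic Capitalization ON.
kb_on = ToyKB(auto_caps=True)
kb_on.store("Father", "Vater")   # sentence-initial source: stored as 'vater'
kb_on.store("father", "Vater")   # lower case source: stored as typed
on_forms = kb_on.entries["father"]   # both 'vater' and 'Vater'

# The same words with Automatic Capitalization OFF.
kb_off = ToyKB(auto_caps=False)
kb_off.store("a", "ein")
kb_off.store("A", "Ein")
off_keys = sorted(kb_off.entries)    # 'A' and 'a' are separate entries
```

In this toy model the German noun ends up in the KB in two spellings when the feature is ON, while turning it OFF simply produces separate entries for the upper and lower case source forms.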
Original comment by adaptitbill@gmail.com on 12 Mar 2009 at 11:39
What makes it cumbersome to switch Automatic Capitalization off is that Adapt It then distinguishes between two phrase forms (e.g. English -> German):
- a and
- A
Both have about 5 renderings in German (ein, einer, eine ...), which have to be entered for both 'a' and 'A'.
This would enlarge the KB quite a bit, as many English words can show up in upper case after a punctuation mark.
If capitalization mismatches such as
- English upper case adjectives are rendered as lower case in German and
- German upper case nouns are rendered as lower case in English
are common between other languages as well (I have no idea whether they are):
-> It would be nice to be able to tell Adapt It that some words of the target language do not follow automatic capitalization (except after some punctuation marks) but stay in the form they are.
This would help in using Adapt It for 'backtranslations', especially in the Balsa program collection, where the source and target languages are often not closely related.
What could be done is to add one character (e.g. #) at the beginning (or end) of a word of the target (translation), which is not rendered in the text, but which tells Adapt It that this word keeps its capitalization regardless of the source text.
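To make the suggestion concrete, here is a minimal sketch of how such a marker might behave. It is purely hypothetical: Adapt It has no such feature, and the names `KEEP_CASE_MARKER` and `render_target` are invented for this illustration. It assumes the marker is placed at the beginning of the target word.

```python
# Hypothetical marker character taken from the suggestion above.
KEEP_CASE_MARKER = "#"


def render_target(source_word, kb_target):
    """Return the target word as it would appear in the text.

    A KB target starting with the marker keeps its own capitalization;
    any other target copies the case of the source word's first letter.
    """
    if kb_target.startswith(KEEP_CASE_MARKER):
        # Strip the marker; it is never rendered in the text.
        return kb_target[len(KEEP_CASE_MARKER):]
    if source_word[:1].isupper():
        return kb_target[:1].upper() + kb_target[1:]
    return kb_target


marked_lower = render_target("father", "#Vater")   # noun keeps its capital
marked_upper = render_target("Father", "#Vater")
plain_upper = render_target("A", "ein")            # follows the source case
plain_lower = render_target("a", "ein")
```

The sketch leaves out the exception for words following certain punctuation marks, which would need extra handling.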
Original comment by wolfgang...@gmx.de on 16 Mar 2009 at 3:30
Adapt It's performance does not really depend on the size of the KB, so you need not worry about language patterns that enlarge it. True enough, it will require a little more work for the user to enter extra forms, including upper and lower case forms of some words, but Adapt It doesn't care. Some people who work in highly agglutinative languages originally thought that Adapt It wouldn't work well for their situation, thinking that they would not save much effort in using Adapt It because they had hundreds or thousands of possible inflectional forms to deal with. Most found out, however, that it didn't matter in the end. Although more typing work was required initially, Adapt It eventually worked well for their agglutinative languages. Although an agglutinative language may have thousands of potential forms, it generally turns out that only a relatively small subset of all possibilities actually appears in an entire New Testament. The KB might turn out to be two or three times larger than an average KB for an isolating language, but it won't turn out to be tens or hundreds of times larger.
One of the main differences between Adapt It and CARLA is that Adapt It does not require (or permit) any formalization of syntactic or grammatical rules. Every language, and every good translation made from a given language, ends up being a linear "stream of speech" in the end. Adapt It functions entirely without any linguistic awareness or any understanding of the hierarchies inherent in the two languages' syntactic structure and semantics - it only recognizes and matches forms such as occur in the ultimate linear "stream of speech". Hence, Adapt It can never know whether it is adapting an adjective, a noun, a verb, or a discourse particle, so it would not be possible to get Adapt It to act differently (or capitalize differently) for different linguistic contexts. In other words, Adapt It cannot "learn" to adjust the way it capitalizes based on awareness of meaning or grammatical context. The simplicity and effectiveness of Adapt It is due to the fact that it simply matches patterns in the linear flow of speech between two languages that are saying the same thing. Adapt It depends entirely on the bilingual knowledge of the user for the correct grammatical and syntactic information that is inherent in that linear flow of speech, and it generally works very well at that level - at least for languages that have some degree of syntactic similarity. An ad hoc method such as you describe to help Adapt It to "know, that this word keeps its capitalization regardless of the source text" might possibly work to a limited extent for your English to German project, but it would likely not work for any other project, and would violate Adapt It's model of simplicity.
In addition to being the lead developer for Adapt It, I am a senior translation consultant in the PNG Branch. As a translation consultant for nearly 20 years, I have seen many back translations, and I myself provided back translations for other consultants for the entire Nyindrou New Testament (that was in the days before Adapt It existed). In the first few years of Adapt It's existence some consultants balked at the IDEA of using Adapt It to create back translations, mainly out of "principle" because they didn't think a computer program could do an adequate job of creating one for consultant checking purposes. However, in recent years many consultants do recognize that Adapt It can be used to create decent and helpful back translations, especially since recent versions of Adapt It can also create free translations to supplement the somewhat literal back translations that it can create; and notes can be added at will at any point in the document to clarify what might not be obvious in the back translation. I can tell you that it is not really necessary to get capitalization correct in the consultant's language to have a good back translation that the consultant can use effectively in checking the translation.
Original comment by adaptitbill@gmail.com on 19 Mar 2009 at 3:04
Thanks Bill for all your explanations.
I have never used Adapt It in a field situation (when we finished our translation in 1991, there was no Adapt It around, and later, when I was giving computer training in Africa, that was mainly with Paratext).
So from your experience, the upper/lower case distinction does not make a big difference and is not really an issue for most languages. (I think this is a typical German thing.)
Original comment by wolfgang...@gmx.de on 19 Mar 2009 at 7:39
Original issue reported on code.google.com by wolfgang...@gmx.de on 12 Mar 2009 at 10:39