adapt-it / adaptit

Related language translation editor

Both upper and lower case entered in KB #33

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
Having switched on: Tools/use Automatic Capitalisation and having defined
upper/lower case equivalences both for source and target language (in
Edit/Preferences/Case), the following happens:
- for the English word 'father' I enter the German 'Vater'. But the next
time father appears in the text, both 'Vater' and 'vater' show up as
possible translations.
Even removing 'vater' does not help: the next time, both are still there.
I tried to remove it both in the 'Choose Translation' dialog and in
the KB, but it simply does not work.

As far as I remember, this does not happen with all nouns, which are
(usually) lower case in English, but upper case in German.

What is the expected output? What do you see instead?

Any idea, what the reason for this could be?

What version of the product are you using? On what operating system?
AI 4.1.1./Linux

Please provide any additional information below.

Original issue reported on code.google.com by wolfgang...@gmx.de on 12 Mar 2009 at 10:39

GoogleCodeExporter commented 9 years ago
When Automatic Capitalization is turned on, Adapt It's design is for handling
*regular* case "correspondences" between the source and target language. As
explained in Issue #20, since Adapt It is designed to adapt between *similar*
languages, Adapt It only tries to duplicate the case of the source text into the
target text. This works for many languages (especially those that have new
orthographies), but Adapt It's Auto Capitalization feature will not work
consistently between two languages such as German and English, which have
long-established different capitalization rules, and those rules do not afford a
one-to-one relationship between the two languages. When Auto Capitalization is
ON, Adapt It attempts to maintain its KB with lower case forms, and changes the
target forms to upper case when the source language uses upper case. This is why
you are seeing both 'vater' and 'Vater' in the KB when Automatic Capitalization
is turned on. When translating from English to German, you should turn off
Automatic Capitalization and simply allow Adapt It to associate source and target
words even if they have different capitalization in different contexts. No harm
is done. It doesn't matter to Adapt It that its KB gets populated with some words
that have pairs - one with upper case and one with lower case spellings.
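
The rule described above (store lower case forms, then copy the case of the
source word onto the target) can be sketched roughly as follows. This is only an
illustration under stated assumptions: the class `KnowledgeBase` and its methods
are hypothetical names, not Adapt It's actual code, and the sketch does not try
to reproduce exactly how both 'vater' and 'Vater' end up in the real KB.

```python
from collections import defaultdict
from typing import List


def fold_initial(word: str) -> str:
    """Return the word with its first letter in lower case."""
    return word[:1].lower() + word[1:]


class KnowledgeBase:
    """Hypothetical KB: maps a source form to the target forms entered for it."""

    def __init__(self, auto_caps: bool) -> None:
        self.auto_caps = auto_caps
        self.entries = defaultdict(list)

    def store(self, source: str, target: str) -> None:
        if self.auto_caps:
            # Assumption: with auto-capitalisation on, both sides are kept
            # as lower case forms.
            source, target = fold_initial(source), fold_initial(target)
        if target not in self.entries[source]:
            self.entries[source].append(target)

    def lookup(self, source: str) -> List[str]:
        if not self.auto_caps:
            # Case-sensitive lookup: 'father' and 'Father' are different keys.
            return list(self.entries.get(source, []))
        targets = self.entries.get(fold_initial(source), [])
        if source[:1].isupper():
            # Copy the source word's initial capital onto each target form.
            return [t[:1].upper() + t[1:] for t in targets]
        return list(targets)
```

Under these assumptions, storing 'father' -> 'Vater' with auto-capitalisation on
keeps 'vater', so a lower case 'father' can only ever yield 'vater'. That is the
mismatch with German noun capitalisation that the comment describes.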

Original comment by adaptitbill@gmail.com on 12 Mar 2009 at 11:39

GoogleCodeExporter commented 9 years ago
What makes it cumbersome to switch Automatic Capitalization off is that Adapt It
then distinguishes between two phrase forms (e.g. English -> German)
- a   and
- A
Both have about 5 meanings in German (ein, einer, eine .....),
which have to be entered both for 'a' and for 'A'.
And this would enlarge the KB quite a bit, as many of the English words can show
up in upper case after a punctuation mark.
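
To make the size concern concrete, here is a small sketch that counts KB entries
with and without case folding. The helper names (`build_kb`, `entry_count`) and
the sample pairs are made up for illustration and are not taken from Adapt It.

```python
def fold_initial(word: str) -> str:
    """Return the word with its first letter in lower case."""
    return word[:1].lower() + word[1:]


def build_kb(pairs, fold_case: bool) -> dict:
    """Build a toy KB (source form -> set of target forms) from word pairs."""
    kb = {}
    for source, target in pairs:
        if fold_case:
            source, target = fold_initial(source), fold_initial(target)
        kb.setdefault(source, set()).add(target)
    return kb


def entry_count(kb: dict) -> int:
    """Total number of source/target associations stored in the toy KB."""
    return sum(len(targets) for targets in kb.values())


# Illustrative data only: three German renderings of 'a', each seen once in
# mid-sentence and once sentence-initially.
PAIRS = [
    ("a", "ein"), ("a", "eine"), ("a", "einer"),
    ("A", "Ein"), ("A", "Eine"), ("A", "Einer"),
]

case_sensitive = build_kb(PAIRS, fold_case=False)  # keys 'a' and 'A'
folded = build_kb(PAIRS, fold_case=True)           # single key 'a'
```

In this toy data the case-sensitive KB holds twice as many associations as the
folded one, which is the kind of doubling described above for words that also
occur sentence-initially.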

If capitalization rules such as
- English upper case adjectives are rendered as lower case in German and
- German upper case nouns are rendered as lower case in English
are also common between other languages (I do not know whether they are):

-> It would be nice to be able to tell Adapt It that some words of the target
language do not follow automatic capitalization (except after some punctuation
marks) but stay in the form they are.

This would help in using Adapt It for 'backtranslations', especially in the Balsa
programs collection, in which case the source and target languages often are not
very related/close.

What could be done is to add one character (e.g. #) at the beginning (or end) of
a word of the target (translation), which is not rendered in the text, but which
Adapt It uses to know that this word keeps its capitalization regardless of the
source text.
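
A rough sketch of how such a marker could behave is given below. It only
illustrates the proposal: `KEEP_CASE_MARKER` and `render_target` are hypothetical
names, nothing like this exists in Adapt It, and the exception for words after
punctuation marks is left out.

```python
KEEP_CASE_MARKER = "#"  # hypothetical marker character taken from the proposal


def render_target(source_word: str, kb_target: str) -> str:
    """Return the target text for one KB entry under the proposed rule."""
    if kb_target.startswith(KEEP_CASE_MARKER):
        # Marked entry: drop the marker and keep the stored capitalisation,
        # whatever the case of the source word is.
        return kb_target[len(KEEP_CASE_MARKER):]
    if source_word[:1].isupper():
        # Unmarked entry: copy the source word's initial capital.
        return kb_target[:1].upper() + kb_target[1:]
    return kb_target
```

With an entry stored as '#Vater', both 'father' and 'Father' would be rendered
as 'Vater', while an unmarked 'ein' would still follow the case of the source
word.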

Original comment by wolfgang...@gmx.de on 16 Mar 2009 at 3:30

GoogleCodeExporter commented 9 years ago
Adapt It's performance is not really dependent on the size of the KB, so you need
not worry about language patterns that enlarge its KB. True enough, it will
require a little more work for the user to enter extra forms, and upper and lower
case forms of some words, but Adapt It doesn't care. Some people who work in
highly agglutinative languages originally thought that Adapt It wouldn't work
well for their situation, thinking that they would not save much effort in using
Adapt It because they had hundreds or thousands of possible inflectional forms to
deal with. Most found out, however, that it didn't matter in the end. Although
more typing work was required initially, Adapt It still worked well for their
agglutinative languages eventually. Although some agglutinative languages may
have thousands of potential forms, it generally turns out that only a relatively
small subset of all possibilities actually appears in an entire New Testament.
The KB might turn out to be two or three times larger than an average KB for an
isolating language, but it won't turn out to be tens or hundreds of times larger.

One of the main differences between Adapt It and CARLA is that Adapt It does not
require (or permit) any formalization of syntactic or grammatical rules. Every
language and every good translation made from a given language end up being a
linear "stream of speech" in the end. Adapt It functions entirely without any
linguistic awareness or any understanding of the hierarchies inherent in the two
languages' syntactic structure and semantics - it only recognizes and matches
forms such as occur in the ultimate linear "stream of speech". Hence, Adapt It
can never know whether it is adapting an adjective, a noun, a verb, or a
discourse particle, so it would not be possible to get Adapt It to act
differently (or capitalize differently) for different linguistic contexts. In
other words, Adapt It cannot "learn" to adjust the way it capitalizes based on
awareness of meaning or grammatical context. The simplicity and effectiveness of
Adapt It is due to the fact that it simply matches patterns in the linear flow of
speech between two languages that are saying the same thing. Adapt It depends
entirely on the bilingual knowledge of the user for the correct grammatical and
syntactic information that is inherent in that linear flow of speech, and it
generally works very well at that level - at least for languages that have some
degree of syntactic similarity. An ad hoc method such as you describe to help
Adapt It to "know that this word keeps its capitalization regardless of the
source text" might possibly work to a limited extent for your English to German
project, but it would likely not work for any other project, and would violate
Adapt It's model of simplicity.

In addition to being the lead developer for Adapt It, I am a senior translation
consultant in the PNG Branch. As a translation consultant for nearly 20 years, I
have seen many back translations, and I myself provided back translations for
other consultants for the entire Nyindrou New Testament (that was in the days
before Adapt It existed). In the first few years of Adapt It's existence some
consultants balked at the IDEA of using Adapt It to create back translations,
mainly out of "principle" because they didn't think a computer program could do
an adequate job of creating one for consultant checking purposes. However, in
recent years many consultants do recognize that Adapt It can be used to create
decent and helpful back translations, especially since recent versions of Adapt
It can also create free translations to supplement the somewhat literal back
translations that it can create; and notes can be added at will at any point in
the document to clarify what might not be obvious in the back translation. I can
tell you that it is not really necessary to get capitalization correct in the
consultant's language to have a good back translation that the consultant can use
effectively in checking the translation.

Original comment by adaptitbill@gmail.com on 19 Mar 2009 at 3:04

GoogleCodeExporter commented 9 years ago
Thanks, Bill, for all your explanations.
I have never used Adapt It in a field situation (when we finished our translation
in 1991, there was no AI around, and later, when I was giving computer training
in Africa, this was mainly Paratext).

So from your experience, the upper/lower case distinction does not make a big
difference and is not really an issue for most of the languages. (I think this is
a typical German thing.)

Original comment by wolfgang...@gmx.de on 19 Mar 2009 at 7:39