alan-if / alan-i18n

ALAN Internationalization Project

Translate and fix several messages #48

Closed Rich15 closed 2 years ago

Rich15 commented 2 years ago

Translations and updates

meta.alan

Translate all English messages in the 'meta.alan' source adventure and add some more synonyms for the 'presionar' verb.

Also remove the accent mark from 'bidónes', since it is incorrect: the correct form is 'bidones' (no accent mark). Update the solution files accordingly.

mensajes.i

Fix an issue with the CONTAINMENT_LOOP2 Run-Time message, which wasn't handling the grammatical number of the verb 'estar' properly. The code was using the number of the first parameter instead of the second one.

Improve translation of 'NO_UNDO' and 'UNKNOWN_WORD' messages because the old ones had an unclear meaning in Spanish and didn't convey the reason why the message appeared.

lanzar.i

Fix the GNA in the 'lanzar' VERB. As @tajmone noted, "[lanzaro|lanzara|lanzaros|lanzaras]" are not right. Now it shows the proper "No puedes lanzarl$$ muy lejos" message.

The UNDO command

From #47:

I was wondering whether we should add Spanish SYNONYMS for UNDO, or whether this command is so universal in IF that it doesn't deserve a translation. @Rich15, what do you think?

UNDO is a pretty common command. However, since the translation of the 'NO_UNDO' message includes "revertir", it could be useful to add it and its imperative form (i.e. "revierte") as synonyms for players who prefer commands in Spanish.

tajmone commented 2 years ago

Thanks @Rich15, you did a marvellous job!

Just a question: I noticed that you removed the accent from "bidón". Was that incorrect? I've found the word with the accent in various online translators, but I was wondering if it's like in Italian, where certain accents are considered obsolete in everyday writing, and only used in dictionaries (or where two similar words might be confused if without accent).

As you can see, testing Run-Time MESSAGEs is quite tricky and requires a lot of hacks to elicit those messages which deal with edge cases. We're almost there, but some are still left (most of them belong to the same groups, so we're probably speaking of around six tests left).

The common MESSAGEs pertaining to descriptions, inventory, etc., are not worth testing since they are seen everywhere in the test files.

Rich15 commented 2 years ago

Just a question: I noticed that you removed the accent from "bidón". Was that incorrect?

The accent mark in "bidón" is correct! What's incorrect is putting it in its plural form "bidones" (no accent). That's what I changed in the source adventure and the solutions files :)

Predefined words problem

As you can see, testing Run-Time MESSAGEs is quite tricky and requires a lot of hacks to elicit those messages which deal with edge cases. We're almost there, but some are still left (most of them belong to the same groups, so we're probably speaking of around six tests left).

They are, indeed, quite tricky. I noticed there are some pending messages in meta-input-errors which rely on predefined words. Doing some tests I noticed "AND" cannot be translated to "y" because "y" is defined as another word class elsewhere. Do you know why this happens? Maybe it is something related to the core of ALAN, but translating AND is important, since it will probably be used pretty often by players.

tajmone commented 2 years ago

What's incorrect is putting it in its plural form "bidones" (no accent).

Ah! I probably messed it up by manually creating the plural version by inserting the "ones" suffix. So I guess it has to do with accent rules, i.e. that the addition of the plural suffix shifts the accent on the last syllable. Italian has very similar accent rules, but I'm very bad at distinguishing between the different accents (tonic vs grave) and the lack thereof, so much so that I was dumped out of a theatre course because I couldn't get their pronunciation right. :sad:

AND Word

Doing some tests I noticed "AND" cannot be translated to "y" because "y" is defined as another word class elsewhere.

Indeed, 'AND' is a special player word in ALAN (the AND_WORD) which ALAN needs to know about in order to be able to recognize concatenated commands (e.g. "take apple AND eat it") or multiple parameters (e.g. "take apple AND bread") in verbs that allow multi-parameters (via the * multiple indicator, e.g.: Syntax take = take (obj)*.).

For the full list of ALAN's Predefined Player Words, see:

https://github.com/tajmone/Alan3-Italian/wiki/Predefined-Player-Words

Their translation in the Spanish library can be found in the gramática.i module:

-- =========
-- ALL WORDS
-- =========

Synonyms todo, todos, toda, todas = all.

Doing some tests I noticed "AND" cannot be translated to "y" because "y" is defined as another word class elsewhere.

I'm not sure where you'd like to translate 'AND' as 'Y', but indeed you can't since it's already defined as a Synonym in the grammar module.

Do you know why this happens? Maybe it is something related to the core of ALAN, but translating AND is important, since it will probably be used pretty often by players.

Yes, it has to do with ALAN's core. I don't fully understand the details, because they can be entangled, but it has to do with how ALAN manages its internal dictionaries of known words, which it needs to parse the player input. Depending on the type of word, ALAN might not allow multiple uses of a special word in different contexts, whereas in some contexts it's OK; but you definitely can't define multiple conflicting Synonyms or have a word that is both a Synonym and an input word elsewhere.

When I refer to "different contexts" I mean directions names, syntax words, instance identifiers (and their alternative NAMEs and adjectives) and the predefined player words (and their Synonyms), and all Synonyms.

Syntaxes and NAMEs are more tolerant: different syntaxes might share the same words (e.g. prepositions), and different objects might have the same names and adjectives. As long as you don't have two different SYNTAXes matching an identical input, ALAN can handle them and resort to disambiguation for objects matching the same input.

But in contexts like directions the rules are stricter, and you can't use a direction word anywhere else. This is a problem in Italian, where many direction shorthands match prepositions or other common words: e.g. you can use 'NO' either as a direction (NORDOVEST, northwest) or as a response (as in YES/NO), but not both; and EST (east) cannot be abbreviated to 'E' because 'E' in Italian is the AND WORD.

This limitation is being discussed on the ALAN source repository, and Thomas is looking into it. Lifting it, however, would require rewriting the code for the player input parser and the word dictionaries, so it's a rather delicate operation: it could break backward compatibility, and it needs careful planning and testing.

In the English language this never causes problems (except for NO vs NORTHEAST, but you can use the literal string "NO" as a parameter for player answers). In some languages these limitations tend to show up more, especially in Italian, where many short words can be both a common word and a preposition, article or particle (because we have so many of them). You can find a list of examples on my Wiki:

https://github.com/tajmone/Alan3-Italian/wiki/i18n-Problems

I'm planning to add a dedicated Appendix to The ALAN Manual explaining where these same-words conflicts can occur and where they are allowed, along with some tricks to work around them — often you can circumvent the limitation, as you can read in:

https://github.com/tajmone/Alan3-Italian/wiki/i18n-Problems#contracted-prepositions

where in the Italian library you can't define the DAI preposition (from, m.p.) as a SYNONYM of DA (m.s.) because of the VERB "dai" (give):

SYNTAX give = 'dai' (obj) 'a' (recipient).

but you can cover this by an alternative SYNTAX in all VERBs that rely on the DA preposition:

Syntax get_down = scendi DA (obj)
                  scendi DAI (obj). -- this is OK!

... which is why I mentioned that the various contexts can get entangled and are not always easy to understand.

Fixing "un" => "uno"?

I think we also need to fix the following definition:

Synonyms
  una, unas, uno, unos = un.

to:

Synonyms
  una, unas, unos = uno.

which I noticed is always causing a compiler warning because un is never used. I believe that I made a mistake there, pasting from the Italian definition, where "un" is the default m.s. indef.article, but I believe in Spanish it's just "uno", is that right?

tajmone commented 2 years ago

Fixing "un" => "uno"? (no need)

I was wrong, the "un" article is correct.

I was led to believe there's a problem because the compiler always warns that the "un" noun is not used anywhere.

Are Indefinite Articles NOISE WORDs?

The thing here is that the Spanish and Italian grammar modules simply define the various indefinite articles as SYNONYMS of the m.s. indefinite article, so that they are all represented by a single form in player input, in case we need to use them in some SYNTAXes.

I've never been sure whether indefinite articles should be just turned into NOISE WORDS (as SYNONYMS of "THE", like for definite articles) or whether they should be kept as independent words, to be usable with SYNTAXes.

If we turn them into NOISE WORDS, they would then just be ignored by the parser, so if the player types 'TAKE AN APPLE' it's parsed as 'TAKE APPLE' (just like with definite articles).

By default, ALAN doesn't define indefinite articles as NOISE WORDS (only the definite article "THE" is, not "A" and "AN"), so I just followed the example here. But there are some subtle linguistic differences between English and Italian/Spanish which might be worth considering...

In English eat A fruit and eat ONE fruit are two different sentences (the former is more like eat ANY fruit, the latter stresses the number of fruits to eat), unlike in Italian where both these sentences would use the indefinite article "UNA". Although an apparently subtle difference, when it comes to IF the difference is significant, especially in verbs that would accept spelled out numbers as parameters (e.g. eat [one|two|three] apples).

In real practice, I haven't actually come across practical examples where this would matter but potentially it could, so it's worth keeping this in mind, just in case. For now, it seems to me that it's better to keep the indefinite articles defined as they are, so that they might be usable in SYNTAXes like:

Syntax kissing = kiss (act)
                 give (act) A kiss

In Italian there are probably more verbs requiring this formulation with the indefinite article, compared to English. In any case, if we turn indefinite articles into NOISE WORDS, we couldn't preserve the above SYNTAX except by ignoring the article (which is ugly).

Rich15 commented 2 years ago

Ah! I probably messed it up by manually creating the plural version by inserting the "ones" suffix. So I guess it has to do with accent rules, i.e. that the addition of the plural suffix shifts the accent on the last syllable. Italian has very similar accent rules, but I'm very bad at distinguishing between the different accents (tonic vs grave) and the lack thereof, so much so that I was dumped out of a theatre course because I couldn't get their pronunciation right. :sad:

I thought so! And don't worry, Spanish accents are difficult for me too haha.

The AND_WORD situation

Indeed, 'AND' is a special player word in ALAN (the AND_WORD) which ALAN needs to know about in order to be able to recognize concatenated commands (e.g. "take apple AND eat it") or multiple parameters (e.g. "take apple AND bread") in verbs that allow multi-parameters

I see.

I'm not sure where you'd like to translate 'AND' as 'Y', but indeed you can't since it's already defined as a Synonym in the grammar module.

I wanted to translate 'AND' to concatenate parameters in Spanish. When I try a command like:

> toma agua y refresco

The parser doesn't recognize "y" (it specifically throws the 'WHAT' message: "No entiendo bien esa frase. Redactala de nuevo"). However if I use:

> toma agua and refresco

It takes both objects.

Using 'THEN' also works:

> toma agua luego toma refresco

But it would be helpful if "y" could be added as a Synonym (translation) for 'AND'. I see you could do it in the Italian Library, so I don't know if this is a specific problem in the Spanish one or I'm missing something.

but you definitely can't define multiple conflicting Synonyms or have a word that is both a Synonym and an input word elsewhere.

I noticed this when I tried to define "ve" as a Synonym for 'go', but it already was a Synonym for 'look' (because "ve" is the imperative form for both verbs in Spanish).

I'm planning to add a dedicated Appendix to The ALAN Manual explaining where these same-words conflicts can occur and where they are allowed, along with some tricks to work around them

That would be very useful!

Indefinite Articles

In English eat A fruit and eat ONE fruit are two different sentences (the former is more like eat ANY fruit, the latter stresses the number of fruits to eat), unlike in Italian where both these sentences would use the indefinite article "UNA".

It is the same in Spanish. "come una fruta" could be any fruit or a single fruit.

In real practice, I haven't actually come across practical examples where this would matter but potentially it could, so it's worth keeping this in mind, just in case. For now, it seems to me that it's better to keep the indefinite articles defined as they are, so that they might be usable in SYNTAXes like:

Syntax kissing = kiss (act)
                give (act) A kiss

I think the same. It's better to leave them as they are for now. Maybe, if we later realise they are not really used, we could define them as NOISE WORDS.

tajmone commented 2 years ago

AND_WORDs Update ...

Sorry, you were right ... the SYNONYM definition for 'AND' was missing in the grammar module — I even cited the code and linked to it, but it was the ALL WORDS definition, not the AND WORDS (I was over-tired and confused the two).

So, thanks for pointing this out! Bear in mind that I often overlook things when I'm working at night, due to tiredness :tired_face: and too much coffee :coffee:.

In any case, I'll be adding a dedicated solution file targeting the Predefined Player Words, as part of the meta.alan tests, so we'll be sure that all the required SYNONYMS are in place and working as expected (again, if it wasn't for the test suite, so many errors could slip by unnoticed in the library, as indeed they have in the past).

The THEN AND_WORD

Using 'THEN' also works:

toma agua luego toma refresco

I've tried it and it didn't work (there was no definition of 'LUEGO' as a SYNONYM of 'AND' either). But I've now added that too.

AND_WORDs Fixed! (this time for REAL)

Anyhow, I've now fixed the situation and added both 'Y' and 'LUEGO' as SYNONYMs of 'AND' in gramática.i:

-- =========
-- AND WORDS
-- =========

Synonyms y, luego  = 'and'.

Any Other AND_WORDs Needed?

The AND WORDS are used by the parser to understand when the player is concatenating multiple parameters or multiple commands in the input line. So if you think there are other AND WORDS that should be added to the list let me know, so we include them in the definition before the dev branch is merged into the next release.

ALAN treats multiple AND WORDS in a row as a single word, because some languages allow sentences like:

> take apple AND THEN eat it

E.g. in Italian it's common to use "E POI" (and then), so the ALAN interpreter just treats multiple consecutive occurrences of AND WORDS as a single one (all ALAN needs to know is that there's a special input word for concatenation, it doesn't care about the grammar behind it).
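To make the "multiple AND WORDS count as one" behaviour concrete, here is a toy model in Python. It is only a sketch and not ALAN's actual parser code: the `AND_WORDS` set and the `collapse_and_words` function are names made up for illustration, and treating English "then" as an AND WORD is an assumption based on the examples in this thread.

```python
# Toy model only: this is NOT how the ALAN interpreter is implemented.
# 'y' and 'luego' mirror the Synonyms defined in gramática.i;
# 'and' and 'then' are assumed English AND WORDS.
AND_WORDS = {"and", "then", "y", "luego"}


def collapse_and_words(line: str) -> list[str]:
    """Replace every run of consecutive AND WORDS with a single 'AND' marker."""
    tokens: list[str] = []
    for word in line.lower().split():
        if word in AND_WORDS:
            # Emit a marker only if the previous token is not already one.
            if not tokens or tokens[-1] != "AND":
                tokens.append("AND")
        else:
            tokens.append(word)
    return tokens


print(collapse_and_words("take apple AND THEN eat it"))
print(collapse_and_words("toma agua y luego toma refresco"))
```

Both inputs end up with exactly one concatenation marker between the two halves, which is all the parser needs to know in this simplified view.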

Just beware that once you've defined a term as an AND WORD, it can't be used any longer in SYNTAX definitions — e.g. this would fail to compile:

Syntax ejecutar_con2 = ejecutar (código) con (obj1) y (obj2)

(in fact, I had to tweak the above definition in meta.alan, substituting y with sobre.)

So before including a term in the definition we need to be sure that it doesn't have other potential uses in player input — i.e. that it might not be also usable in verb syntaxes (between parameters), a direction name, or a member of another group of predefined player words.
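A quick way to picture this check is a small Python sketch that looks a candidate word up in the other word groups. Everything here is hypothetical: the word lists are tiny hand-written samples (only "con", "sobre" and the ALL words come from this thread), and a real check would have to read the words from the library sources or simply rely on the compiler's errors.

```python
# Hypothetical sample word groups, for illustration only.
SYNTAX_WORDS = {"con", "sobre", "en", "a"}
DIRECTIONS = {"norte", "sur", "este", "oeste"}
OTHER_PLAYER_WORDS = {"todo", "todos", "toda", "todas"}  # ALL words from gramática.i


def conflicts(candidate: str) -> list[str]:
    """Return the names of the word groups that already use `candidate`."""
    groups = {
        "syntax words": SYNTAX_WORDS,
        "directions": DIRECTIONS,
        "other player words": OTHER_PLAYER_WORDS,
    }
    word = candidate.lower()
    return [name for name, words in groups.items() if word in words]


for candidate in ("luego", "sobre", "todo"):
    print(candidate, "->", conflicts(candidate) or "no conflict")
```

In this sample "luego" is free to become an AND WORD, while "sobre" and "todo" would clash with an existing use.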

I've added LUEGO as you suggested, since it seems to make sense to me (similar to Italian "POI").

SYNONYMs Conflicts Guidelines

I'm sharing some thoughts and guidelines regarding the potential conflicts that might arise when creating SYNONYMS for special ALAN player words, and how to decide if less commonly used terms are worth being included, etc.

Regarding the use of LUEGO in the Spanish library, consider that in Italian we also have the word DOPODICHÉ (Spanish: DESPUÉS), but I didn't include that as a SYNONYM in the Italian library, mostly because the average player wouldn't use it in place of POI (then), since the latter is much shorter, but also because there are subtle differences in usage: you can say:

> mangia la mela E POI la banana

(eat the apple and then the banana) but you can't say:

> mangia la mela DOPODICHÉ la banana

because 'dopodiché' has stronger implications in terms of separating two events in time, and can't really be used to extend a previous action into another iteration, because it's used to indicate separate distinctive actions occurring in sequence (it's a contraction of 'DOPO DI CHE' which literally means: then, after that, ...).

But ultimately I might include it in the library (haven't decided yet), because when dealing with input parsing we should not worry too much about such considerations, since we expect the player to type well-formed commands. So, even if a player uses 'DOPODICHÉ' in a place where 'POI' would have been more correct, it's still fine for us, as long as these SYNONYMS don't lead to ambiguous parsing results. For example, we're OK with the fact that the player can use ANY definite article in the input, regardless of the noun's GNA, because we don't really care about definite articles in input; so much so that multiple consecutive articles are all ignored, so it's OK to type:

> tomas EL LA LOS LAS EL agua

because the parser ignores the articles. We simply don't care about this because we don't expect players to type garbage at the prompt — and if they do, fine, as long as it parses correctly.
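As a rough illustration of what "ignoring the articles" means, the following Python sketch drops every noise word from the input line. It is a toy model and not ALAN's parser; `NOISE_WORDS` and `strip_noise` are made-up names, and the set assumes that only the four Spanish definite articles are noise words.

```python
# Toy model only, not ALAN's parser.
# Assumption: the Spanish definite articles are the only noise words.
NOISE_WORDS = {"el", "la", "los", "las"}


def strip_noise(line: str) -> str:
    """Drop every noise word, however many of them appear in a row."""
    return " ".join(w for w in line.split() if w.lower() not in NOISE_WORDS)


print(strip_noise("tomas EL LA LOS LAS EL agua"))  # prints "tomas agua"
```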

But we must always be aware of terms which might have multiple meanings and uses in the language. E.g. the Italian preposition 'DAI' (from, m.p.) is problematic and can't be defined as a SYNONYM because the word also has other common uses: DAI is also the imperative form of give, which players actually need to use in real games, to give objects to NPCs.

If a term has a meaning which is less likely to be used, we can make an exception. E.g. the Italian preposition 'DEI' (of, m.p., as in the books of the teachers) conflicts with the word DÈI, which is the plural of DIO (god), but the latter has an accent on the 'È', so the conflict only occurs if you provide a "lazy alternative name" for the word. In any case, the chances of an IF adventure having a game object called "gods" are statistically low, unlike the imperative give, so one could make an exception and create the SYNONYM.

tajmone commented 2 years ago

Player Words: New Wiki Page

@Rich15, I've created a new Wiki page dedicated to predefined player words, providing some guidelines for library translators:

https://github.com/alan-if/alan-i18n/wiki/Predefined-Player-Words

The article is not complete yet. It's based on a similar article I wrote for the ALAN Italian Wiki:

https://github.com/tajmone/Alan3-Italian/wiki/Predefined-Player-Words

but I've adapted it to the needs of this project (and updated some old references too).

I've "recycled" some elements that came up in this thread, especially regarding the potential conflicts with player words and other uses of the same term in syntaxes, etc. But I still need to expand more on this aspect, providing examples of such conflicts. Since the article is written for translators of the library, we should add real case examples from the Italian and Spanish library, to illustrate conflict cases.

Rich15 commented 2 years ago

AND_WORDs

Anyhow, I've now fixed the situation and added both 'Y' and 'LUEGO' as SYNONYMs of 'AND' in gramática.i:

-- =========
-- AND WORDS
-- =========

Synonyms y, luego  = 'and'.

Great!

I've tried it and it didn't work (there was no definition of 'LUEGO' as a SYNONYM of 'AND' either). But I've now added that too.

Yes, sorry for that. I didn't realise I was doing the test on my development branch (where I included "luego" as a synonym for the 'AND_WORD'). Oops.

Regarding the use of LUEGO in the Spanish library, consider that in Italian we also have the word DOPODICHÉ (Spanish: DESPUÉS), but I didn't include that as a SYNONYM in the Italian library, mostly because the average player wouldn't use it in place of POI (then), since the latter is much shorter, but also because there are subtle differences in usage: you can say:

> mangia la mela E POI la banana

(eat the apple and then the banana) but you can't say:

> mangia la mela DOPODICHÉ la banana

because 'dopodiché' has stronger implications in terms of separating two events in time, and can't really be used to extend a previous action into another iteration, because it's used to indicate separate distinctive actions occurring in sequence (it's a contraction of 'DOPO DI CHE' which literally means: then, after that, ...).

It's pretty similar in Spanish. Actually, I was going to include "después" as a synonym for 'THEN', but then I thought most players would use the shorter "luego" instead. But, as you said, it might be better to include it just in case.

The AND WORDS are used by the parser to understand when the player is concatenating multiple parameters or multiple commands in the input line. So if you think there are other AND WORDS that should be added to the list let me know, so we include them in the definition before the dev branch is merged into the next release.

Besides the mentioned "después", I think those two are all the AND_WORDs needed.

BUT_WORDs

A good synonym for the 'BUT_WORD' could be "excepto".

THEM and IT

As we discussed once in #35, in Spanish it is quite a bit more difficult to implement the 'IT' and 'THEM' words, because you would need to handle suffixes and other things. From this answer:

To do something like this in Spanish would require to change the verb. Using the example above:

> x vampiro 
Un antiguo vampiro de Transilvania.

> matarlo

The suffix "-lo" depends on GNA:

  • m.s: "-lo" (matarlo=kill HIM)
  • f.s: "-la" (matarla=kill HER)
  • m.p: "-los" (matarlos=kill THEM)
  • f.p: "-las" (matarlas=kill THEM)

There is also the imperative form (mátalo/a, mátalos/as), and we'd need to do it for every verb in the library.
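The suffix table above could be sketched in Python like this. It is only an illustration of the GNA lookup, not a proposal for how the library should implement it: `CLITIC_SUFFIX` and `infinitive_with_clitic` are hypothetical names, and the sketch covers the infinitive only, since the imperative also needs an accent change (mata becomes mátalo) that a plain concatenation cannot produce.

```python
# Suffixes taken from the list above, keyed by (gender, number).
CLITIC_SUFFIX = {
    ("m", "s"): "lo",
    ("f", "s"): "la",
    ("m", "p"): "los",
    ("f", "p"): "las",
}


def infinitive_with_clitic(infinitive: str, gender: str, number: str) -> str:
    """Attach the object suffix matching the target's gender and number."""
    return infinitive + CLITIC_SUFFIX[(gender, number)]


print(infinitive_with_clitic("matar", "m", "s"))  # prints "matarlo"
print(infinitive_with_clitic("matar", "f", "p"))  # prints "matarlas"
```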

So I don't know if, instead of a PRONOUN, we should define something like act_suffix to handle these cases, and whether we need to do it for both the imperative and the infinitive or just one of them. This could be difficult, so even though it could save players some time, if it doesn't work very well or turns out to be too complex, we could just let them type the character(s) they want to interact with.

Then we concluded it would be too difficult, and it was better to inform players to use 'IT' and 'THEM'. But I don't know if now we could implement it somehow.


@Rich15, I've created a new Wiki page dedicated to predefined player words, providing some guidelines for library translators:

https://github.com/alan-if/alan-i18n/wiki/Predefined-Player-Words

Thank you so much! This is very helpful.