common-voice / common-voice

Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
https://commonvoice.mozilla.org/
Mozilla Public License 2.0

Add Toki Pona #3435

Closed Daenyth closed 2 years ago

Daenyth commented 2 years ago

Language name

English name: Toki Pona
Endonym: toki pona

Language code

ISO 639-3: "tok"

Language size

Somewhere between 500 and 5000 speakers (details are in the ISO application)

Plural forms

How would you translate the following in this language?

Words in toki pona aren't inflected for number.

Most commonly, numbers aren't referenced in detail: the core system is "one, two, many". There are other forms that use an additive system when precision is necessary, but it's generally not needed. It's also possible, but not conventional, to use Arabic numerals.

0 rocks - kiwen ala / kiwen 0
1 rock - kiwen wan / kiwen 1
2 rocks - kiwen tu / kiwen 2
3 rocks - kiwen mute / kiwen tu wan / kiwen 3
4 rocks - kiwen mute / kiwen tu tu / kiwen 4
5 rocks - kiwen mute / kiwen luka / kiwen 5
10 rocks - kiwen mute / kiwen luka luka / kiwen 10
20 rocks - kiwen mute / kiwen mute / kiwen 20
100 rocks - kiwen mute / kiwen ale / kiwen 100
1000 rocks - kiwen mute / kiwen ale ale ale ale ale ale ale ale ale ale / kiwen 1000
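The additive forms in the list above can be sketched in code. This is only an illustration, assuming a greedy largest-word-first spelling with the word values implied by the examples (ale = 100, mute = 20, luka = 5, tu = 2, wan = 1, ala = 0). The names `NUMBER_WORDS`, `additive_number`, and `count_phrase` are hypothetical and not part of any Common Voice or Pontoon tooling.

```python
# Hypothetical sketch: spell a count with toki pona number words using the
# additive system shown in the examples above (largest word first).
# Word values are inferred from this issue's examples, not from a library.
NUMBER_WORDS = [(100, "ale"), (20, "mute"), (5, "luka"), (2, "tu"), (1, "wan")]


def additive_number(n: int) -> str:
    """Return the additive number phrase for a non-negative integer."""
    if n < 0:
        raise ValueError("count must be non-negative")
    if n == 0:
        return "ala"
    words = []
    for value, word in NUMBER_WORDS:
        # Use each word as many times as it fits, then move to the next one.
        repeats, n = divmod(n, value)
        words.extend([word] * repeats)
    return " ".join(words)


def count_phrase(noun: str, n: int) -> str:
    """Place the number phrase after the noun, like an adjective."""
    return f"{noun} {additive_number(n)}"


print(count_phrase("kiwen", 3))   # kiwen tu wan
print(count_phrase("kiwen", 10))  # kiwen luka luka
```

Note that the noun itself never changes; only the trailing number phrase does.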

I see 0 rocks on the ground

mi lukin e kiwen ala lon ma

I see 1 rock on the ground

mi lukin e kiwen lon ma

I see 10 rocks on the ground

mi lukin e kiwen mute lon ma

I see rocks on the ground

mi lukin e kiwen lon ma

Pontoon manager

https://pontoon.mozilla.org/contributors/do8QXzEZpP-l3d5kBEXQX6a_psg/

Language Script

What is the name of the language scripts used to write your language?

The most common written form uses Latin characters (sitelen Lasina). The second most common is the logography "sitelen pona" (not currently in Unicode, but fonts are available for it).

Heyhillary commented 2 years ago

Hey @Daenyth,

Thanks so much for your request. Just before I enable Toki Pona, is it possible to clarify the CLDR plurals and the plural rule?

From my understanding of what you have shared, the CLDR plurals are 1, 2, many, but I'm unable to find a source that lists the plural rule for Toki Pona. By any chance, could you confirm this, please?

Many thanks in advance!

Daenyth commented 2 years ago

Words in toki pona don't have a plural form. Number is generally left to context, the same way that languages without definite/indefinite articles infer definiteness from context.

You can specify the count of something by putting the number word as an adjective after the noun, which is what I have in the form above. The primary grammar book ("Toki Pona: the Language of Good") has a section on this, but it's also talked about in this free course: https://devurandom.xyz/tokipona/11.html

Sobsz commented 2 years ago

(For clarification: the word mute in Toki Pona literally means "many" and is often used as a stand-in for numbers above 2, as part of the language's philosophy of removing unnecessary detail. For Common Voice and other software projects, it's probably best to be exact and use the translingual Arabic numerals, as demonstrated in the examples after the slashes.)

tbodt commented 2 years ago

Words don't have to change when only a number changes, so it's reasonable to have only one plural form by default.

I eat one fruit: mi moku e kili 1
I eat 141 fruits: mi moku e kili 141

Edit: Thinking about it, it would be useful on occasion to have separate translations for one/many, allowing the first to use different phrasing like mi moku e kili wan taso (I eat only one fruit). Is it possible to specify ad-hoc plural forms while translating that aren't the same as the CLDR rules? If so what role do the CLDR rules play?

Heyhillary commented 2 years ago

Words don't have to change when only a number changes, so it's reasonable to have only one plural form by default.

I eat one fruit: mi moku e kili 1
I eat 141 fruits: mi moku e kili 141

Edit: Thinking about it, it would be useful on occasion to have separate translations for one/many, allowing the first to use different phrasing like mi moku e kili wan taso (I eat only one fruit). Is it possible to specify ad-hoc plural forms while translating that aren't the same as the CLDR rules? If so what role do the CLDR rules play?

The CLDR plurals are used in selector expressions, which can give several alternatives relevant to the given language. Please see the screenshot for an example of this from the localization documentation: https://mozilla-l10n.github.io/localizer-documentation/tools/fluent/basic_syntax.html?highlight=cldr#selectors-and-plurals

[Image: Screenshot 2022-01-31 at 11 08 55]

Heyhillary commented 2 years ago

Could I suggest that, for the purposes of the project, you create a style guide for localisation? Once the CLDR plurals and plural forms are added, they can't be changed ad hoc, because that would affect previous translations: the {$object} would change (see the example above). The style guide can be featured in Pontoon on your team page.

The style doesn't have to be reflected in the sentence corpus, but it would be good to consider specifying validation rules that apply to Toki Pona once your language is live on the platform.

tbodt commented 2 years ago

Hm, based on the docs here it seems appropriate to have everything in an "other" plural category, since you can define custom plural forms during translation based on exact number matches. I can't think of any other category of more than one number, at least.
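The idea above (one "other" category plus optional exact-number variants) can be sketched as follows. This is a rough Python imitation of how a plural selector resolves, not Fluent's or Pontoon's actual API. The names `plural_category`, `select_variant`, and `FRUIT` are hypothetical, and the single-category rule for Toki Pona is the assumption discussed in this thread.

```python
from typing import Mapping


def plural_category(n: int) -> str:
    # Assumption from this thread: Toki Pona has a single category, "other".
    return "other"


def select_variant(n: int, variants: Mapping[str, str]) -> str:
    """Pick a translation: an exact number match wins, otherwise fall back
    to the plural category, and finally to the required "other" variant."""
    template = variants.get(str(n))
    if template is None:
        template = variants.get(plural_category(n), variants["other"])
    return template.format(n=n)


# Hypothetical message with an ad-hoc variant for exactly 1.
FRUIT = {
    "1": "mi moku e kili wan taso",
    "other": "mi moku e kili {n}",
}

print(select_variant(1, FRUIT))    # mi moku e kili wan taso
print(select_variant(141, FRUIT))  # mi moku e kili 141
```

With this shape, a translator can add special phrasing for specific numbers without the language needing more than one plural category.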

tbodt commented 2 years ago

@Heyhillary Quick question, when will @gregdan3 get assigned the manager permission? We can't approve any of the terminology translations since no one is a manager for tok.

gregdan3 commented 2 years ago

I've added a style guide to be merged into the L10n repo here! This has been looked over by several members of the community, but I expect to hear more before the end of the day. I don't expect to make any more significant changes, however.

gregdan3 commented 2 years ago

Follow-up: any timeline on when I or others will be able to approve translations in Pontoon? Currently, the team only has contributors, no managers. @Heyhillary

Additionally, I was looking through the Projects page, and noted it was not possible to request the Common Voice project be added to the tok team's project list. It is possible to request other projects, such as Firefox, Focus, AMO, and Mozilla VPN. Is there a reason for Common Voice not being available in tok yet?

And lastly, there are two errors in Pontoon's listing of tok. It still lists Toki Pona as having one, two, many plural forms; Toki Pona has only one plural form, as words do not decline for number in the language. However, number may still be indicated as an adjective. Pontoon also still lists Toki Pona as having zero literate speakers; as noted in the original post and in the ISO application, there are 500-5000 speakers.

Heyhillary commented 2 years ago

Hey @gregdan3,

Thanks for raising your questions. I am still setting up the page before enabling the projects, hence the gaps in the information on Pontoon. To complete the setup, I asked earlier for clarification on the CLDR plurals, which are separate from plural forms. By any chance, do you know what they are for Toki Pona?

Once I have this, I can complete the setup and add people as managers.

Edit: I've changed the plural forms and used the 1600 speakers estimate from the ISO application. If you have any further questions, please feel free to ask. Thanks again for raising this!

Heyhillary commented 2 years ago

Hey @Daenyth,

Thanks so much for creating the style guide. Could you please confirm the CLDR plurals for Toki Pona? The sooner I have this info, the sooner I can enable Common Voice on the project.

Secondly, I have now made you the manager, and you will receive permission access to help delegate the different roles. I will send onboarding links that can help you with this.

If you have any questions, please let me know.

gregdan3 commented 2 years ago

@Heyhillary

Sorry for the misunderstanding!

If I follow correctly, the CLDR plurals would be Other only.

There are multiple styles in which this might be represented (multiple ways Other could be written), which are documented in our style guide and in the initial post by @Daenyth. However, there is only one form:

Other: mi moku e kili {0}.

It is also possible to specify "mi moku e kili", which makes no claim about number.

Sobsz commented 2 years ago

One: mi moku e kili.

...not quite accurate, as that just says "I eat fruit (of an unspecified quantity)". I believe there should just be one category, "other", as in Korean, Vietnamese, Lojban, etc.
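To illustrate what a single "other" category means in practice, here is a small sketch of a per-language category lookup for integer counts. The function `cldr_category` and the `ONLY_OTHER` set are hypothetical. Korean, Vietnamese, and Lojban are listed as examples in this comment, and placing `tok` alongside them is the assumption proposed in this thread rather than something confirmed from CLDR data here. English is included only for contrast.

```python
# Hypothetical lookup of a plural category for integer counts.
# "tok" being a single-category language is the assumption from this thread.
ONLY_OTHER = {"ko", "vi", "jbo", "tok"}


def cldr_category(lang: str, n: int) -> str:
    """Return the plural category name used to choose a translation."""
    if lang in ONLY_OTHER:
        # Single-category languages never vary the message by number.
        return "other"
    if lang == "en":
        # English distinguishes exactly one from everything else.
        return "one" if n == 1 else "other"
    raise KeyError(f"no plural rule sketched for {lang!r}")


for count in (0, 1, 2, 141):
    print(count, cldr_category("en", count), cldr_category("tok", count))
```

Under this assumption, "mi moku e kili" with no number would not belong to a "one" category; it is simply a different sentence that makes no claim about quantity.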

gregdan3 commented 2 years ago

@Sobsz Good correction. I've edited my post

Heyhillary commented 2 years ago

Hey everyone, thanks so much for your responses and support.

I have now added Toki Pona to our localisation tool, Pontoon, and given the manager role to @Daenyth. To help you get started, here is some valuable information for you as a manager for the language. The role of manager means all your work will go to staging (and later to production) without peer review.

If you would like to learn how to use Pontoon, please check out this video.

I hope that, through your work, others will join you to collaborate on the different phases of this project. Here are a few docs that will help you with using Pontoon, working with other contributors, and localizing the file written in .ftl:

Thank you and welcome to the Mozilla l10n and Common Voice community!

If you have any questions, please feel free to ask on Matrix.

Kind regards,

Hillary