Closed Reinmar closed 7 years ago
Do you plan to sue whole API from gettext? Like plural handling and stuff?
Nope, I don't think so. Maybe I'm wrong, cause finally I haven't checked that precisely, but I think that it's too much work and we have very little use cases for those things, so this would be a low priority.
I mean – knowing CKE4 – there were just a couple of places where we had to format a plural, which you can always somehow workaround, even if not very gracefully. So this wasn't high on the priority list.
OTOH, perhaps we could dive into this, and if it's possible, predict a full gettext support. I wouldn't like to work on it now, but if it would be possible to implement it in the future, then we'd be safe. Or it may turn out that it's simpler than I think... I'll better check it :D.
From what I understand from https://www.gnu.org/software/gettext/manual/gettext.html#Translating-plural-forms we need to pass a count
number to the t()
function, so it knows which translation to use.
ngettext [option] [textdomain] msgid msgid-plural count
So it'd be like:
t( 'file-upload/uploaded-message: Uploaded a file. | Uploaded %0 files.', fileCount, fileCount );
Why fileCount
is twice? Because the first one is the count
, the second is the %0
. It's a bit ugly, I know. The first param gets very complicated as well... perhaps these would need to be 3 separate strings then or an object (better, because t()
won't change its number of params in the built code). Actually, in this case the count could be moved to that object too:
t( {
context: 'file-upload',
id: 'uploaded-message',
singular: 'Upladed a file.',
plural: 'Uploaded %0 files',
count: fileCount
}, fileCount );
Now, thet()
function needs to be able to understand which of the plural forms it needs to use. So we'd need to include in the bundles the function which represents a plural formulas, the translate()
function will know it and return the proper string based on the passed count
. Then, the normal %0
substitution will take place.
If I don't miss anything, I think that we should be able to add support for plural forms in the future. It will require more a complicated logic for t()
extractions and getting the plural formula functions somehow...
BTW, I haven't mentioned one thing here – the use of external libraries. Obviously, there are libraries which parse and write PO files and we'll use one. However, for the string resolution logic (so what the translate()
function should do) I don't think that we'd be able to use any library. It needs to be perfectly integrated with our code and bundling process, so I don't see much space for any 3rd party helpers here.
Looks great. It would be nice if the tool could list some errors like:
contexts.json
file,contexts.json
more than once),@Reinmar Maybe it's more an edge-case, but words in many languages can have more than one plural form. All in all you have written a very clean view of what we need to implement.
@ma2ciek gettext has it already defined and its standard is used in many places.
Checkout this: https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html
It handles polish irregular plural forms.
entries without context,
Yep.
translations of the same context which overwrite each other,
Yep.
using context which is not described in any contexts.json file,
Yep.
context definitions that overwrite each other (the same context defined across multiple contexts.json more than once),
Yep.
finding a pipe in the text and no vars defined, which would mean that the pipe was used as a part of the text (some people will create language files based on existing files, they will not read the full spec to understand the special meaning of a pipe)
You mean, in the plural usage form? I don't want to use "|" there, that's why I proposed the object format, as more explicit and less error prone. OTOH, the pipe format could be an option and totally satisfactory in most of the case. But we won't work on this now anyway, so we can discuss this in the future.
@Reinmar Maybe it's more an edge-case, but words in many languages can have more than one plural form. All in all you have written a very clean view of what we need to implement.
The idea is that en has 2 forms, so our t()
usage needs to predict only that + the count
param. The translate()
function must support multiple forms, because only it will have to worry about non-eng languages.
I don't want to use "|" there, that's why I proposed the object format
👍
I found a quick tutorial how gettext workflow looks like and it's a bit different than how I imagined it: http://www.labri.fr/perso/fleury/posts/programming/a-quick-gettext-tutorial.html
You write code with e.g. _("Hello World\n")
.
You run xgettext
on this code and create a .pot
file (PO template) with such a content:
msgid "Hello World\n"
msgstr ""
You run msginit
to create a .po
file for a concrete language.
Now you add translations of this text in that .po
file.
Finally, you run msgfmt
to create a binary for that .po
file, which equals our build step.
It's also important that when you run xgettext
again, you run it with the -j
(--join-existing
) option which... I don't precisely know what it's doing :D. Especially that then you run msgmerge
to merge these changes with the concrete .po
files.
It would be worth checking how this merging process look and how sending this to Transifex looks like too. E.g. can we just generate en.po
files from the code and send them to Transifex skipping the whole update and .pot
s generation process?
For me the process could look like this, but perhaps I miss something:
en.po
s out of the code..po
files for all other languages back.AFAIR the .pot
is the file which is used as the source of currently available msgs.
In time some messages are added and some are removed. Current state of app's messages are in current .pot
file.
The msgmerge
will update translation files (.po
files) so obsolete messages are removed and new strings are added.
This workflow is nicely supported by some tools (namely poedit ) in which you can run .pot
file against .po
file to update translations and create updated .po
file.
So in such workflow the .pot
file is needed for updating the .po
files in future.
I think that the en.po
would be the same as ckeditor5.pot
. But I don't know if creating en.po
makes sense (dunno how transifex works in that matter).
Also, I found https://lingohub.com/blog/2013/07/php-internationalization-with-gettext-tutorial/#What_form_of_msgids_should_be_used which clarifies how the msgids should be created:
As previously mentioned, msgid is used by gettext to identify the msgstr that should be displayed in its place. If multiple equal msgids are present, msgctxt is used to tell them apart. There are several more important roles that msgid plays:
- msgid is what the programmer sees in the code
- msgid is what is being displayed by default if the proper msgstr in the current locale can not be found (as a fall back)
- in traditional gettext usage, the translator translates from msgid to the target locale, instead of from source msgstr to target msgstr; so, msgid is what he sees and what gives him information what should be translated
Therefore it is important to chose the form of the msgid carefully. The value of the msgid can be any string. The gettext manual suggests that the original string in the source (or English) language should be used. That might pose certain problems, for example if msgstr of the original string is changed and if the programmer decides to synchronize the msgid with a new msgstr, he would have to do it in all places in the code where it is used as well as in all other PO files. Also, there is no limit to msgid or msgstr length. Some lengthy msgids would make the code less readable.
Some developers propose that the msgids should be constructed in a more structured way [5] [6], where msgid describes its role in the application or role in the code (in which view, partial, container are they located) instead of its content. It is a great way to organize the strings from the developer’s perspective, but it might pose problems to the translator, since the translator no longer has the a source string in the msgid to translate from.
This reminded me, what I forgot to describe and what have been confusing for you – where in the t()
calls do we have msgctxt
, msgid
and msgstr
. So, to clarify this. For the following t()
call and contexts.json
:
t( 'basic-styles/bold: Bold' );
basic-styles/lang/contexts.json
(note, I was previously using a deeper structure for the contexts.json
file, but it doesn't make sense – both levels (package name and the id can be together)):
{
"basic-styles/bold": "Bold button label."
}
The en.po
file would look like this:
msgid "basic-styles/bold"
msgstr "Bold"
msgctxt "Bold button label."
So:
msgid
we use the id of that string, not the "Bold" string. Why? The reasons were explained in the post I quoted – the msgid
has limited length and, more importantly, you lose all the translations if you change the id. So, if we'd use the real string "Bold" as msgid
, if we decided that for some reason it should be "Bold text", then translations would need to repeat their work and for some time we wouldn't have this string translated at all. This was pointed out by @wwalc and it happened (the English text was changed) in the past in CKE4 multiple times.So, if the msgid
of the string is not the string itself, then why defining the contexts at all? We could have e.g. this:
t( 'core/button: Button' );
// in other context:
t( 'tailor/button: Button' );
Theoretically, the msgid
clearly differentiates the use. However, as @wwalc pointed out again, translators on Transifex don't know CKEditor that well, so they might not be able to translate strings well enough based on short msgid
s. Hence, we need textual, human readable contexts just like we had in CKEditor 4 (e.g. https://github.com/ckeditor/ckeditor-dev/blob/master/dev/langtool/meta/ckeditor.plugin-basicstyles/meta.txt):
msgid
in every t()
usage, we'll be sure that the developer thought about what context it is, we'll be able to reuse translations from the core package, we'll know whether a certain text was defined in some other package already (we can have a tool to print msgstr
s which appear with more than one msgid
), etc. It's safe and pretty concise and useful at the same time.Thanks @jodator. TBH, I have still troubles in understanding the real purpose of pot
. Perhaps there are situations in which it allows resolving some unclear combinations of which string is defined in which po
file. Or maybe it's also related to the topic I've just described – what is msgid
? If you don't use real en strings for msgid
s, then the .pot
file contains a real template of all other po
files. Even to create the (base) en.po
file, you need to add the msgstr
s for all the entries, because, AFAICS, the _()
function is used only with msgid
. So in this case en.pot !== en.po
.
But, if we accept what I described above, then en.pot === en.po
, so we can generate just the en.po
file and, if anyone wants it, the en.po
can be used as en.pot
.
Anyway, @ma2ciek will need to dig into Transifex docs to learn about the proper process.
Just wanted to chip in to say that, with the new and awesome ES6 string templates, there is a lot of new functionality that can be applied here. For example, I just made a demo in 5 min (jsfiddle):
let translations = {
'I am $1 $2!': {
en: 'I am $1 $2!',
es: '¡Soy $1 $2!',
jp: (name, lastname) => `私は${lastname.toUpperCase()} ${name.toUpperCase()}です`
}
};
let name = 'Francisco';
let lastname = 'Presencia';
let translation = t`I am ${name} ${lastname}!`;
Even in this simple example this looks more natural than:
let translation = t('I am $1 $2', name, lastname);
Of course you could modify many things and improve others. The point is, with template strings the syntax is closer to natural English while giving you more flexibility than normal functions.
Hi all. I implemented internationalization service for targetprocess. And I talk you how we made it. Maybe it will be useful for you. We use Transifex as well.
This is my talk at one conference https://mavrin.github.io/i18n-pres (Russian).
I try explain the main idea.
We don't use syntactic keys. I mean something like that __("basic-styles/bold")
. We use original English phrase __("Bold button label.")
. What it gives us? We see in development mode good string and we don't need dictionary for 'en' lang. Function __
have implementation like this
const currentLang ='en'; //this variable we get from user settings
const defaultLang = 'en';
const dictionary = { // it load dynamically from separate bundle
'ru': {
'Bold button label.': 'Толстая кнопка'
}
}
const translate =() => {/*function which apply plural format or come variable*/} // we use https://github.com/yahoo/intl-messageformat for it
cost __ = (string, params) => {
if(currentLang === defaultLang) {
return translate(string, params);
} else {
return translate(dictionary[currentLang][string] || string, params);
}
}
How we made lang files for others. We put all dictionary to github repo.
update
route. This service allows you to see the full version of the dictionary.Of course you could modify many things and improve others. The point is, with template strings the syntax is closer to natural English while giving you more flexibility than normal functions.
Hi! Thanks for chipping in :) The use of templates was already proposed: https://github.com/ckeditor/ckeditor5-design/issues/136#issuecomment-191770415 and we came to a conclusion that it's not going to work in our case.
We need to synchronise translations through a service like Transifex which means that they need to be plain text strings. Also, with the t()
function we're able to pass an object there for a more complicated cases (see https://github.com/ckeditor/ckeditor5/issues/387#issuecomment-271249275).
EDIT:
@franciscop, I took a bit closer look at your demo and it's really interesting. I see that in the translation repository there are plain text strings and that you can change the order of placeholders in a translation and everything works fine:
en: 'I am $1 $2!',
es: '¡Soy $2 $1!',
If we were able to ensure that in the future we'd be able to implement the plural forms (see https://github.com/ckeditor/ckeditor5/issues/387#issuecomment-271249275), then using template strings could be an option. It doesn't sound particularly tricky to have the t()
function supporting templates and objects at the same time, so it seems feasible.
I edited my previous comment because it didn't make a lot of sense initially and I created https://github.com/ckeditor/ckeditor5-utils/issues/117 – it'll be worth considering adding support for template strings to the t()
function.
So I'm working now on the very basic support without plural forms and template strings which we'll add later. I'll be pushing changes to the pull request for the https://github.com/ckeditor/ckeditor5-utils/issues/116
After small searching for the phrase t( ... )
, which we want to find, collect and export to the .po file, I have found linkformview.js which is using generic form of t()
that crashes a little bit this idea. Are these generic forms more?
I have found linkformview.js which is using generic form of t() that crashes a little bit this idea. Are these generic forms more?
That's a bug. t()
should always be used with a string. You should throw an error if it's not.
I talked with @fredck and we decided to simplify the t()
calls by removing the msg id from them. It has some downsides (most of the things listed in https://github.com/ckeditor/ckeditor5/issues/387#issuecomment-271349641), but it has one important upside – the t()
calls will be shorter. Since the downsides will occur in rare situations, it's better to focus on something which most of us see – the t()
calls.
I updated the spec in the issue description and for you to be able to understand what I changed I created https://gist.github.com/Reinmar/2ca4400d5fb724fd9be88fd903aee8eb/revisions.
@Mavrin
Hi all. I implemented internationalization service for targetprocess. And I talk you how we made it. Maybe it will be useful for you. We use Transifex as well.
Thanks for the tips. I can see that we're planning to implement the same solution (especially after simplifying the t()
calls), so we're going in the right direction :)
So for the clarification @Reinmar @wwalc I'm going to implement errors after spec update for the following cases:
t()
call parameters without relevant translation key in package/lang/contexts.json
and core/lang/contexts.json
,contexts.json
without relevant parameter t()
call,contexts.json
more than once for the same translation key.Sounds good to me.
I have created working version of collecting translations from contexts and t()
calls with error handling.
Script requires branches https://github.com/ckeditor/ckeditor5/tree/t/387 and https://github.com/ckeditor/ckeditor5-dev/tree/t/44. Also it requires latest masters of other ckeditor5-* packages.
Usage:
gulp translations:collect
Expected result: following logs and ckeditor5.pot
with English strings and contexts.
[11:25:33] Using gulpfile ~/data/www/ckeditor/ckeditor5/gulpfile.js
[11:25:33] Starting 'trans:collect'...
[11:25:34] Created file: /home/maciek/data/www/ckeditor/ckeditor5/build/.transifex/ckeditor5.pot
[11:25:34] Finished 'trans:collect' after 528 ms
Just please call the task translations:collect
;)
Another proposal: i18n:collect
.
i18n is about more than translations, so it's not very precise.
Ok. Thanks. It's fixed. now there is a task called translations:collect
.
And the output file is now ckeditor5.pot
, not english.po
.
please, use a standard to define the language files e translations files, like ISO639-1 (or variations), IETF BCP47
@luzfcb, we'll use formats supported by gettext. This is a very common standard so there will be no problem with interoperability if that's what you meant.
@luzfcb https://www.gnu.org/software/trans-coord/manual/gnun/html_node/PO-Header.html - there is even utf-8 in the header example .
I think @luzfcb is talking about the name of the language files. We'll definitely use ISO 639-1 for that (e.g. English is "en"). I was also scared about the previous "english.po" name :)
Actually, to be more precise, it'll be IETF with ISO 639-1.
@Reinmar @ma2ciek Sorry if I was not clear enough. The standard I'm talking about is how to name the translation files as @fredck mentioned.
I had a problem with select2, because select2 does not use a known standard for naming translation files. This made it difficult for me to set up an equivalence table between the standard used by Django Framework and the file names used by select2, to force select2 use the language selected by user.
Django uses gnu gettext and store the translations files (.po and .mo) inside a "locale" directory, like https://github.com/django/django/tree/master/django/contrib/admin/locale These files are dynamically found by the translation functions.
https://docs.djangoproject.com/en/1.10/topics/i18n/ https://docs.djangoproject.com/en/1.10/topics/i18n/translation/
@luzfcb Thanks for the clarification. We'll use ISO 639-1 standard.
I'm working now on the 3. and 4. step (download and translation maps). I have chosen https://github.com/smhg/gettext-parser to parse PO files (project has 40 stars, but It's still maintained and the most popular po to js parser I have found). And now I'm testing the Transifex API on the simple project to automate 3.
We decided with @Reinmar to upload one file per package to get easy download process.
Otherwise there is a problem with splitting one big translation file into many packages accordingly to their contexts.json
files.
It's important to remember that all translation keys are unique across the contexts.json
files, so they won't be duplicated on Transifex.
We have a problem with implementing the plugin for Webpack which would be able to generate modules with translations on the fly, based on discovered (imported) modules. We asked a question on StackOverflow if you want to help: http://stackoverflow.com/questions/41869498/how-to-write-webpack-plugin-which-adds-modules-to-the-bundle-on-the-fly-based-on.
Any help would be greatly appreciated!
Ok. So for now there will be only single language support. We'll add the multi-language feature in the future when the problem will be resolved.
There are exported 3 main tasks in the ckeditor5
package:
gulp translations:collect
gulp translations:upload -u user-p password
gulp translations:download -u user -p password
First task creates .pot
files from the t()
calls for each package that contains 'contexts.json', checks for the errors and saves these .pot
files in the build/.transifex directory.
Second task uploads .pot
files on the Transifex. It creates resource if the package translations are uploaded for the first time, otherwise it updates existing resource.
Third task only downloads all translations in .po
file format from the Transifex and saves them in the [packageName]/lang/translations/[language_code].po format.
After translation downlaod, the CKEditor5WebpackPlugin will be able to take languages
option:
plugins: [
new CKEditorWebpackPlugin( {
languages: [ 'pl' ]
} )
]
During the compilation plugin is replacing first parameter of the t()
calls with the translated strings in the bundled source of ckeditor5.
It should be noted, that using function named define()
throws errors because of the https://github.com/ckeditor/ckeditor5-dev/issues/59 issue.
Scripts require https://github.com/ckeditor/ckeditor5-utils/tree/t/116, https://github.com/ckeditor/ckeditor5/tree/387 and https://github.com/ckeditor/ckeditor5-dev/issues/44.
Great to see progress on this :)
A couple of questions:
Third task only downloads all translations in .po file format from the Transifex and saves them in the [packageName]/lang/translation/[language_code].po format.
I'm not sure about this path. The spec said translations should be placed in lang/
directly. I guess you wanted to separate those po
files from contexts.json
and en.pot
, I guess it makes sense. However, then the correct path would be lang/translations/
.
Scripts require https://github.com/ckeditor/ckeditor5-utils/tree/t/116, https://github.com/ckeditor/ckeditor5/tree/387 and ckeditor/ckeditor5-dev#44.
Is any of these on review two? Please make PRs in every repo if those branches are supposed to be reviewed together.
I've been checking the state of the env utils and I found this: https://github.com/ckeditor/ckeditor5-dev/compare/t/44#diff-71629f5ae239e9d11642ad3fec3911c1R33.
This is not going to work in practice. The t()
call may be formatted in various ways and used in at least two contexts (after space and after tabulation), but that happens only in our code style. If different code style is used, there are many more variations. Besides, it's going to break with a simple string containing ")" character.
Let's keep this code for now but it must be improved. It must be using a proper JS parser.
EDITED
I'm not sure about this path. The spec said translations should be placed in lang/ directly. I guess you wanted to separate those po files from contexts.json and en.pot, I guess it makes sense. However, then the correct path would be lang/translations/
Yes, I made a typo. Translations are saving under the lang/translations/
path.
Is any of these on review two? Please make PRs in every repo if those branches are supposed to be reviewed together.
I'll fix few issues and I'll create a pull request within an hour.
I've been checking the state of the env utils and I found this: https://github.com/ckeditor/ckeditor5-dev/compare/t/44#diff-71629f5ae239e9d11642ad3fec3911c1R33. This is not going to work in practice. The t() call may be formatted in various ways and used in at least two contexts (after space and after tabulation), but that happens only in our code style. If different code style is used, there are many more variations. Besides, it's going to break with a simple string containing ")" character.
I had to misunderstand implementation of this part. I hope it's not going be hard to add babel or similar parser here to find t()
call occurrences. There is already a Webpack parser that solves this issue: https://github.com/webpack-contrib/i18n-webpack-plugin/blob/master/index.js, but it won't be able to share code logic across Webpack and Rollup plugins.
After a week I improved the t()
calls find and replace using the Acorn parser and modified version of Escodegen generator (which master do not use latest version of Estraverse and throws errors during export default
declaration parsing). So for now ckeditor5-dev-packages
Uses the fork of the Escodegen
. I'll report that issue.
I changed the way the .po
files are being imported from the packages, but it turned out, that this part has to be explicitly declared in the CKEditorWebpackPlugin
options. The ckeditor5-core
package which is the default translation package must be resolved first to provide translations that other packages may use.
It took quite a lot of time compare to the effects, mainly because of the researching in out of date webpack plugins and webpack documentation. I hope it'll change soon.
I changed the way the .po files are being imported from the packages, but it turned out, that this part has to be explicitly declared in the CKEditorWebpackPlugin options. The ckeditor5-core package which is the default translation package must be resolved first to provide translations that other packages may use.
How does the process look like? Is it split into two parts:
or not?
If it is, then I don't see a problem in sorting the translations in a way that the core will always be first. Besides, why does it have to be first in the first place? There should be no conflicts between translations provided by packages anyway.
After a week I improved the t() calls find and replace using the Acorn parser and modified version of Escodegen generator (which master do not use latest version of Estraverse and throws errors during export default declaration parsing). So for now ckeditor5-dev-packages Uses the fork of the Escodegen. I'll report that issue.
How's that possible that Escodegen isn't supporting all ES6 yet? How's Webpack working with ES6? Is it always pre-transpiling the modules to CJS or some custom format?
How does the process look like? Is it split into two parts: discovering paths to packages, modifying the files or not?
The process isn't split. For each js file in ckeditor5-* the po
file of the package is being added (if wasn't added earlier) and then the file is being translated. So the package can't use ckiedtor5-core
translations which may be defined later.
The process isn't split.
Can't we split it? ;< Or somehow locate and add ckeditor5-core automatically?
We can add manually ckeditor5-core
. But we have to know where the package physically is. So this should be written explicitly in CKEditorWebpackPlugin options.
Can't we split it?
The loader is used just after resolver for each file. So this part can't be separate if we don't know paths to the po
files in the packages first.
EDITED: See https://github.com/ckeditor/ckeditor5/issues/387#issuecomment-271600724
Process
t()
functions in which we define the English value of a string (and, optionally, a context). E.g.t( 'Bold' )
ort( 'Button [context: clothing]' )
(the[context: clothing]
will be automatically removed upon build on, in the dev mode, on runtime). Each context must be defined inlang/contexts.json
of a package which uses it or in theckeditor5-core/lang/contexts.json
(for common strings).t()
usages, extract strings from it, builds a contexts map (based on alllang/contexts.json
files) and checks whether all used strings have defined contexts. Then builds a PO file with English strings and contexts and upload it to Transifex.lang/
directories.Implementation
ckeditor5-core/src/editor~Editor#constructor
I propose to merge the ctx name into the string in order to keep the same
t()
params in every possible environmentt( str, values )
. If it was a separate param, then in the build version (in which we replace strings with ids) it would have to be left unused, or, we'd need to change the implementation oft()
on the fly.The CKE_LANG will be defined by the bundler. It will be the editor class that's going to set it which means that it will only need to be set when bundling for the browser environment. Or we could go a bit further than that and define utils/global object which would retrieve the global scope depending on the env. In the browser that would be a window, in Node.js that would... something which is global there. That would allow this code to work without the bundler needing to preset this value.
PS. we already have
utils/dom/global
, but I think that it makes sense to keep them separated.ckeditor5-*/lang/contexts.json
Contexts for used messages.
Examples: https://github.com/ckeditor/ckeditor-dev/tree/master/dev/langtool/meta
ckeditor5-core/lang/contexts.json
:ckeditor5-form/lang/contexts.json
:ckeditor5-tailor/lang/contexts.json
:The button is first defined in the
ckeditor5-form
package without a context, because, e.g. historically, we could've used it there without a context (cause, in our case, button is a UI button most of the time). Then, while working on the CKEditor 5 Tailor plugin we realised that button is already used, but in a different context, so we can't uset( 'button' )
as it will point to a wrong context definition (contexts are global for all packages). Instead, we'll uset( 'button [context: clothing] ' )
and add its own definition in theckeditor5-tailor/lang/contexts.json
.ckeditor5-utils/src/locale
No magic here – uses the
translate()
function of theutils/translation-service
module to get translated string and replaces placeholder values ($1
,$2
, etc.) with passed args.ckeditor5-utils/src/translation-service
The goal of this module is to encapsulate the whole "translations repository" logic from the rest of the code.
It may be dirty with some preprocessor rules or may need to be generated on the fly depending on the build type (e.g. with or without code splitting – i.e. with separate or built in language file). However, it would be good if it was defined in the repository so for dev purposes we wouldn't have to do anything. It'd just return the passed string.
Development mode implementation:
For the bundles, we have two cases:
In case of just one language, it'll be best to simply replace the
str
param oft()
calls with the proper value (without the ctx now). This will allow for code splitting and hot-loading plugins without any tricks. It may happen, though, that some strings will then repeat multiple times – e.g. the ones from the core package. While this is going to make an un-gzipped package bigger, there shouldn't be a difference in case of a gzipped code. Besides, this will be just a few strings anyway and we'll save some space having such a simple implementation too.In case of multiple languages we need to have some registry. The
translate()
implementation will be again simple:What's the
str
in this case? In order to ensure that we don't have name collisions (important for bundle splitting) I'd say that this should be either:A totally unique string – like typical uid, but preferably shorter, using a wider range of unicode characters. A 5 chars long string using a range of all Unicode chars will give us comparable complexity to
utils/uid
:The thing which worries me is whether there won't be any issues with encoding – we've seen people sourcing CKEditor in some weird encodings and it was always blowing up. But one could use a minifier which encodes special characters and it was be working again.
Another thing is ability to debug such code. With unreadble ids it may be tricky.
Finally – creating objects in which these uids are keys may be tricky. I can't find now whether there are any characters which needs to be escaped (other than the closing quote).
Anyway, this solution may create unpredictable and stupid issues.
[a-z]
). What about code splitting? There are two cases:ckeditor5-preset-article-editor
and then, separately (might be a different person), builtckeditor5-image
. In this case, the preset bundle will use normal, short, ids (because it's the main bundle) and the image feature bundle will use prefixed ids (e.g.ckeditor5-image/<id>
). The idea is that a developer releasing his/her package will use a special bundler setup which configures CKEditor plugin for Webpack to use prefixed ids.Anyway, this is nothing we have to worry today, because, most likely, we'll work on releasing standalone package bundles after 1.0.0.
Another thing to notice is that support for multiple languages and code splitting is an optional feature, so we can implement it for just one bundler, i.e. Webpack.
Let's say, that we want to split a package to some preset X and packages Y and Z. This will make for 3 files –
x.js
,y.js
,z.js
.The idea is that each files will have an accompanying language(s) files defining a translations needed for this file. So there will be:
x.js
,x-pl.js
,x-en.js
, ...y.js
,y-pl.js
,y-en.js
, ...z.js
,z-pl.js
,z-en.js
, ...In order to run an the X preset with plugins with Polish translations one will need to load:
x.js
,y.js
,z.js
,x-pl.js
,y-pl.js
,z-pl.js
.The language files will be built using entry points like this:
And the
define()
function will merge these translations into other that it already has.Tadam!
PS. PO file generation
We've been changing the idea which part of
t()
call ismsgctxt
,msgid
andmsgstr
twice already, so let's clarify this:For the following
t()
calls andcontexts.json
:ckeditor5-core/lang/contexts.json
:ckeditor5-form/lang/contexts.json
:ckeditor5-tailor/lang/contexts.json
:The
en.po
file would look like this:In this case the
msgstr
can be empty because thengettext
will usemsgid
. In fact, in the samples I found in https://github.com/ckeditor/ckeditor5/issues/387#issuecomment-271349641 it's empty.Regarding https://github.com/ckeditor/ckeditor5/issues/387#issuecomment-271349641:
We ignore this issue. It's very rare situation.