cibernox / svelte-intl-precompile

I18n library for Svelte.js that analyzes your keys at build time for max performance and minimal footprint
https://svelte-intl-precompile.com
ISC License
274 stars 13 forks source link

languageFileRegex is not compatible with all BCP 47 language tags #49

Closed brawaru closed 2 years ago

brawaru commented 2 years ago

Hello! Recently we have added a few custom languages to our project, and to stay consistent, we assigned them to well-formed but invalid language tags (to avoid any collisions) — e.g. en-LOLCAT (this is also non-canoncial apparently, lolcat should be lowercase, but oh well). We have found that browsers do not have any troubles with such codes, falling back to en (which is expected, LOLCAT is a subset of English). However, we have found that svelte-intl-precompile has a pretty weak regular expression to detect which file is compatible for compilation, which prevents us from compiling and shipping our custom languages.

The regular expression in question is:

https://github.com/cibernox/svelte-intl-precompile/blob/1e4a3831912abdc8be0d6ddaeeba5d3305f659ec/sveltekit-plugin.js#L26

Not only does this affect our silly custom languages, this also affects many actual languages and subsets of languages defined in CLDR. You can see this on Regex101.

I have not yet looked into how this regular expression is used, but since Node.js has already enabled compilation with full ICU, wouldn't it make sense to use actual APIs like Intl.NumberFormat.supportedLocalesOf over regular expressions that can be flawed in one of way or another? Just iterate over files, check if they have compatible extensions and that Intl API considers code valid. This would probably work in our case, and in many other cases...

cibernox commented 2 years ago

Interesting, I had no idea that languages could have names other that xx-XX. This check was initially added because of #48 so one can have files that are not languages in the locales folder.

Do you know if there’s a better regular expression or a comprehensive list of all the valid language names that I can compare against?

brawaru commented 2 years ago

There's a list of available locales from CLDR, but then you also can come up with different codes on your own like en-lolcat, which is well-formed and will be accepted by Intl, but is invalid. I however still think it might be a better idea to just run through files, remove extension from a file name and check if like new Intl.Locale(name) doesn't throw an error (it will throw for any not well-formed tag, even if Intl has no data for it, compared to mentioned earlier supportedLocalesOf, which will only return locale codes that Intl has data for). Regular expression solution that I have found is flawed.

cibernox commented 2 years ago

Yeah, I'm not very keen on that because Intl.Locale takes pretty much anything.

new Intl.Locale('gitconfig') is deemed valid, which prevents it from being used to discriminate between locale files and other files that might exist on that folder for other reasons.

May I ask what on earth does en-lolcat contains?

If pretty much anything can be valid, my best guess is to, by default, make the plugin process every file and let users to define their own regexp or filter function when configuring the plugin to exclude some of them.

Something like

import precompileIntl from "svelte-intl-precompile/sveltekit-plugin";

const config = {
    kit: {
        target: '#svelte',
        vite: {
            plugins: [
                precompileIntl('locales', {
                    excludeFiles: /^(gitconfig|eslint)/
                                 })
            ]           
        }       
    }
};

export default config;

Does that sound good?

brawaru commented 2 years ago

Hm, weird, new Intl.Locale('.gitconfig') or new Intl.Locale('gitconfig') should both throw an error, but I see now, new Intl.Locale('LICENSE') will not throw... I think exclusion regex would work much better, yea.

CC @venashial for opinion: we can probably use /^(?!.+\.json$).+$/ for https://github.com/modrinth/translations?

venashial commented 2 years ago

Yep, that would work for us.

cibernox commented 2 years ago

Cool, I’ll give it a go tonight

cibernox commented 2 years ago

If you could try the latest master and verify that it works for both of you, I'll release a new version.

cibernox commented 2 years ago

I published in 0.11.0-beta.2 in case that's more convenient.