faker-js / faker

Generate massive amounts of fake data in the browser and node.js
https://fakerjs.dev
Other
11.92k stars 880 forks source link

feat(locale): add Serbian (Latin) language #1801

Closed bdimitrovski closed 1 year ago

bdimitrovski commented 1 year ago
bdimitrovski commented 1 year ago

@import-brain Done, the tests should be passing now.

codecov[bot] commented 1 year ago

Codecov Report

Merging #1801 (93e123f) into next (93e1433) will decrease coverage by 0.01%. The diff coverage is 100.00%.

:exclamation: Current head 93e123f differs from pull request most recent head 6a5c268. Consider uploading reports for the commit 6a5c268 to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## next #1801 +/- ## ========================================== - Coverage 99.62% 99.62% -0.01% ========================================== Files 2354 2385 +31 Lines 236555 239261 +2706 Branches 1227 1226 -1 ========================================== + Hits 235673 238363 +2690 - Misses 860 876 +16 Partials 22 22 ``` | [Impacted Files](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js) | Coverage Δ | | |---|---|---| | [src/locale/index.ts](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js#diff-c3JjL2xvY2FsZS9pbmRleC50cw==) | `100.00% <100.00%> (ø)` | | | [src/locale/sr\_RS\_latin.ts](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js#diff-c3JjL2xvY2FsZS9zcl9SU19sYXRpbi50cw==) | `100.00% <100.00%> (ø)` | | | [src/locales/index.ts](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js#diff-c3JjL2xvY2FsZXMvaW5kZXgudHM=) | `100.00% <100.00%> (ø)` | | | [src/locales/sr\_RS\_latin/cell\_phone/formats.ts](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js#diff-c3JjL2xvY2FsZXMvc3JfUlNfbGF0aW4vY2VsbF9waG9uZS9mb3JtYXRzLnRz) | `100.00% <100.00%> (ø)` | | | [src/locales/sr\_RS\_latin/cell\_phone/index.ts](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js#diff-c3JjL2xvY2FsZXMvc3JfUlNfbGF0aW4vY2VsbF9waG9uZS9pbmRleC50cw==) | `100.00% <100.00%> (ø)` | | | [src/locales/sr\_RS\_latin/date/index.ts](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js#diff-c3JjL2xvY2FsZXMvc3JfUlNfbGF0aW4vZGF0ZS9pbmRleC50cw==) | `100.00% <100.00%> (ø)` | | | [src/locales/sr\_RS\_latin/date/month.ts](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js#diff-c3JjL2xvY2FsZXMvc3JfUlNfbGF0aW4vZGF0ZS9tb250aC50cw==) | `100.00% <100.00%> (ø)` | | | [src/locales/sr\_RS\_latin/date/weekday.ts](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js#diff-c3JjL2xvY2FsZXMvc3JfUlNfbGF0aW4vZGF0ZS93ZWVrZGF5LnRz) | `100.00% <100.00%> (ø)` | | | [src/locales/sr\_RS\_latin/index.ts](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js#diff-c3JjL2xvY2FsZXMvc3JfUlNfbGF0aW4vaW5kZXgudHM=) | `100.00% <100.00%> (ø)` | | | [src/locales/sr\_RS\_latin/internet/domain\_suffix.ts](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js#diff-c3JjL2xvY2FsZXMvc3JfUlNfbGF0aW4vaW50ZXJuZXQvZG9tYWluX3N1ZmZpeC50cw==) | `100.00% <100.00%> (ø)` | | | ... and [20 more](https://codecov.io/gh/faker-js/faker/pull/1801?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js) | | ... and [11 files with indirect coverage changes](https://codecov.io/gh/faker-js/faker/pull/1801/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js)
bdimitrovski commented 1 year ago

@matthewmayer @ST-DDT when could we expect this to land?

matthewmayer commented 1 year ago

Probably best to use your own fork if you are relying on its functionality in the short term.

https://blaipratdesaba.com/how-to-use-an-npm-node-module-that-has-been-forked-b7dd522fdd08

Even when it lands it will only be going into 8.0.0 which is still in alpha.

ST-DDT commented 1 year ago

We are currently busy and thus are a little bit slower with the reviews. AFAIK neither of us has any Serbian skills, so it takes a bit of time for us to review. We will review/merge it when we have time, probably soon. You can speed up the process by getting reviews from a third-party.

As for a release, we might release another v8 alpha in the next month (not sure yet), but the actual v8 stable release will take at least one or two months.


Also I noticed, that there are two "Serbian" languages, that seem to have the same locale code.

sr-RS@latin + sr-RS (Russian characters + some more?)

https://github.com/validatorjs/validator.js/blob/master/src/lib/alpha.js#L24-L25

I'm not sure what the correct locale code would be here. Can you give us some insides on this issue? Should we use sr_RS_latin as locale code for clarification?

bdimitrovski commented 1 year ago

We are currently busy and thus are a little bit slower with the reviews. AFAIK neither of us has any Serbian skills, so it takes a bit of time for us to review. We will review/merge it when we have time, probably soon. You can speed up the process by getting reviews from a third-party.

As for a release, we might release another v8 alpha in the next month (not sure yet), but the actual v8 stable release will take at least one or two months.

Also I noticed, that there are two "Serbian" languages, that seem to have the same locale code.

sr-RS@latin + sr-RS (Russian characters + some more?)

https://github.com/validatorjs/validator.js/blob/master/src/lib/alpha.js#L24-L25

I'm not sure what the correct locale code would be here. Can you give us some insides on this issue? Should we use sr_RS_latin as locale code for clarification?

That's alright, I understand that you're busy, however, these are mostly just Serbian names, cities and some addresses, nothing fancy.

You can use the sr_RS_latin code, as we also have the Cyrilic alphabet, however, I have not translated it on this occasion. I can add it in another iteration, however, Latin would be more of a priority at this point.

matthewmayer commented 1 year ago

seems sr_rs_latn and sr_rs_cyrl (or sr_latn_rs,sr_latn_cyrl) are more common in various software which distinguishes the two

matthewmayer commented 1 year ago

bit of background https://ciklopea.com/blog/localization/should-you-localize-to-serbian-latin-or-to-serbian-cyrillic/

@bdimitrovski one thing i wasn't totally clear on: can you do a perfect 1-1 transliteration between the Latin and Cyrillic scripts? ie a computer should be able to translate character by characters? Or does it require some context/knowledge

ST-DDT commented 1 year ago

seems sr_rs_latn and sr_rs_cyrl (or sr_latn_rs,sr_latn_cyrl) are more common in various software which distinguishes the two

Do you have a source for this?

My search results show otherwise:

https://www.google.com/search?q=%22sr_RS_Latin%22 => 747.000 https://www.google.com/search?q=%22sr_RS_Latn%22 => 55 https://www.google.com/search?q=%22sr-RS-Latin%22 => 5260 https://www.google.com/search?q=%22sr-RS-Latn%22 => 1040

Also I found this entry from the rust sources:

https://docs.rs/pure-rust-locales/0.2.1/x86_64-apple-darwin/pure_rust_locales/sr_RS_latin/LC_TIME/index.html

ST-DDT commented 1 year ago

This PR needs to be updated:

git fetch --all
git merge origin/next
pnpm run preflight
git add .
git merge --continue
bdimitrovski commented 1 year ago

This PR needs to be updated:

git fetch --all
git merge origin/next
pnpm run preflight
git add .
git merge --continue

Done.

ST-DDT commented 1 year ago

Should we merge this as is or should we rename it to sr_RS_latin?

Shinigami92 commented 1 year ago

Should we merge this as is or should we rename it to sr_RS_latin?

A little bit of googling showed that sr_RS_latin is not a bad idea and theoretically "valid" If another one also says yes, lets do it :+1:

bdimitrovski commented 1 year ago

@matthewmayer @ST-DDT @Shinigami92 is this ready to go then?

ST-DDT commented 1 year ago

We will discuss the locale key tomorrow in our team meeting.

bdimitrovski commented 1 year ago

sr_RS_latin

@ST-DDT How/where would I change that? Which file(s) should be altered?

matthewmayer commented 1 year ago

Basically

  1. rename the whole src/locales/sr_RS to src/locales/sr_RS_latin
  2. do a global text replace from sr_RS to sr_RS_latin
  3. run pnpm run preflight to make sure everything's OK
bdimitrovski commented 1 year ago

Basically

  1. rename the whole src/locales/sr_RS to src/locales/sr_RS_latin
  2. do a global text replace from sr_RS to sr_RS_latin
  3. run pnpm run preflight to make sure everything's OK

@matthewmayer Sure thing, that's what my thinking was too. But should I do that even in automatically generated files, like these?

Screenshot 2023-03-17 at 11 40 11

At this point, the sr_RS does not exist anymore, since I've renamed it, but when I run pnpm run preflight it still generates that.

matthewmayer commented 1 year ago

Basically

  1. rename the whole src/locales/sr_RS to src/locales/sr_RS_latin
  2. do a global text replace from sr_RS to sr_RS_latin
  3. run pnpm run preflight to make sure everything's OK

@matthewmayer Sure thing, that's what my thinking was too. But should I do that even in automatically generated files, like these?

Screenshot 2023-03-17 at 11 40 11

doesn't really matter, they'll get regenerated by pnpm run preflight anyway.

bdimitrovski commented 1 year ago

@matthewmayer @ST-DDT all done.

bdimitrovski commented 1 year ago

bit of background https://ciklopea.com/blog/localization/should-you-localize-to-serbian-latin-or-to-serbian-cyrillic/

@bdimitrovski one thing i wasn't totally clear on: can you do a perfect 1-1 transliteration between the Latin and Cyrillic scripts? ie a computer should be able to translate character by characters? Or does it require some context/knowledge

Sorry, I've only seen this now - as for the transliteration, yes, it is possible - you can see it in action here https://www.lexilogos.com/keyboard/serbian_conversion.htm

Serbian language is the only European language to use a 2 way lettering system (otherwise known as full synchronic digraphia), the Cyrillic letter is in formal use (government, documents, etc), while informally, almost everyone uses Latin, in everyday communication.

Automatic transliteration to Cyrillic of the work that I've done for the Latin letter, is possible (again, referring to e.g. lexilogos.com):

Screenshot 2023-03-17 at 14 41 30