LBeaudoux / iso639

A fast, simple ISO 639 library.
MIT License

Javanese language code - jw #21

Open jordimas opened 6 days ago

jordimas commented 6 days ago

Hello.

It may be worth considering adding "jw" as an alias for Javanese.

I found this in the OpenAI Whisper code:

https://github.com/openai/whisper/blob/main/whisper/tokenizer.py#L108

I would have expected "jv".

I found it documented here: https://xml.coverpages.org/iso639a.html

Javanese is rendered as "jw" in table 1, while it is correctly given as "jv" in the other tables.

It seems this may be an error that propagated. I have not done extensive research; I am just sharing what I found.

Thanks
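Until a library handles this, a caller can translate Whisper-style codes before doing any ISO 639 lookup. The following is a minimal sketch that assumes "jw" is the only non-standard two-letter code you need to handle. `CODE_ALIASES` and `normalize_code` are hypothetical names and are not part of iso639-lang or Whisper.

```python
# Hypothetical alias table mapping non-standard codes to ISO 639-1.
# Assumption: "jw" (used by Whisper for Javanese) is the only alias needed.
CODE_ALIASES = {"jw": "jv"}


def normalize_code(code: str) -> str:
    """Return the ISO 639-1 code for `code`, resolving known aliases.

    Codes missing from the alias table are returned lowercased and
    otherwise unchanged.
    """
    code = code.strip().lower()
    return CODE_ALIASES.get(code, code)


print(normalize_code("jw"))  # jv
print(normalize_code("JV"))  # jv
print(normalize_code("ca"))  # ca
```

Normalizing at the boundary means the rest of the code only ever sees the standard "jv".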

LBeaudoux commented 6 days ago

Hi Jordi,

Thanks for reporting this issue. According to the ISO 639-2/RA Change Notice, the 'jw' identifier was indeed published in error and then deprecated in August 2001.

iso639-lang already detects deprecated ISO 639-3 identifiers. After the next update it will also detect deprecated ISO 639-3 reference names. Following your report, I will try to make it detect deprecated values from ISO 639-1, ISO 639-2 and ISO 639-5 as well.
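One way to detect deprecated two-letter codes is to keep a small registry that records each replacement and emits a `DeprecationWarning` when a deprecated code is used. The sketch below is illustrative only and is not iso639-lang's actual implementation or API. `Deprecation`, `DEPRECATED_ALPHA2` and `resolve_alpha2` are hypothetical names, and the only entry is the "jw" to "jv" case discussed above.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class Deprecation:
    replacement: str  # the identifier that should be used instead
    note: str  # human-readable reason for the deprecation


# Hypothetical registry of deprecated two-letter (ISO 639-1) identifiers.
# Only the "jw" entry comes from this discussion.
DEPRECATED_ALPHA2 = {
    "jw": Deprecation(
        replacement="jv",
        note="published in error; deprecated in August 2001",
    ),
}


def resolve_alpha2(code: str) -> str:
    """Return the current identifier for `code`.

    Emits a DeprecationWarning if `code` is a deprecated identifier.
    """
    code = code.strip().lower()
    dep = DEPRECATED_ALPHA2.get(code)
    if dep is None:
        return code
    warnings.warn(
        f"ISO 639-1 code {code!r} is deprecated ({dep.note}); "
        f"use {dep.replacement!r} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return dep.replacement


# Usage: DeprecationWarning is often filtered out by default, so enable it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(resolve_alpha2("jw"))  # jv
    print(len(caught))  # 1
```

Warning instead of raising keeps existing callers working while still flagging the outdated code.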

jordimas commented 6 days ago

Thanks, great library BTW!