LBeaudoux / iso639

A fast, simple ISO 639 library.
MIT License

Malay Error #13

Closed: yamavol closed this issue 1 year ago

yamavol commented 2 years ago

I found that "Malay" (code "ms") is not supported. "Standard Malay" returns "zsm", but that code is defined in ISO 639-3 only. Is this the intended behavior?

```python
def test_malay(self):
    self.assertEqual(lang_to_iso639_pt1('Malay'), '')               # raises an exception
    self.assertEqual(lang_to_iso639_pt1('Standard Malay'), '')      # OK (returns '')
    self.assertEqual(lang_to_iso639_pt1('Indonesian'), 'id')        # OK
    self.assertEqual(lang_to_iso639_pt2t('Malay'), '')              # raises an exception
    self.assertEqual(lang_to_iso639_pt2t('Standard Malay'), 'zsm')  # fails: returns ''
    self.assertEqual(lang_to_iso639_pt2t('Indonesian'), 'ind')      # OK
    self.assertEqual(lang_to_iso639_pt3('Standard Malay'), 'zsm')   # OK
```

https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
https://iso639-3.sil.org/code/msa

LBeaudoux commented 2 years ago

> "Malay" for "ms" is not supported.

@yamavol thank you for reporting this issue. As of today, only ISO 639-3 and ISO 639-5 language names are supported. The alternative language names used by ISO 639-1 and ISO 639-2 are not supported yet.

To remedy this, we could automatically redirect the alternative ISO 639-1 and ISO 639-2 language names to the reference ISO 639-3 and ISO 639-5 language names, so that we would get:

```python
>>> Lang('Malay')
Lang(name='Malay (macrolanguage)', pt1='ms', pt2b='may', pt2t='msa', pt3='msa', pt5='')
```
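A minimal sketch of the redirect idea, independent of the library: names are first looked up among the reference names, and otherwise through an alias table that maps an alternative name to its reference name. `LangRecord`, `REFERENCE`, `ALIASES` and `lookup_name` are hypothetical names for illustration only, and the single table row just copies the `Lang('Malay')` output quoted above.

```python
# Hypothetical sketch (not the iso639 API): redirect alternative
# ISO 639-1/639-2 names to reference ISO 639-3/639-5 names.
from typing import NamedTuple, Optional


class LangRecord(NamedTuple):
    name: str
    pt1: str
    pt2b: str
    pt2t: str
    pt3: str
    pt5: str


# Toy reference table keyed by reference name; values copied from the
# Lang('Malay') output quoted in this thread.
REFERENCE = {
    "Malay (macrolanguage)": LangRecord(
        "Malay (macrolanguage)", "ms", "may", "msa", "msa", ""
    ),
}

# Hypothetical alias table: alternative name -> reference name.
ALIASES = {
    "Malay": "Malay (macrolanguage)",
}


def lookup_name(name: str) -> Optional[LangRecord]:
    """Return the record for a reference name or for one of its aliases."""
    if name in REFERENCE:
        return REFERENCE[name]
    ref_name = ALIASES.get(name)
    if ref_name is not None:
        return REFERENCE.get(ref_name)
    return None  # unknown name
```

With this table, `lookup_name("Malay").pt1` gives `'ms'`, the same result as looking up the reference name directly. Keeping aliases in a separate table means the reference names stay the single source of truth for the codes.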

> "Standard Malay" returns "zsm", but it is defined for ISO639-3 only.

This is normal. In the registration authority reference file, the zsm ISO 639-3 language code is not mapped to any ISO 639-1 code. But you can still get an ISO 639-1 code by using the macrolanguage of zsm:

```python
>>> Lang('zsm').macro()
Lang(name='Malay (macrolanguage)', pt1='ms', pt2b='may', pt2t='msa', pt3='msa', pt5='')
>>> Lang('zsm').macro().pt1
'ms'
```

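
The same fallback can be sketched without the library: use the language's own ISO 639-1 code when it has one, otherwise take the code of its macrolanguage. The two dictionaries are a hand-written toy subset (zsm and ind are individual languages of the msa macrolanguage), and `pt1_with_macro_fallback` is a hypothetical helper, not part of iso639.

```python
# Hypothetical sketch: fall back to the macrolanguage's ISO 639-1 code
# when an individual language (e.g. zsm) has none of its own.
from typing import Dict, Optional

# Toy data (assumed subset): ISO 639-3 code -> ISO 639-1 code, "" if none.
PT1_BY_PT3: Dict[str, str] = {"zsm": "", "ind": "id", "msa": "ms"}

# Toy data (assumed subset): individual language -> its macrolanguage.
MACRO_OF: Dict[str, str] = {"zsm": "msa", "ind": "msa"}


def pt1_with_macro_fallback(pt3: str) -> Optional[str]:
    """Return the language's own pt1 code, else its macrolanguage's, else None."""
    own = PT1_BY_PT3.get(pt3, "")
    if own:
        return own  # e.g. ind already has 'id'
    macro = MACRO_OF.get(pt3)
    if macro:
        return PT1_BY_PT3.get(macro) or None
    return None
```

Here `pt1_with_macro_fallback("zsm")` returns `'ms'`, while `"ind"` keeps its own `'id'`; a caller that prefers an exact code over the broader macrolanguage code gets that ordering by default.
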
yamavol commented 2 years ago

OK, I understand the situation now. I didn't know that "Malay" is not on the ISO 639-3 list. I like your remedy; it would be a nice feature when converting language names to ISO codes. Anyway, thank you for the reply.