FLOIP / flow-spec


Language identifiers and details: Support more than one language per ISO 639-3 code #48

Closed by markboots 3 years ago

markboots commented 3 years ago

The first draft of Flow Spec used bare ISO 639-3 codes to identify languages. This was assumed to be sufficient, since ISO 639-3 is intended to cover every written and spoken human language.

However, the consortium came to realize through discussions over time that an ISO 639-3 code alone doesn't make sense as a "primary key" for languages in flows, because:

There is also additional data that may be useful to associate with a language beyond its ISO 639-3 code. One example is the BCP 47 language and locale tag, which is often useful in conjunction with speech recognition and synthesis tools.

Possible solution:

languages: [
   {
      id: "eng-male",
      label: "English - Male Voice",
      iso_639_3: "eng",
      variant: "male",
      bcp_47: "en-US"
   },
   {
      id: "eng-female",
      label: "English - Female Voice",
      iso_639_3: "eng",
      variant: "female",
      bcp_47: "en-US"
   },
   {
      id: "mis-mysecretlanguage",
      label: "My secret code language",
      iso_639_3: "mis",
      variant: "mysecretlanguage",
      bcp_47: null
   },
   {
      id: "fra",
      label: "Français",
      iso_639_3: "fra",
      variant: null,
      bcp_47: "fr-FR"
   }
]
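As a rough illustration of why the `id` field, rather than the ISO 639-3 code, serves as the key, here is a minimal Python sketch. It uses a subset of the entries above. The helper names `index_by_id` and `group_by_iso` are hypothetical and not part of the Flow Spec; the field names are taken from the proposal.

```python
from collections import defaultdict

# A subset of the language entries from the proposal above.
LANGUAGES = [
    {"id": "eng-male", "label": "English - Male Voice",
     "iso_639_3": "eng", "variant": "male", "bcp_47": "en-US"},
    {"id": "eng-female", "label": "English - Female Voice",
     "iso_639_3": "eng", "variant": "female", "bcp_47": "en-US"},
    {"id": "mis-mysecretlanguage", "label": "My secret code language",
     "iso_639_3": "mis", "variant": "mysecretlanguage", "bcp_47": None},
]


def index_by_id(languages):
    """Build an id -> entry lookup, rejecting duplicate ids (hypothetical helper)."""
    index = {}
    for lang in languages:
        if lang["id"] in index:
            raise ValueError(f"duplicate language id: {lang['id']}")
        index[lang["id"]] = lang
    return index


def group_by_iso(languages):
    """Group language ids by ISO 639-3 code (hypothetical helper)."""
    groups = defaultdict(list)
    for lang in languages:
        groups[lang["iso_639_3"]].append(lang["id"])
    return dict(groups)


if __name__ == "__main__":
    by_id = index_by_id(LANGUAGES)
    print(by_id["eng-female"]["label"])
    print(group_by_iso(LANGUAGES))
```

Under this sketch, two entries share the code `eng` but remain distinguishable by `id`, which is the situation a bare ISO 639-3 key cannot represent.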

The language id would be a string, but the recommended format for language IDs would be: