fnielsen / ordia

Wikidata lexemes presentations
https://ordia.toolforge.org
Apache License 2.0

Updated Punjabi language labels #137

Closed: bgo-eiu closed 2 years ago

bgo-eiu commented 2 years ago

On the text to lexemes page, this changes "Eastern Punjabi" to "Punjabi Gurmukhi" and "Western Punjabi" to "Punjabi Shahmukhi." These language codes can only be understood to refer to the two different scripts for written Punjabi rather than to any regional variation, and Punjabi lexemes typically include both scripts on each headword as spelling variants. This would just make these options a bit easier to find in the menu when I add lexemes.

fnielsen commented 2 years ago

I am confused.

The ISO 639-3 language code "pnb" is "Western Panjabi" according to https://iso639-3.sil.org/code/pnb

"pa" is "Panjabi; Punjabi" according to https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pa

Are these languages (dialects?) each tied to one writing system? Perhaps the non-English labels are not correct? @Daniel-Mietchen

Daniel-Mietchen commented 2 years ago

@fnielsen @bgo-eiu: perhaps we should go for <option value="pa"{% if text_language == 'pa' %} selected="selected"{% endif %}>ਪੰਜਾਬੀ - pa - Eastern Punjabi (Gurmukhi)</option>

and

<option value="pnb"{% if text_language == 'pnb' %} selected="selected"{% endif %}>پن٘جابی - pnb - Western Punjabi (Shahmukhi)</option>

bgo-eiu commented 2 years ago

So the distinction between "Eastern Punjabi" and "Western Punjabi" as defined by ISO/SIL is essentially non-existent. As you can see from the pending change request linked on the SIL page for "pnb," attempts have been made to merge them, but they haven't budged on this for unknown reasons (see https://iso639-3.sil.org/sites/iso639-3/files/change_requests/2020/2020-019.pdf). De facto, these two language codes are used to distinguish the two scripts for the same language, and on Wikimedia this is how they have always been used. Punjabi lexemes feature both scripts on the same page using these two codes. (Bizarrely, Microsoft Windows aliases "pnb" to "ar-sa" or "Saudi Arabic" for one of its keyboard layout options, despite also having two other separate keyboard layouts for each script used in Punjabi under "pa-IN" and "pa-PK," which is an interesting example of how confused this situation has become.) From what I can gather, someone in charge of these language codes mistakenly thinks that Punjabi speakers in India and Pakistan speak different languages or dialects split along that border, but this is plainly untrue, especially when you consider that a substantial portion of Pakistanis were born in India and vice versa.

"Punjabi Gurmukhi" and "Punjabi Shahmukhi" are not really non-English labels, as they are used in English writing on the language, and a large portion of Punjabi speakers are speakers of South Asian English. Using cardinal directions to distinguish the two scripts is confusing for multiple reasons. There are actual dialectal variations in Punjabi which can be described as eastern or western, but the distribution of their speakers does not coincide with the boundaries of India and Pakistan, nor with the use of one script or the other. Further, there are dialectal variants used in Western Punjab that make phonetic distinctions which can only be represented in writing using the Gurmukhi script, so having to select "Eastern Punjabi" to enter a dialectal lexeme used in Western Punjab feels a bit absurd.
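To illustrate the point that the two codes really separate scripts rather than regions, here is a minimal Python sketch that picks "pa" or "pnb" purely from the Unicode block of the characters in a word. The function name `guess_punjabi_code` is hypothetical and not part of Ordia; the block ranges (Gurmukhi U+0A00 to U+0A7F, basic Arabic U+0600 to U+06FF) are the only thing it relies on, and it ignores the Arabic supplement blocks for brevity.

```python
def guess_punjabi_code(text):
    """Guess 'pa' (Gurmukhi) or 'pnb' (Shahmukhi) from the script of the text.

    Hypothetical helper for illustration only: it counts characters in the
    Gurmukhi block and in the basic Arabic block and returns the code whose
    script dominates, or None if neither script occurs.
    """
    gurmukhi = sum(1 for ch in text if "\u0a00" <= ch <= "\u0a7f")
    arabic = sum(1 for ch in text if "\u0600" <= ch <= "\u06ff")
    if gurmukhi == 0 and arabic == 0:
        return None
    return "pa" if gurmukhi >= arabic else "pnb"


if __name__ == "__main__":
    for word in ["ਪੰਜਾਬੀ", "پن٘جابی", "Punjabi"]:
        print(word, guess_punjabi_code(word))
```

Nothing in this check looks at where the writer lives or which dialect the word belongs to, which is the same sense in which the two codes are used on Punjabi lexemes.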

Daniel-Mietchen commented 2 years ago

@bgo-eiu Thanks for the additional information, on the basis of which I think your proposed change makes sense. Perhaps we should add an HTML comment about the ISO names or simply a pointer to this discussion?

bgo-eiu commented 2 years ago

Yes, that sounds like a good idea. The comment could point to this discussion, and also to the Phabricator ticket regarding the two scripts, which has been open since 2015: https://phabricator.wikimedia.org/T97884
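As a rough sketch of what that could look like, the Python snippet below renders the two menu entries with the proposed labels, preceded by an HTML comment recording the ISO names and the two pointers. It is not the actual Ordia template (which is the Jinja markup quoted earlier in this thread); `PUNJABI_OPTIONS`, `ISO_NOTE`, and `render_punjabi_options` are made-up names, and the wording of the comment is only a suggestion.

```python
from html import escape

# (code, native name, proposed English label); the labels follow this discussion.
PUNJABI_OPTIONS = [
    ("pa", "ਪੰਜਾਬੀ", "Punjabi Gurmukhi"),
    ("pnb", "پن٘جابی", "Punjabi Shahmukhi"),
]

# Suggested wording only: records the ISO names and where the labels were discussed.
ISO_NOTE = (
    "<!-- ISO names: pa = 'Panjabi; Punjabi', pnb = 'Western Panjabi'. "
    "The labels below name the scripts instead; see fnielsen/ordia #137 "
    "and https://phabricator.wikimedia.org/T97884 -->"
)


def render_punjabi_options(selected_code):
    """Return the HTML comment followed by one <option> element per script."""
    lines = [ISO_NOTE]
    for code, native, label in PUNJABI_OPTIONS:
        selected = ' selected="selected"' if code == selected_code else ""
        lines.append(
            f'<option value="{escape(code)}"{selected}>'
            f"{escape(native)} - {escape(code)} - {escape(label)}</option>"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_punjabi_options("pa"))
```

Keeping the note right next to the options means anyone who later compares the menu against the ISO tables can see why the labels differ.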

fnielsen commented 2 years ago

The patch is now running: https://ordia.toolforge.org/text-to-lexemes

I am wondering whether the current display is ok?

Daniel-Mietchen commented 2 years ago

Looks good.

bgo-eiu commented 2 years ago

Looks good, thank you! The text to lexemes works properly for Gurmukhi now as well.