bragefuglseth / keypunch

Practice your typing skills
GNU General Public License v3.0

[Language Request]: Bangla/Bengali #45

Open rago666 opened 3 days ago

rago666 commented 3 days ago

English Name

Bangla/Bengali

Native Name

বাংলা

Orthography

The Bengali alphabet is derived from the Brahmi script and is closely related to Devanagari. Bengali is the 7th most spoken language in the world; it is the official language of Bangladesh and the 2nd most spoken language in India.

Basics

Bengali consists of 50 letters. 11 vowels ( অ, আ, ই, ঈ, উ, ঊ, ঋ, এ, ঐ, ও, ঔ ) and 39 Consonants (ক, খ, গ, ঘ, ঙ, চ, ছ, জ, ঝ, ঞ, ট, ঠ, ড, ঢ, ণ, ত, থ, দ, ধ, ন, প, ফ, ব, ভ, ম, য, র, ল, শ, ষ, স, হ, ড়, ঢ়, য়, ৎ, ং, ঃ, ঁ).
Vowels can be found at the beginning, in the middle, or at the end of a word. Example: (লি, আশ, স). The same goes for consonants. Example: কলম -> ক, ল, ম, each a consonant in a different position.

Diacritics

When we join a vowel with a consonant, we use the short form of that vowel (a vowel diacritic). These are called KAR (কার). Bengali has 10 vowel diacritics (া, ি, ী, ু, ূ, ে, ৈ, ো, ৌ, ৃ). They can be placed after (সাপ), before (বিষ), below (কুটিল), or both before and after the consonant (পৌর). There are also 7 consonant diacritics, called PHOLA (ফলা), that can join with a vowel or a consonant. We use hôsôntô (্) for this operation. Examples:

য-ফলা -> অ + ্ + য -> অ্য -> অ্যাপ্লিকেশন
ব-ফলা -> শ + ্ + ব -> শ্ব -> বিশ্বাস
ম-ফলা -> ন + ্ + ম -> ন্ম -> তন্ময়
ণ-ফলা -> হ + ্ + ণ -> হ্ণ -> অপরাহ্ণ
ন-ফলা -> ত + ্ + ন -> ত্ন -> রত্ন
রেফ -> র + ্ + শ -> র্শ -> বর্শ
র-ফলা -> ক + ্ + র -> ক্র -> ক্রম
ল-ফলা -> ল + ্ + ল -> ল্ল -> বল্লম

Consonant Conjuncts

A conjunct is a combination of two consonants, and there are a lot of them. Consonant diacritics are also a form of conjunct, but vowel diacritics are not. We write them the same way we write consonant diacritics. Examples:

ক্ক -> ক + ্ + ক
ক্ট -> ক + ্ + ট
ক্ষ -> ক + ্ + ষ
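
The "consonant + hôsôntô + consonant" rule can be sketched in a few lines of Python. This is only an illustration of the encoding described above, not code from Keypunch; the helper names `conjunct` and `code_points` are made up for this example.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA, the hôsôntô used to join consonants


def conjunct(*consonants: str) -> str:
    """Join consonants with a hôsôntô between each pair (hypothetical helper)."""
    return HASANTA.join(consonants)


def code_points(text: str) -> list[str]:
    """List the code points of a string, e.g. to inspect a conjunct."""
    return [f"U+{ord(ch):04X}" for ch in text]


# KA + hôsôntô + SSA is stored as three code points; the font renders one glyph.
kssa = conjunct("\u0995", "\u09B7")
print(kssa, code_points(kssa))
```

A typing trainer therefore sees three separate keystrokes for a conjunct like this one, even though it is displayed as a single shape.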

Punctuation Marks

Same as English. One exception is that we use DARI ( । ) instead of a full stop (.), with a space before and after it at the end of a sentence. Example: রফিক মাছ ধরতে গিয়েছে ।

Writing

Bengali has no letter case, so there are no capital or small letters. On Linux I use the built-in Bangla (Probhat) layout for writing. Whatever the layout, the writing system is almost the same. Here are some basic rules.

probhat

Writing some Bangla using Probhat (QWERTY)

বাংলা আমার মাতৃভাষা । বৃহন্নলার পাঁচ ভাই ক্ষমতার লোভে মত্ত ।
baLla vmar maf<BaSa . b<hn/nlar pa>c BaI k/Smfar l]B[ mf/f .
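
The sample above can be read as a key-by-key mapping. The Python sketch below encodes only the keys that appear in that sample; it is inferred from this one example and is not the complete Probhat layout. The names `PROBHAT_SAMPLE_KEYS` and `type_probhat` are hypothetical.

```python
# Partial key map inferred ONLY from the writing sample in this issue.
# It is not the full Probhat layout; unknown keys pass through unchanged.
PROBHAT_SAMPLE_KEYS = {
    "b": "\u09AC",  # ব
    "B": "\u09AD",  # ভ
    "c": "\u099A",  # চ
    "f": "\u09A4",  # ত
    "h": "\u09B9",  # হ
    "k": "\u0995",  # ক
    "l": "\u09B2",  # ল
    "m": "\u09AE",  # ম
    "n": "\u09A8",  # ন
    "p": "\u09AA",  # প
    "r": "\u09B0",  # র
    "S": "\u09B7",  # ষ
    "v": "\u0986",  # আ (independent vowel)
    "I": "\u0987",  # ই (independent vowel)
    "a": "\u09BE",  # vowel sign AA
    "<": "\u09C3",  # vowel sign vocalic R
    "[": "\u09C7",  # vowel sign E
    "]": "\u09CB",  # vowel sign O
    "L": "\u0982",  # anusvara
    ">": "\u0981",  # candrabindu
    "/": "\u09CD",  # hôsôntô
    ".": "\u0964",  # dari
}


def type_probhat(keys: str) -> str:
    """Translate a keystroke string using the partial map above."""
    return "".join(PROBHAT_SAMPLE_KEYS.get(key, key) for key in keys)


print(type_probhat("baLla vmar maf<BaSa ."))
```

Note that keystrokes follow the logical order of the letters, so a vowel sign that is drawn before its consonant is still typed after it.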

Implementation Assistance

Additional Information

No response

bragefuglseth commented 3 days ago

Hi, thanks a lot for the language request! That writing sample (বাংলা আমার মাতৃভাষা ।) was really helpful. I'm currently spinning up an initial implementation of Bangla text generation, but I'm a little confused about the space before the dari sign. When implementing text generation for Hindi (#6) and Nepali (#5), I never encountered this convention, and upon doing some further research, I've discovered that Microsoft's Bangla (India) Localization Style Guide doesn't recommend it either:

A punctuation mark (৷) indicating a full stop, placed at the end of declarative sentences and other statements thought to be complete. There is no space between the last letter and the period. Use one space between the period and the first letter of the next sentence.

If you think it makes sense for the extra space to be there for Bangla specifically, and not Hindi and Nepali, I'll gladly go ahead and set that up. However, for the sake of consistency across the Indic languages in Keypunch, I'm currently inclined to use the convention of no space between the dari and its preceding word for all three of them :slightly_smiling_face:
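
As an illustration of the "no space before the dari" convention, here is a minimal Python sketch that strips whitespace in front of the sign in existing text. It is not Keypunch's text generator, and `strip_space_before_dari` is a hypothetical name.

```python
import re

DARI = "\u0964"  # the danda character shared with Devanagari, used as the Bangla full stop


def strip_space_before_dari(text: str) -> str:
    """Remove any whitespace that directly precedes a dari."""
    return re.sub(r"\s+" + DARI, DARI, text)


# The space after the dari (before the next sentence) is left untouched.
print(strip_space_before_dari("বাংলা আমার মাতৃভাষা \u0964 পাঁচ ভাই \u0964"))
```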

rago666 commented 2 days ago

Thank you very much. You can ignore the extra spacing before (।). I have built the app from the repo using GNOME Builder and tested it for a few minutes. I found a few problems.

  1. য় (z) and ড় (R) do not work in simple and advanced mode.
  2. কিন্ত (kin/f) -> ন + ্ + ত -> ন্ত ❌ ; কিন্তু (kin/fu) -> ন + ্ + ত + ু -> ন্তু ✅
  3. My mistake for not mentioning it before: Bangla has its own digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) -> (০, ১, ২, ৩, ৪, ৫, ৬, ৭, ৮, ৯).
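
The digit correspondence in item 3 is a one-to-one mapping, which could be sketched in Python like this (the name `to_bangla_digits` is hypothetical and this is not taken from Keypunch):

```python
# Bangla digits occupy U+09E6 (zero) through U+09EF (nine).
BANGLA_DIGITS = str.maketrans(
    "0123456789",
    "\u09E6\u09E7\u09E8\u09E9\u09EA\u09EB\u09EC\u09ED\u09EE\u09EF",
)


def to_bangla_digits(text: str) -> str:
    """Replace ASCII digits with their Bangla counterparts."""
    return text.translate(BANGLA_DIGITS)


print(to_bangla_digits("2024"))
```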

bragefuglseth commented 2 days ago

য় (z) and ড় (R) do not work in simple and advanced mode.

I've discovered that Monkeytype has the exact same issue, and it's related to character representation. In the word list, those letters are stored as two characters; a base shape and a modifier character for the dot. In modern Bangla text encoding, though, those letters can also be represented as a single character that has the dot included, and that's what people usually enter on keyboards. These two representation methods are completely different letters from the perspective of the computer.

Monkeytype apparently has to represent them the former way due to technical constraints, but since Keypunch uses GTK's native text machinery instead of rolling its own, I don't think we have the same issue. So a quick fix I'll try for now is to just replace the "outdated" letters with their modern counterparts.
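
To make the two representations concrete, here is a small Python sketch of the "replace with the single-character form" idea. It is only an illustration under my own assumptions, not the actual Keypunch change; `NUKTA_PAIRS` and `to_precomposed` are hypothetical names. One detail worth knowing: Unicode NFC normalization does not produce the single-character forms (they are composition exclusions), so an explicit replacement table is needed for this direction.

```python
import unicodedata

# Two-character form (base letter + nukta U+09BC) -> single code point.
NUKTA_PAIRS = {
    "\u09A1\u09BC": "\u09DC",  # DDA + nukta -> RRA
    "\u09A2\u09BC": "\u09DD",  # DDHA + nukta -> RHA
    "\u09AF\u09BC": "\u09DF",  # YA + nukta -> YYA
}


def to_precomposed(text: str) -> str:
    """Replace base+nukta sequences with their single-character letters."""
    for two_chars, one_char in NUKTA_PAIRS.items():
        text = text.replace(two_chars, one_char)
    return text


def code_points(text: str) -> list[str]:
    return [f"U+{ord(ch):04X}" for ch in text]


word = "\u09A8\u09BF\u09AF\u09BC\u09C7"  # a word stored with YA + nukta
print(code_points(word))
print(code_points(to_precomposed(word)))

# NFC goes the other way: it splits YYA back into YA + nukta.
print(code_points(unicodedata.normalize("NFC", "\u09DF")))
```

Both strings render identically, which is why the mismatch is invisible in the word list but shows up as a "wrong key" while typing.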

কিন্ত (kin/f) -> ন + ্ + ত -> ন্ত ❌ ; কিন্তু (kin/fu) -> ন + ্ + ত + ু -> ন্তু ✅

Both of those spellings exist in the word list. I assume that the first one should be removed? It would be good to open an issue against Monkeytype as well, then. That's where the original list is from.

bragefuglseth commented 2 days ago

I haven't looked at the numbers yet, but the other mistakes should be fixed.

rago666 commented 2 days ago

These 4 words have a problem: নিয়ে হয়ে দিয়ে হয়েছে. The correct spellings are given below.

নিয়ে ( niz[ );‌ হয়ে ( hz[ ); দিয়ে ( qiz[ ); হয়েছে ( hz[C[ )