rago666 closed this 4 months ago
Hi, thanks a lot for the language request! That writing sample (বাংলা আমার মাতৃভাষা ।) was really helpful. I'm currently spinning up an initial implementation of Bangla text generation, but I'm a little confused about the space before the dari sign. When implementing text generation for Hindi (#6) and Nepali (#5), I never encountered this convention, and upon doing some further research, I've discovered that Microsoft's Bangla (India) Localization Style Guide doesn't recommend it either:
A punctuation mark (৷) indicating a full stop, placed at the end of declarative sentences and other statements thought to be complete. There is no space between the last letter and the period. Use one space between the period and the first letter of the next sentence.
If you think it makes sense for the extra space to be there for Bangla specifically, and not for Hindi and Nepali, I'll gladly go ahead and set that up. However, for the sake of consistency across all three languages in Keypunch, I'm currently inclined to use the convention of no space between the dari and its preceding word for all of them :slightly_smiling_face:
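To make the spacing convention concrete, here is a minimal sketch in Python (illustrative only, not Keypunch's actual code). The function name `join_sentences` is hypothetical, and the dari is assumed to be encoded as U+0964.

```python
# Assumption: the dari is stored as U+0964 (named DEVANAGARI DANDA in Unicode).
DARI = "\u0964"

def join_sentences(sentences):
    """Attach the dari directly to each sentence and separate sentences with one space."""
    return " ".join(sentence.strip() + DARI for sentence in sentences)

# Two placeholder sentences; stray surrounding whitespace is stripped first.
text = join_sentences(["first sentence ", " second sentence"])
print(text)
```

With this convention there is never a space before the dari, and exactly one space after it when another sentence follows.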
Thank you very much. You can ignore the extra spacing before (।). I built the app from the repo using GNOME Builder and tested it for a few minutes. I found a few problems.
য় (z) and ড় (R) do not work in simple and advanced mode.
I've discovered that Monkeytype has the exact same issue, and it's related to character representation. In the word list, those letters are stored as two characters; a base shape and a modifier character for the dot. In modern Bangla text encoding, though, those letters can also be represented as a single character that has the dot included, and that's what people usually enter on keyboards. These two representation methods are completely different letters from the perspective of the computer.
Monkeytype apparently has to represent them the former way due to technical constraints, but since Keypunch uses GTK's native text machinery instead of rolling its own, I don't think we have the same issue. So a quick fix I'll try for now is to just replace the "outdated" letters with their modern counterparts.
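A rough sketch of that quick fix, in illustrative Python rather than Keypunch's actual code. The table and the function name `precompose_nukta` are hypothetical. As far as I know, Unicode NFC normalization does not compose these letters (they appear to be composition exclusions), which is why the sketch uses an explicit replacement table instead.

```python
# Two-code-point form (base letter + nukta U+09BC) -> single precomposed code point.
NUKTA_PAIRS = {
    "\u09a1\u09bc": "\u09dc",  # DDA + nukta -> RRA
    "\u09a2\u09bc": "\u09dd",  # DDHA + nukta -> RHA
    "\u09af\u09bc": "\u09df",  # YA + nukta -> YYA
}

def precompose_nukta(word: str) -> str:
    """Replace each base+nukta pair with its single-code-point counterpart."""
    for decomposed, precomposed in NUKTA_PAIRS.items():
        word = word.replace(decomposed, precomposed)
    return word

# The same visible word, stored with the two-code-point YYA.
stored = "\u09a8\u09bf\u09af\u09bc\u09c7"
fixed = precompose_nukta(stored)
print(len(stored), len(fixed))  # 5 4
```

After the replacement, the stored word compares equal to what a keyboard layout that emits the single-code-point letters would produce.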
কিন্ত (kin/f) -> ন + ্ + ত -> ন্ত ❌ ; কিন্তু (kin/fu) -> ন + ্ + ত + ু -> ন্তু ✅
Both of those spellings exist in the word list. I assume that the first one should be removed? It would be good to open an issue against Monkeytype as well, then. That's where the original list is from.
I haven't looked at the numbers yet, but the other mistakes should be fixed.
These 4 words have a problem: নিয়ে হয়ে দিয়ে হয়েছে. The correct spellings are given below.
নিয়ে ( niz[ ); হয়ে ( hz[ ); দিয়ে ( qiz[ ); হয়েছে ( hz[C[ )
Could you give it a go again now? :slightly_smiling_face:
By the way, if you'd like to, you can provide a name (and optionally a website link or an email address), and I'll credit you in the Orthography section of the about window.
Everything works perfectly now! You can close this now.
I'll credit you in the Orthography section of the about window.
I would be honored if you could include my name, Arnob Goswami. Thank you for your consideration.
I'm very glad to hear that! Thank you so much for your help.
English Name
Bangla/Bengali
Native Name
বাংলা
Orthography
The Bengali alphabet is derived from the Brahmi alphabet and is closely related to the Devanagari alphabet. Bengali is the 7th most spoken language in the world; it is the official language of Bangladesh and the 2nd most spoken language in India.
Basics
The Bengali alphabet consists of 50 letters: 11 vowels (অ, আ, ই, ঈ, উ, ঊ, ঋ, এ, ঐ, ও, ঔ) and 39 consonants (ক, খ, গ, ঘ, ঙ, চ, ছ, জ, ঝ, ঞ, ট, ঠ, ড, ঢ, ণ, ত, থ, দ, ধ, ন, প, ফ, ব, ভ, ম, য, র, ল, শ, ষ, স, হ, ড়, ঢ়, য়, ৎ, ং, ঃ, ঁ).
Vowels can be found at the beginning, in the middle, or at the end of a word. Examples: অলি, আউশ, সই. The same goes for consonants. Example: কলম -> ক, ল, ম, each a consonant in a different position.
Diacritics
When we join a vowel with a consonant, we use the short form of that vowel (a vowel diacritic). These are called KAR (কার). Bengali has 10 vowel diacritics (া, ি, ী, ু, ূ, ে, ৈ, ো, ৌ, ৃ). They can be added after (সাপ), before (বিষ), below (কুটিল), or both before and after (পৌর) a consonant. There are also 7 consonant diacritics, called PHOLA (ফলা), that can join with a vowel or a consonant. We use hôsôntô (্) for this operation. Examples:
য-ফলা -> অ + ্ + য -> অ্য -> অ্যাপ্লিকেশন
ব-ফলা -> শ + ্ + ব -> শ্ব -> বিশ্বাস
ম-ফলা -> ন + ্ + ম -> ন্ম -> তন্ময়
ণ-ফলা -> হ + ্ + ণ -> হ্ণ -> অপরাহ্ণ
ন-ফলা -> ত + ্ + ন -> ত্ন -> রত্ন
রেফ -> র + ্ + শ -> র্শ -> বর্শ
র-ফলা -> ক + ্ + র -> ক্র -> ক্রম
ল-ফলা -> ল + ্ + ল -> ল্ল -> বল্লম
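One detail that may help with implementation: even when a vowel diacritic is drawn before its consonant, as in বিষ, Unicode text stores it after the consonant. A small Python sketch to illustrate this (the helper name `code_point_names` is just for this example):

```python
import unicodedata

def code_point_names(text: str) -> list[str]:
    """Return the Unicode name of every code point, in stored order."""
    return [unicodedata.name(ch) for ch in text]

# The word written as BA + vowel sign I + SSA; the vowel sign is drawn to the left of BA.
word = "\u09ac\u09bf\u09b7"
for name in code_point_names(word):
    print(name)
```

The stored order follows typing order (consonant first, then the diacritic), so a typing test can compare input code point by code point without caring where the diacritic is drawn.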
Consonant Conjuncts
A conjunct is a combination of two consonants. There are a lot of them. Consonant diacritics are also a form of conjunct, but vowel diacritics are not. We write them the same way we write consonant diacritics. Examples:
ক্ক - ক + ্ + ক
ক্ট - ক + ্ + ট
ক্ষ - ক + ্ + ষ
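In stored text, a conjunct is simply the consonants with a hôsôntô between them; the font takes care of drawing the joined shape. A minimal Python sketch, where `make_conjunct` is a hypothetical helper name:

```python
HASANTA = "\u09cd"  # the hôsôntô (named BENGALI SIGN VIRAMA in Unicode)

def make_conjunct(*consonants: str) -> str:
    """Join consonants with a hôsôntô between each pair."""
    return HASANTA.join(consonants)

ka, ssa = "\u0995", "\u09b7"  # KA and SSA
kssa = make_conjunct(ka, ssa)
print(kssa, len(kssa))  # one visible conjunct, stored as 3 code points
```

So a conjunct that looks like a single glyph still takes three key presses (consonant, hôsôntô, consonant), which matters when counting keystrokes.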
Punctuation Marks
Same as English. One exception is that we use DARI ( । ) instead of a full stop (.), with a space before and after it at the end of a sentence. Example: রফিক মাছ ধরতে গিয়েছে ।
Writing
Bengali has no letter case, so there are no capital or small letters. On Linux I use the built-in Bangla (Probhat) layout for writing. Whatever the layout, the writing system is almost the same. Here are some basic rules.
Writing some Bangla using Probhat (QWERTY)
বাংলা আমার মাতৃভাষা । বৃহন্নলার পাঁচ ভাই ক্ষমতার লোভে মত্ত ।
baLla vmar maf<BaSa . b<hn/nlar pa>c BaI k/Smfar l]B[ mf/f .
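The key sequence above can be read as a one-key-per-code-point mapping. Here is a small Python sketch of that idea. The table is inferred only from the two sample sentences, so it is a partial, assumed subset and not the full Probhat layout; `SAMPLE_KEYS` and `keys_to_bangla` are hypothetical names.

```python
# Partial key -> code point table, inferred from the sample sentences only.
SAMPLE_KEYS = {
    "b": "\u09ac", "a": "\u09be", "L": "\u0982", "l": "\u09b2",
    "v": "\u0986", "m": "\u09ae", "r": "\u09b0", "f": "\u09a4",
    "<": "\u09c3", "B": "\u09ad", "S": "\u09b7", "h": "\u09b9",
    "n": "\u09a8", "/": "\u09cd", "p": "\u09aa", ">": "\u0981",
    "c": "\u099a", "I": "\u0987", "k": "\u0995", "]": "\u09cb",
    "[": "\u09c7", ".": "\u0964",
}

def keys_to_bangla(keys: str) -> str:
    """Map each typed key to its code point; unknown keys (e.g. space) pass through."""
    return "".join(SAMPLE_KEYS.get(key, key) for key in keys)

print(keys_to_bangla("baLla vmar maf<BaSa ."))
print(keys_to_bangla("b<hn/nlar pa>c BaI k/Smfar l]B[ mf/f ."))
```

Because each key produces exactly one code point, conjuncts such as the one in k/Smfar are typed as consonant, `/` (hôsôntô), consonant.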
Implementation Assistance
Additional Information
No response