drdhaval2785 / SanskritSpellCheck

spell checking based on patterns

Generation of Non-Sandhi Patterns #3

Closed gasyoun closed 9 years ago

gasyoun commented 9 years ago

Somewhat similar to https://github.com/drdhaval2785/SanskritSpellCheck/issues/1, only in reverse order. I have a sandhi pattern table based on Buehler, see page 2.

final + initial: a- ā- k- kh- g- gh- ṅ- c- ch- j- jh- ñ- ṭ- ṭh- ḍ- ḍh- ṇ-
-k ga gā == == gg ggh g == == gj gjh gñ == == == == gḍ gḍh gn  ñ ṇ
-ṭ ḍa ḍā == == ḍg ḍgh ḍ == == ḍj ḍjh dñ == == == == ḍḍ ḍḍh ḍṇ ṇ nñ ṇṇ
-p ba bā == == bg bgh b == == bj bdh bñ == == bḍ bḍh bṇ m mñ mṇ
-t da dā == == dg dgh d cc cch jj jjh jñ ṭṭ ṭṭh ḍḍ ḍḍh dṇ n ññ ṇṇ
-ṅ a ā == == == == == == == == == == == == == == == a ā
-ṇ ṇṇa ṇṇā == == == == == == == == == == == == == == == ṇa ṇā
-n nna nnā == == == == == ṃçc ṃçch j jh ññ ṃṣṭ ṃṣṭh ṇḍ ṇḍh ṇṇ na nā
-m == == ṃk ṃkh ṃg ṃgh ṃ ṃc ṃch ṃj ṃjh ṃñ ṃṭ ṃṭh ṃḍ ṃḍh ṃṇ (k) (kh) (g) (gh) () (ñc) (ñch) (ñj) (ñjh) (ññ) (ṇṭ) (ṇṭh) (ṇḍ) (ṇḍh) (ṇṇ)
-(i) ḥ ra rā ḥk ḥkh rg rgh r çc çch rj rjh rñ ṣṭ ṣṭh rḍ rḍh rṇ (kḥk) (kḥkh)
-aḥ -o ’ -a ā- == == og ogh o açc açch oj ojh oñ aṣṭ aṣṭh oḍ oḍh oṇ
-āḥ -ā a- -ā ā- == == āg āgh ā āçc āçch āj ājh āñ āṣṭ āṣṭh āḍ āḍh āṇ

But what I need is a list of impossible combinations: impossible in Sanskrit grammar, but not in our dirty Sanskrit files. Afterwards we could do a search and replace to see whether any dirt is left in the OCRs. The question of ligatures depends heavily on sandhi. But I do not have a list of 1) all known sandhi solutions (other than my file Sandhi-Table-19.10.13, based on Buehler's table) or 2) all impossible sandhis, e.g. we can't have tk, only dg. I want a list of tk-like combinations that cannot occur inside a word, or even a separate list for sentence sandhis.
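The search step described above could look roughly like the following. This is only a sketch: the function name `find_suspect_clusters` is hypothetical, and the cluster list holds just the `tk` example from this comment, not a verified list of impossible combinations.

```python
import re

def find_suspect_clusters(lines, clusters):
    """Return (line_number, offset, cluster) for every suspect cluster found."""
    if not clusters:
        return []
    # Try longer clusters first so that e.g. "nTh" is preferred over "nT".
    ordered = sorted(clusters, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(c) for c in ordered))
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for match in pattern.finditer(line):
            hits.append((line_no, match.start(), match.group()))
    return hits

# Placeholder list: only the example named in the comment above.
SUSPECT = ["tk"]
sample_hits = find_suspect_clusters(["rAmaH", "vAtkara"], SUSPECT)
```

Each hit points at a place in the file to inspect by hand; whether it is real OCR dirt still has to be decided by a reader.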

Shalu411 commented 9 years ago

Namaste. The list is possible. Great idea. :+1:
One simple, principle-wise approach is the varga paJchamas case: they cannot be combined with consonants of a varga other than their own, e.g. nT, ng etc. It will take a little time to generate the full list. Note or request :) Please do not use Russian in between.
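A minimal sketch of the varga paJchama principle, assuming Harvard-Kyoto transliteration (G = ṅ, J = ñ, N = ṇ, T = ṭ) and looking only at a nasal followed by a non-nasal stop. The names `VARGAS` and `cross_varga_clusters` are made up for illustration, and the output is a list of candidates to review, not a confirmed list of impossible clusters.

```python
# Each varga's nasal (paJchama) mapped to the non-nasal stops of the same varga.
# Harvard-Kyoto transliteration is an assumption of this sketch.
VARGAS = {
    "G": ["k", "kh", "g", "gh"],
    "J": ["c", "ch", "j", "jh"],
    "N": ["T", "Th", "D", "Dh"],
    "n": ["t", "th", "d", "dh"],
    "m": ["p", "ph", "b", "bh"],
}

def cross_varga_clusters(vargas=VARGAS):
    """Pair every nasal with the stops of every varga except its own."""
    suspects = []
    for nasal in vargas:
        for other_nasal, stops in vargas.items():
            if other_nasal != nasal:
                suspects.extend(nasal + stop for stop in stops)
    return suspects

SUSPECT_CLUSTERS = cross_varga_clusters()
```

Both examples from the comment, `nT` and `ng`, appear in the result, while same-varga clusters such as `nt` or `Gk` do not.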

gasyoun commented 9 years ago

If Russian is there, let it be. File for a Russian book, that you hope will get closer to in 2015.

Shalu411 commented 9 years ago

Russian for a book: let it be in the book. Russian for a Russian web site: let it be there. Not here. We write for others' understanding, so let's make it easy, not difficult or impossible, for them. Just transliterate or translate things before posting. Please. It's a request.

drdhaval2785 commented 9 years ago

I have already noted elsewhere that we don't need a list of impossible combinations. All our patterns get better with time and corrections. Whatever remains after that is really the list of possible combinations (<1% of all permutations / combinations). If you need a list of impossible combinations, we will compute impossible = total - possible. As most of the dictionaries also have samAsa words in them, the samAsa patterns are taken care of.
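The subtraction impossible = total - possible can be sketched as a set difference over adjacent letter pairs. This assumes a transliteration with one character per sound and a tiny made-up word list; the function names are hypothetical and this is not the repository's actual code.

```python
from itertools import product

def attested_pairs(words):
    """Collect every adjacent letter pair that occurs in the word list."""
    pairs = set()
    for word in words:
        pairs.update(zip(word, word[1:]))
    return pairs

def unattested_pairs(words, alphabet):
    """total - possible: all ordered pairs over the alphabet minus the attested ones."""
    total = set(product(alphabet, repeat=2))
    return total - attested_pairs(words)

# Toy data for illustration only; a real run would use the dictionary headwords.
toy_words = ["rAma", "deva"]
toy_alphabet = "aAdemrv"
toy_missing = unattested_pairs(toy_words, toy_alphabet)
```

With a word list this small almost every pair is unattested; the result becomes meaningful only as the list of corrected words grows, which matches the point that the patterns improve with time and corrections.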