drdhaval2785 / SanskritSorting

Codes written by Dr. Dhaval Patel for Sanskrit Natural Language Programming

#tita_u# vs. #haluāṇa# #28

Closed: gasyoun closed this issue 9 years ago

gasyoun commented 9 years ago

@drdhaval2785 In MW key1 we have:

tiRwI
titau
titanizu

After sorting I get the following (so it's not an input issue: the input is titau, not tita_u):

| u |
#u#
#tita_u#

Are you sure we want the underscore in #tita_u#? As I see 28 such cases, I understand that this applies to #pra_uga# as well. But https://github.com/sanskrit-lexicon/MWS/blob/master/hiatus-190-entries.txt lists more cases. In my hiatus file, #haluāṇa# and #śrīāhnika# are left unsplit, so what is the logic?

drdhaval2785 commented 9 years ago

Done. The culprit was dev-slp.php. It was written to handle Hindi as well, so it contained:

$text = str_replace("a" . str_replace(" ", "", $vow['scr'][253]), "a_i", $text);
$text = str_replace("a" . str_replace(" ", "", $vow['scr'][255]), "a_u", $text);

This converted titau to tita_u.

I have now commented that section out.
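
For anyone reading later, here is a rough Python sketch of what those two replacements appear to do. It is not the code from dev-slp.php. The function name `mark_hiatus`, the `enabled` flag, and the values used for the two vowel signs are assumptions for illustration only; the actual contents of `$vow['scr'][253]` and `$vow['scr'][255]` are not shown in this thread.

```python
def mark_hiatus(text, i_sign="i", u_sign="u", enabled=True):
    """Insert an underscore between 'a' and a following i/u sign.

    i_sign and u_sign stand in for $vow['scr'][253] and $vow['scr'][255];
    the plain letters used as defaults here are placeholders (assumption).
    enabled=False mimics commenting the section out.
    """
    if not enabled:
        return text
    # Mirror the inner str_replace(" ", "", ...) calls: strip spaces from the signs.
    i_sign = i_sign.replace(" ", "")
    u_sign = u_sign.replace(" ", "")
    # Mirror the outer str_replace calls: only 'a' + sign is touched.
    text = text.replace("a" + i_sign, "a_i")
    text = text.replace("a" + u_sign, "a_u")
    return text


if __name__ == "__main__":
    for word in ("titau", "prauga", "haluANa", "SrIAhnika"):
        print(word, "->", mark_hiatus(word), "| disabled:", mark_hiatus(word, enabled=False))
```

Under these assumptions, only a sequence of `a` followed by the i or u sign gets an underscore, which would explain why titau and prauga are split while words with other vowel sequences (u + ā, ī + ā) pass through unchanged.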

gasyoun commented 9 years ago

So now we know that words of Vedic origin can conflict with the Hindi handling, right?

drdhaval2785 commented 9 years ago

The example given by the author of the dicrunch script is ha_uk, some Persian-influenced word.