drdhaval2785 / SanskritSorting

Codes written by Dr. Dhaval Patel for Sanskrit Natural Language Programming

Accent and special character "°*([-" sorting order #3

Open gasyoun opened 9 years ago

gasyoun commented 9 years ago

Homographs in the accented list can look like: akza/ a/kza akza akza. The original order in https://github.com/sanskrit-lexicon/PWG/ is exactly as above. Do we want to keep that order in the reverse sorting? I guess we want it reversed as well: akza akza a/kza akza/

Several other combinations can occur. Seven possible headword variations in PW(K): a°, a, a/MSa, aMSaka, aMSakaraRa, (a/Msya), ukTya^. MW has -, --, --- and SCH additionally has [ in key2.

drdhaval2785 commented 9 years ago

What if we ignore accents in sorting? There is no order prescribed for accents. The way we do our sorting is:

  1. I remember the original entry as array 1
  2. I copy that array 1 as array 2.
  3. I prepare an array with entries (array1[ ],array2[ ]).
  4. Then I alter only array2 entries.
  5. I sort according to array2.
  6. When it comes to displaying I display array1 only.

So, we can preserve the accent marks etc in the way they actually are.
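The six steps above can be sketched as follows. This is only an illustration in Python, not the code in this repository: the function name `sort_preserving_display` and the set of ignored marks (`/`, `\`, `^`) are assumptions, and plain code-point comparison stands in for a real Sanskrit alphabetical order.

```python
import re

# Assumption: these are the accent marks to ignore while sorting.
ACCENT_MARKS = re.compile(r"[/\\^]")

def sort_preserving_display(headwords):
    # Steps 1-3: pair each original entry (array1) with a working copy (array2).
    pairs = [(original, original) for original in headwords]
    # Step 4: alter only the copy, here by stripping the accent marks.
    pairs = [(original, ACCENT_MARKS.sub("", copy)) for original, copy in pairs]
    # Step 5: sort by the altered copy. The sort is stable, so entries whose
    # copies are equal keep their input order.
    pairs.sort(key=lambda pair: pair[1])
    # Step 6: display only the untouched originals.
    return [original for original, _ in pairs]

print(sort_preserving_display(["akza/", "a/kza", "akza", "aMSa"]))
```

Because the three akza variants share the same sort copy, they stay next to each other and come out with their accent marks intact.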

gasyoun commented 9 years ago

Ignoring accents does not sound good. Can we have it this way, in this order (the sample is not reversed): akza akza a/kza akza/

  1. form marked with \ (a theoretically possible form, not attested)
  2. accentless form
  3. accent closer to the beginning of the word
  4. accent closer to the end of the word
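One way to express the requested order is a tuple sort key, sketched here in Python. The name `accent_rank` is hypothetical, and so is the placement of `\` at the start of the word in the example, since the thread does not show where that mark sits.

```python
def accent_rank(headword):
    # Sort key: (accentless spelling, group, position of the accent).
    base = headword.replace("\\", "").replace("/", "")
    if "\\" in headword:
        return (base, 0, 0)  # 1) form marked with \
    if "/" not in headword:
        return (base, 1, 0)  # 2) accentless form
    # 3) and 4) an accent nearer the beginning sorts before one nearer the end.
    return (base, 2, headword.index("/"))

# Assumption: "\" is written before the word.
words = ["akza/", "a/kza", "akza", "\\akza"]
print(sorted(words, key=accent_rank))
```

Homographs still sort together on the accentless spelling, and the accent only breaks ties among them.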

Check http://www.sanskrit-lexicon.uni-koeln.de/scans/PWGScan/2013/web/webtc/servepdf.php?page=1-0013 to see that we should not ignore accents, as in printed books there is an order.


drdhaval2785 commented 9 years ago

Marcis says that he will be adding proper accent marks, so this issue does not apply for now. If difficulties come up, we will pursue it.

gasyoun commented 9 years ago

@drdhaval2785 Sure, I will add them (you actually convert them), but the order is still an issue. The cases documented above are not enough to show that words with the accent at the beginning come first in the usual sorting of words, so should it be the other way round in reverse sorting?

drdhaval2785 commented 9 years ago

Current update: reverse22.php now sorts accented forms together with the normal unaccented form, so accented and unaccented forms no longer end up in separate places.

@gasyoun The issue that remains is the ordering of accents.
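A reverse sort that keeps accented and unaccented forms together could look roughly like this. It is a Python sketch, not the logic of reverse22.php: `reverse_key` and `reverse_sort` are hypothetical names, only the `/` mark is handled, and code-point comparison again stands in for Sanskrit alphabetical order.

```python
def reverse_key(headword):
    # Drop the "/" accent mark, then read the word from its last letter.
    return headword.replace("/", "")[::-1]

def reverse_sort(headwords):
    # Entries with the same key (accented and unaccented homographs)
    # stay adjacent and keep their input order.
    return sorted(headwords, key=reverse_key)

# SLP1 input: jYA = jñā, naY = nañ, A = ā.
print(reverse_sort(["jYA/", "A", "jYA", "A/", "naY"]))
```

The order inside each group of homographs is left to the input order here, which is exactly the part that remains open.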

gasyoun commented 9 years ago

Can we please implement the functions from https://github.com/drdhaval2785/SanskritSorting/blob/master/slp-iast-withaccent.php in https://github.com/drdhaval2785/SanskritSorting/blob/master/reverse22.php so that the headword list extracted from MW does not look like

nañ
ā/
ā
yācñā/
abhi-yācñā
jñā/
jñā

but is ready for print, with accents added? Right now I can't run https://github.com/drdhaval2785/SanskritSorting/blob/master/slp-iast-withaccent.php after https://github.com/drdhaval2785/SanskritSorting/blob/master/reverse22.php, because the SLP1 is lost (for good) while the old / accent markup is left behind (for bad). And what is #nañ#?
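As a stopgap for the output shown above, the leftover `/` could be turned into an acute accent directly on the IAST text. The Python sketch below does not reproduce the functions in slp-iast-withaccent.php. It assumes that `/` follows the accented vowel and marks an udātta that should print as an acute, and `slash_to_acute` is a hypothetical name.

```python
import re
import unicodedata

# IAST vowel letters, normalized so that each one is a single code point.
VOWELS = unicodedata.normalize("NFC", "aāiīuūṛṝḷeo")
VOWEL_SLASH = re.compile("([" + VOWELS + "])/")

def slash_to_acute(word):
    # Normalize first so that a vowel typed with combining marks still matches.
    word = unicodedata.normalize("NFC", word)
    # Replace "vowel + /" with "vowel + combining acute accent (U+0301)".
    marked = VOWEL_SLASH.sub("\\1\u0301", word)
    # Compose again where a precomposed letter exists, for example á.
    return unicodedata.normalize("NFC", marked)

for word in ["ā/", "yācñā/", "abhi-yācñā", "jñā/"]:
    print(slash_to_acute(word))
```

Words without a `/` pass through unchanged, so the function can be run over the whole list.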

drdhaval2785 commented 9 years ago

input / output please