drdhaval2785 / SanskritSorting

Codes written by Dr. Dhaval Patel for Sanskrit Natural Language Programming

Sort Text in Paragraphs (Not Columns) #25

Closed gasyoun closed 9 years ago

gasyoun commented 9 years ago

Extracted from https://github.com/drdhaval2785/SanskritSorting/issues/7

@drdhaval2785 Could we please sort text with ; delimiters as well?

Input / Output

Both as roots (√1 as and √2 as) should end up together: √1 as; √1 bhuj; √1 ci; √dā; √dhā; √dhāv; √gā; √gṛ; √hā; √hṛ; √kṛ (skṛ); √kṛt; √1 lī; √luṭh; √mā; √1 naś; √nu; √pā; √paś; √pat; √pṛ; √śās; √1 sidh; √śṛ; √1 stu; √1 tan; √1 ukṣ; √vā; √vap; √vid; √1 vṛ; √yu; √2 as; √aś;

Now if I replace ; with \n, I get:

| ī |
#√1 lī#

| u015b; |
#√aś;#

| ṛ |
#√śṛ#

| u1e5b) |
#√kṛ (skṛ)#

The problem is that at the end I have to convert \n back to ; and I will not get back what I started with, because the output contains several additional \n-s. So no luck. The √aś; can be cleaned up manually to √aś, but not √kṛ (skṛ), which has to remain as it is.
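For illustration, the \n round trip can be avoided by splitting on ; directly, sorting the entries, and joining them back. This is only a minimal sketch, not the code in whitney.php; the function name `sort_delimited` is hypothetical, and the default ordering here is plain code-point order, so a Sanskrit-aware key would have to be passed in for real use.

```python
def sort_delimited(text, key=None, delimiter=";"):
    """Split on the delimiter, sort the entries, and join them back."""
    # Strip surrounding whitespace and drop empty entries, so a trailing
    # ';' does not end up glued to the last entry (as in "√aś;").
    entries = [part.strip() for part in text.split(delimiter)]
    entries = [entry for entry in entries if entry]
    # key=None means plain code-point order; pass a Sanskrit-aware key instead.
    entries.sort(key=key)
    return (delimiter + " ").join(entries) + delimiter


sample = "√1 as; √kṛ (skṛ); √aś;"
result = sort_delimited(sample)
print(result)
```

Because the parenthesised variant never leaves its entry, √kṛ (skṛ) survives the round trip unchanged.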

drdhaval2785 commented 9 years ago

Tried in Whitney.php.

drdhaval2785 commented 9 years ago

Provide SLP1 data.

Small sample data sorting is here
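For reference, converting IAST input to SLP1 before sorting could look roughly like the sketch below. It is an assumption-laden illustration, not the converter used by this repository: the function name `iast_to_slp1` is hypothetical, and the longest-match scan treats every `ai`, `au`, `th`, etc. as a single sound, which can be wrong across a morpheme boundary.

```python
import unicodedata

# IAST -> SLP1 pairs. Multi-character IAST sequences come first so that
# 'kh' is matched before 'k', 'ai' before 'a', and so on.
IAST_TO_SLP1 = [
    ("ai", "E"), ("au", "O"),
    ("kh", "K"), ("gh", "G"), ("ch", "C"), ("jh", "J"),
    ("ṭh", "W"), ("ḍh", "Q"), ("th", "T"), ("dh", "D"),
    ("ph", "P"), ("bh", "B"),
    ("ā", "A"), ("ī", "I"), ("ū", "U"), ("ṛ", "f"), ("ṝ", "F"),
    ("ḷ", "x"), ("ḹ", "X"), ("ṃ", "M"), ("ḥ", "H"),
    ("ṅ", "N"), ("ñ", "Y"), ("ṭ", "w"), ("ḍ", "q"), ("ṇ", "R"),
    ("ś", "S"), ("ṣ", "z"),
]
# Normalise the table keys so they match NFC-normalised input.
IAST_TO_SLP1 = [(unicodedata.normalize("NFC", s), d) for s, d in IAST_TO_SLP1]


def iast_to_slp1(text):
    """Convert IAST to SLP1 in one left-to-right scan."""
    text = unicodedata.normalize("NFC", text)
    out = []
    i = 0
    while i < len(text):
        for src, dst in IAST_TO_SLP1:
            if text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            # Letters shared by both schemes, digits, '√', ';' pass through.
            out.append(text[i])
            i += 1
    return "".join(out)


print(iast_to_slp1("√1 bhuj; √luṭh; √kṛ (skṛ); √1 ukṣ;"))
```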

gasyoun commented 9 years ago

√1 as; √1 Buj; √1 ci; √dA; √DA; √DAv; √gA; √gf; √hA; √hf; √kf (skf); √kft; √1 lI; √luW; √mA; √1 naS; √nu; √pA; √paS; √pat; √pf; √SAs; √1 siD; √Sf; √1 stu; √1 tan; √1 ukz; √vA; √vap; √vid; √1 vf; √yu; √2 as; √aS;

drdhaval2785 commented 9 years ago

-gā (1), -dā (1), -dhā (1), -pā (1), -mā (1), -vā (1), -hā (1); -ci (1); -lī (1); -tu (1), -nu (1), -yu (1); -kṛ (1), -gṛ (1), -pṛ (1), -vṛ (1), -śṛ (1), -hṛ (1); -j (1), -ṭh (1), -t (2), -d (1), -dh (1), -n (1), -p (1), -v (1), -ś (3), -ṣ (1), -s (3),

| gā |

√gā

| dā |

√dā

| dhā |

√dhā

| pā |

√pā

| mā |

√mā

| vā |

√vā

| hā |

√hā

| ci |

√1 ci

| lī |

√1 lī

| tu |

√1 stu

| nu |

√nu

| yu |

√yu

| kṛ |

√kṛ (skṛ)

| gṛ |

√gṛ

| pṛ |

√pṛ

| vṛ |

√1 vṛ

| śṛ |

√śṛ

| hṛ |

√hṛ

| j |

√1 bhuj

| ṭh |

√luṭh

| t |

√pat

√kṛt

| d |

√vid

| dh |

√1 sidh

| n |

√1 tan

| p |

√vap

| v |

√dhāv

| ś |

√aś

√1 naś

√paś

| ṣ |

√1 ukṣ

| s |

√1 as

√2 as

√śās
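The order above is consistent with a reverse sort: each root is read from its last sound backwards and compared in Sanskrit alphabetical order, with the root sign, the homonym numbers, and the parenthesised variant ignored. The following is a minimal sketch of that idea on the SLP1 sample, not the actual whitney.php code; `SLP1_ORDER`, `RANK`, and `reverse_key` are names made up for the illustration.

```python
import re

# SLP1 letters in Sanskrit alphabetical order
# (vowels, anusvara, visarga, then consonants).
SLP1_ORDER = "aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"
RANK = {ch: i for i, ch in enumerate(SLP1_ORDER)}


def reverse_key(entry):
    """Sort key: ranks of the entry's SLP1 letters, last letter first."""
    # Drop parenthesised variants such as "(skf)".
    bare = re.sub(r"\([^)]*\)", "", entry)
    # Keeping only SLP1 letters also discards '√', spaces, and numbers.
    letters = [ch for ch in bare if ch in RANK]
    return [RANK[ch] for ch in reversed(letters)]


data = ("√1 as; √1 Buj; √1 ci; √dA; √DA; √DAv; √gA; √gf; √hA; √hf; "
        "√kf (skf); √kft; √1 lI; √luW; √mA; √1 naS; √nu; √pA; √paS; √pat; "
        "√pf; √SAs; √1 siD; √Sf; √1 stu; √1 tan; √1 ukz; √vA; √vap; √vid; "
        "√1 vf; √yu; √2 as; √aS;")

entries = [e.strip() for e in data.split(";") if e.strip()]
entries.sort(key=reverse_key)  # stable: √1 as stays before √2 as
print("; ".join(entries) + ";")
```

Since the numbers are not part of the key, √1 as and √2 as compare equal and simply keep their input order.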

drdhaval2785 commented 9 years ago

See http://drdhaval2785.github.io/whitneyiast.txt. The code in whitney.php now sorts in paragraphs, as suggested, rather than in columns.

gasyoun commented 9 years ago

Input: a; ā; i; ī; u; ū; ṛ; ṝ; ḷ; ḹ; e; ai; o; au; ṃ; ḥ; k; kh; g; gh; ṅ; c; ch; j; jh; ñ; ṭ; ṭh; ḍ; ḍh; ṇ; t; th; d; dh; n; p; ph; b; bh; m; y; r; l; v; ś; ṣ; s; h

Output: ñ; ā; ī; ś; ū; a; i; ai; u; au; e; o; k; g; c; j; t; d; n; p; b; m; y; r; l; v; s; kh; gh; ch; jh; th; dh; ph; bh; ḍh; ṭh; ḍ; ḥ; ḷ; ḹ; ṃ; ṅ; ṇ; ṛ; ṝ; ṣ; ṭ

Don't you think it is strange that e; ai; o; au has now become ai; u; au; e? That does not make much sense. au; o; ai; e would be more logical, no?

Please add ñ as equal to ञ्.

Actually, it is not only http://localhost/SanskritSorting/whitney.php that now sorts wrongly; http://localhost/SanskritSorting/reverse21.php does too.

-ñ (1), -ā (1), -ī (1), -ś (1), -ū (1), -a (1), -i (2), -u (2), -e (1), -o (1), -k (1), -g (1), -c (1), -j (1), -t (1), -d (1), -n (1), -p (1), -b (1), -m (1), -y (1), -r (1), -l (1), -v (1), -s (1), -h (11);
-ḍ (1), -ḥ (1), -ḷ (1), -ḹ (1), -ṃ (1);
-ṅ (1), -ṇ (1), -ṛ (1), -ṝ (1), -ṣ (1), -ṭ (1), 

| ñ |
#ñ#

| ā |
#ā#

| ī |
#ī#

| ś |
#ś#

| ū |
#ū#

| a |
#a#

| i |
#i#
#ai#

| u |
#u#
#au#

| e |
#e#

| o |
#o#

| k |
#k#

| g |
#g#

| c |
#c#

| j |
#j#

| t |
#t#

| d |
#d#

| n |
#n#

| p |
#p#

| b |
#b#

| m |
#m#

| y |
#y#

| r |
#r#

| l |
#l#

| v |
#v#

| s |
#s#

| h |
#h#
#kh#
#gh#
#ch#
#jh#
#th#
#dh#
#ph#
#bh#
#ḍh#
#ṭh#

| ḍ |
#ḍ#

| ḥ |
#ḥ#

| ḷ |
#ḷ#

| ḹ |
#ḹ#

| ṃ |
#ṃ#

| ṅ |
#ṅ#

| ṇ |
#ṇ#

| ṛ |
#ṛ#

| ṝ |
#ṝ#

| ṣ |
#ṣ#

| ṭ |
#ṭ#

drdhaval2785 commented 9 years ago

False alarm. User named Marcis misused the code. Entered IAST instead of SLP1. As you sow, so shall you reap.
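A guard against this kind of mix-up could be sketched as below: SLP1 uses only ASCII letters, so any other alphabetic character suggests the input is IAST (or something else). This is a hypothetical check, not something the repository is known to contain, and it cannot catch IAST words that happen to use only plain ASCII letters, such as bhuj.

```python
import unicodedata

SLP1_LETTERS = set("aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh")


def non_slp1_characters(text):
    """Return the alphabetic characters in text that are not SLP1 letters."""
    # NFC keeps 'ā' as one character instead of 'a' plus a combining macron.
    text = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in text if ch.isalpha() and ch not in SLP1_LETTERS})


suspicious = non_slp1_characters("a; ā; ñ; ś; k")
if suspicious:
    print("Input does not look like SLP1:", " ".join(suspicious))
```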

gasyoun commented 9 years ago

Before: las; vas-1; vas-4; vas-2; śvas-1;

After: las; vas-4; vas-2; vas-1; śvas-1;

@drdhaval2785 doesn't it make sense?

drdhaval2785 commented 9 years ago

As of now I ignore the numbers, so their order is immaterial to me.

gasyoun commented 9 years ago

@drdhaval2785 but you do not yet ignore them in the #arth — 156, 187# format from https://github.com/drdhaval2785/SanskritSorting/issues/42.

drdhaval2785 commented 9 years ago

@gasyoun use some other delimiter, such as -, to separate the numbers from the word (— has some issue with the conversion to Devanagari).

For a single number it would work. For multiple numbers separated by commas, it may not work.
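As a sketch of how the word and its numbers could be separated, and how the numbers might serve as a tie-breaker instead of being ignored, consider the following. It is a hypothetical extension, not the current behaviour of the code: `split_entry` and `sort_key` are invented names, the sample is given in SLP1, and the word itself is assumed not to contain the - delimiter.

```python
import re

SLP1_ORDER = "aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"
RANK = {ch: i for i, ch in enumerate(SLP1_ORDER)}


def split_entry(entry, delimiter="-"):
    """Split 'vas-4' into ('vas', [4]); 'las' gives ('las', [])."""
    word, _, tail = entry.partition(delimiter)
    # Accept one number or several separated by commas.
    numbers = [int(n) for n in re.findall(r"\d+", tail)]
    return word.strip(), numbers


def sort_key(entry):
    word, numbers = split_entry(entry)
    # Reverse order on the word first; numbers only break ties.
    reversed_ranks = [RANK[ch] for ch in reversed(word) if ch in RANK]
    return (reversed_ranks, numbers)


before = "las; vas-1; vas-4; vas-2; Svas-1;"
entries = [e.strip() for e in before.split(";") if e.strip()]
result = "; ".join(sorted(entries, key=sort_key)) + ";"
print(result)
```

Dropping `numbers` from the returned tuple gives the behaviour described above, where the numbers are ignored and equal words keep whatever relative order the sort leaves them in.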

gasyoun commented 9 years ago

OK, I will not use —. What should the commas be replaced with for it to work?