drdhaval2785 / SanskritSorting

Codes written by Dr. Dhaval Patel for Sanskrit Natural Language Programming

SLP1 input with IAST output #2

Closed gasyoun closed 9 years ago

gasyoun commented 9 years ago

Devanagari is not the data storage format at Cologne and never will be. SLP1 is a good scheme for NLP and coding tasks. So of course one can dream of a sorter that accepts SLP1; for Devanagari we could manage somehow with Chattoraj's code. Output could be SLP1 or IAST. From a clean IAST with accents I can always get good Devanagari, but usually not the other way around. IAST output samples (full SCH list): híṃsa karī̀ra naí̱viḍya. See my accent documentation based on Jim's notes.

Shalu411 commented 9 years ago

Namaste. Devanagari would really be worthwhile if one could separate a letter from its mAtrAs and combined forms. Unicode Devanagari does not let us see a letter as its separate constituents. E.g. once we type an anusvAra, it becomes part of the letter it is attached to, and we can never, for example, search for it separately, whereas in Roman characters this is possible. But in some editors like Gmail, and in a few other places, I saw Devanagari letters coming apart when erased (using Del or Backspace in specific cases). So maybe if we could find a way of identifying each mAtrA of a character separately (I do not know if it's possible), then we could replace it with IAST or SLP etc. The only advantage of SLP is its single letter for each Devanagari character. But it is difficult for others, i.e. non-users of that code, to read a word and make sense of it. E.g. kaNTha will be kaRWa or something, so there are limitations to deciphering without a key! So if it is possible to exclusively identify Devanagari characters like VCCV combinations, then maybe the issues will be solvable? Impossible? Bad? Far-away idea?
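On the question of whether the constituents can be identified separately: Unicode stores the anusvāra and the vowel signs as code points of their own, and it is the renderer that joins them on screen. A minimal Python sketch of this (Python is used only for illustration, the repository's scripts are PHP; the helper name `code_points` is hypothetical):

```python
import unicodedata

def code_points(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The word is written with escapes so the three stored code points are explicit:
# LETTER KA, SIGN ANUSVARA, LETTER TTHA.
word = "\u0915\u0902\u0920"
ANUSVARA = "\u0902"

for cp, name in code_points(word):
    print(cp, name)

# The anusvara can be searched for and counted on its own.
print(ANUSVARA in word, word.count(ANUSVARA))
```

Whether a particular editor or search box exposes these code points individually is a separate matter and depends on that tool.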

gasyoun commented 9 years ago

All your concerns are easily solved with devanagari output as an option. For data storage devanagari is bad.

Shalu411 commented 9 years ago

"For data storage devanagari is bad." Does "data storage" exclude coding or include it? Where are corrections done: in the code, in the stored data, or in the output? Please make it clear. I think reading non-Devanagari text is difficult everywhere.

gasyoun commented 9 years ago

Reading SLP1 can be harder, but one knows nothing is lost. Lately corrections are done only in SLP1; that is the encoding of the stored data. Output is only a representation of the stored data. It can even use the Tibetan alphabet for Sanskrit words if you want.

Shalu411 commented 9 years ago

"one knows nothing is lost." What exactly would be lost if it were Devanagari? Example? I see it this way: for "vadana" I have to type 6 letters in SLP, whereas I need only three in Dev.: वदन (= face).

"Output is only the representation of the stored data." Then what's wrong? When output = stored = input data, why can't input be done in Dev.? No conversion needed. What is the exact advantage of SLP or IAST or HK over Devanagari? I am asking in order to understand the process exactly.

"Tibetan alphabet" Bad idea. All these standards are already enough for confusion.

drdhaval2785 commented 9 years ago

Let's stop digressing. Shalu, let me clear this up for you. In coding, SLP1 has a distinct advantage. E.g. in HK, dhavala may be धवल or द्‍हवल. In SLP1 it is always unique: धवल is Davala, द्‍हवल is dhavala. The problem with storing data in Devanagari is that programming languages handle Devanagari letters as foreign, so the built-in functions on most coding platforms do not handle them well, and we lose the power of the programming language. Setting the file as UTF-8 helps, but not always.
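The one-character-per-sound property is what makes SLP1 convenient in code: a conversion to IAST can be a plain per-character lookup, with no digraph parsing. A minimal sketch in Python, assuming unaccented SLP1 input (the accent marks are not handled); the names `SLP1_TO_IAST` and `slp1_to_iast` are hypothetical and this is not the repository's PHP code:

```python
# Per-character SLP1 -> IAST table (unaccented input only).
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au",
    "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}

def slp1_to_iast(text):
    """Convert SLP1 to IAST one character at a time; unknown characters pass through."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)

# "Davala" has 6 characters for 6 sounds; "dhavala" has 7 for 7 sounds,
# so the two words never collide in the stored data.
print(slp1_to_iast("Davala"), len("Davala"), len("dhavala"))
print(slp1_to_iast("kaRWa"))
```

Going the other way, from a digraph-based scheme back to sounds, needs a parser that decides whether "dh" is one sound or two, which is the ambiguity described above.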

drdhaval2785 commented 9 years ago

My questions:

  1. Do we keep SLP1 as default input?
  2. Do we keep IAST as default output?

Of course, it can be altered in code manually. What do you want as default?

gasyoun commented 9 years ago

  1. Yes. SLP1.
  2. Yes. IAST.

drdhaval2785 commented 9 years ago

reverse18.php can handle SLP1 as input.

2nd point pending

gasyoun commented 9 years ago

What a day, what a day. Dhaval, the magician.

drdhaval2785 commented 9 years ago

2nd point done. Check reverse19.php. It takes SLP1 as input and gives two outputs:

  1. devanagarisorted.txt -> the sorted dictionary in Devanagari
  2. devanagarisorted1.html -> the same in IAST, with the pratyaya endings highlighted.
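For readers who want to see why SLP1 input makes the sorting step simple, here is a minimal Python sketch of sorting SLP1 words in Sanskrit alphabetical order. It is not the code in reverse19.php; `SLP1_ORDER`, `sort_key` and `sort_slp1` are hypothetical names, the placement of anusvāra and visarga right after the vowels is one common convention, and the `from_end` option only assumes that a reverse dictionary compares words from their endings.

```python
# SLP1 characters in Sanskrit alphabetical order (vowels, M, H, then consonants).
SLP1_ORDER = "aAiIuUfFxXeEoOMHkKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"
RANK = {ch: i for i, ch in enumerate(SLP1_ORDER)}

def sort_key(word, from_end=False):
    """Map a word to a list of ranks; characters outside SLP1 sort last."""
    chars = reversed(word) if from_end else word
    return [RANK.get(ch, len(SLP1_ORDER)) for ch in chars]

def sort_slp1(words, from_end=False):
    """Sort SLP1 words alphabetically, optionally comparing from the word end."""
    return sorted(words, key=lambda w: sort_key(w, from_end))

words = ["Davala", "kaRWa", "vadana", "aMSa", "agni"]
print(sorted(words))                   # plain byte order: capitals first
print(sort_slp1(words))                # Sanskrit alphabetical order
print(sort_slp1(words, from_end=True)) # compared from the endings
```

Because each SLP1 character is one sound, the key is a simple rank lookup; the sorted list can then be converted to IAST or Devanagari for display.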