Open gasyoun opened 8 years ago
@drdhaval2785 are there any new lessons learned in this regard? Please let me know.
@drdhaval2785 it's dead, I understand. But should it be left so?
I have felt the need for it in a current project. So hopefully it will not remain dead for long.
Please take a look at https://github.com/funderburkjim/MWderivations/issues/14 - I need to present a paper on it and need help with this 4000-word list, thanks.
python split.py batchprocess/input.txt MW batchprocess/output.txt
From the code, it looks like this should work.
First of all, I want to salute you for https://github.com/drdhaval2785/samasasplitter. So quick, and it can already handle sandhi! Huet and @mbykov have done a lot in the field lately, and I hope they will comment. @funderburkjim is out of the game, but that is no reason not to listen to what he thinks of it.
An 80k-word frequency list is included in https://github.com/gasyoun/SanskritLexicography/blob/0fb80a8de652e80eb5514d930289c0cc0588d85b/DCS_statistical_evaluation.htm - it is parsed from http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/index.php?contents=corpus and contains part of MW. Of less interest, but possibly still useful: https://github.com/gasyoun/SanskritLexicography/blob/0fb80a8de652e80eb5514d930289c0cc0588d85b/DCS-Moniers-roots-w-references.html Please also see https://docs.google.com/document/d/11Z1snnew9a0eY96W5o-ZQ71Zve1WRjcOqfOFgagndy4/edit#heading=h.k0dxemsx30hk - questions I had after reading Gérard's emails:
Gérard Huet 08.02.14:
Gérard Huet 21.03.14:
@drdhaval2785 I would go for: 3.1 Frequency; 3.2 sanhw2 occurrence; 3.3 Word length (DEC); 3.4 Alphabetic order
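To make the proposed priority concrete, here is a minimal sketch of how such a ranking could be written as a multi-key sort. Everything in it is an assumption for illustration, not code from samasasplitter: the `Candidate` record and the `rank` function are hypothetical, "sanhw2 occurrence" is treated as a simple yes/no flag, higher frequency is assumed to rank first, "DEC" is read as descending, and alphabetic order is plain string comparison rather than Sanskrit alphabet order.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one candidate word (not from samasasplitter)."""
    word: str
    frequency: int    # assumed: corpus count, higher is better
    in_sanhw2: bool   # assumed: True if the word occurs in sanhw2


def rank_key(c: Candidate):
    # Tuples compare element by element, so earlier criteria dominate.
    return (
        -c.frequency,      # 3.1 frequency, highest first (assumption)
        not c.in_sanhw2,   # 3.2 words found in sanhw2 first (False sorts before True)
        -len(c.word),      # 3.3 word length, descending (reading "DEC" as descending)
        c.word,            # 3.4 alphabetic, plain string order (assumption)
    )


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Return candidates ordered by the four criteria above."""
    return sorted(candidates, key=rank_key)


if __name__ == "__main__":
    # Made-up words and counts, purely for demonstration.
    demo = [
        Candidate("b", 5, True),
        Candidate("abc", 5, True),
        Candidate("z", 5, False),
        Candidate("q", 9, False),
    ]
    for c in rank(demo):
        print(c.word, c.frequency, c.in_sanhw2)
```

Using one tuple as the sort key keeps the priority order readable in one place; swapping two lines in `rank_key` is enough to try a different priority.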