drdhaval2785 / SanskritSpellCheck

spell checking based on patterns

Count 100 vowels vs. 130 consonants #5

Closed gasyoun closed 9 years ago

gasyoun commented 10 years ago

Can we take some of GRETIL's texts (GRETIL_ALL_2013-10-09_UTF8_FOR_PERSONAL_USE_ONLY.zip) and count how many V there are per 100 occurrences of C in a Sanskrit text? Something like 100 vowels vs. 130 consonants, please. I managed only the Rigveda, and I'm unsure even about that: I used HK rather than SLP1, so the results are dirty. What is the real ratio? Sample SLP1 text: https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/meghadhuta-CVC-SLP1.txt

drdhaval2785 commented 10 years ago

Done. Input file : https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/meghadhuta.txt Run https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/countvowels.php

Output -

Occurrence of vowels - 8304
Occurrence of consonants - 11519
Ratio of consonants per 100 vowels -138.71395026793

gasyoun commented 10 years ago

Fantastic job, well done, lovely. Looking for bigger text samples:

13,119.008a sa tatheti pratiśrutya kīṭo vartmany atiṣṭhata
13,119.008b*0599_01 śakaṭavrajaś ca sumahān āgataś ca yadṛcchayā
13,119.008b*0599_02 cakrākrameṇa bhinnaś ca kīṭaḥ prāṇān mumoca ha
13,119.008b*0599_03 saṃbhūtaḥ kṣatriyakule prasādād amitaujasaḥ
13,119.008c tam ṛṣiṃ draṣṭum agamat sarvāsv anyāsu yoniṣu

I'll have to find a way to weed out the numbers; otherwise the a/b/c/d at the end of the numbers might get counted as letters.
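One possible way to weed out the numbers is a regular expression anchored at the start of each line. The Python sketch below is only a guess at the reference format based on the five sample lines above (book, chapter.verse, a pada letter, and an optional insertion marker such as *0599_01); other GRETIL files may number their lines differently. `strip_reference` is a hypothetical helper name.

```python
import re

# Pattern inferred from the sample lines only: "13,119.008a" or "13,119.008b*0599_01".
# An underscore is accepted in place of "*" in case the marker got mangled in copying.
REFERENCE = re.compile(r"^\d+,\d+\.\d+[a-z]?(?:[*_]\d+_\d+)?\s+")


def strip_reference(line):
    """Remove a leading verse reference so its digits and pada letter are not counted."""
    return REFERENCE.sub("", line)


if __name__ == "__main__":
    print(strip_reference("13,119.008a sa tatheti pratiśrutya kīṭo vartmany atiṣṭhata"))
    print(strip_reference("13,119.008b*0599_03 saṃbhūtaḥ kṣatriyakule prasādād amitaujasaḥ"))
```

The sample is written with diacritics rather than SLP1, so the stripped text would still need transliterating to SLP1 before the vowel and consonant counts mean anything.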

drdhaval2785 commented 10 years ago

Bad luck. Regex ahead. Beware

gasyoun commented 10 years ago

Regex finished.

Atharvaveda
Occurrence of vowels - 207230
Occurrence of consonants - 275159
Ratio of consonants per 100 vowels -132.7794412502

Meghadhuta
Occurrence of vowels - 8304
Occurrence of consonants - 11519
Ratio of consonants per 100 vowels -138.71395026793

Ramayana
Occurrence of vowels - 620468
Occurrence of consonants - 853343
Ratio of consonants per 100 vowels -137.5321229039

Mahabharata
Occurrence of vowels - 3544615
Occurrence of consonants - 4897052
Ratio of consonants per 100 vowels -138.15

1) Can we take the file name (atharvaveda-CVC-SLP1) from $file=file_get_contents("atharvaveda-CVC-SLP1.txt"); and add it to the output, as in: Ratio of consonants per 100 vowels (per atharvaveda-CVC-SLP1) -132.7794412502? Though 5) would make this unnecessary.

2) Can we have 132.78 instead of 132.7794412502, please?

3) "Occurrence of vowels" -> Occurrence of vowels (V)

4) "Occurrence of consonants" -> Occurrence of consonants (C)

5) $file=file_get_contents for all files in a folder, like http://stackoverflow.com/questions/15041608/searching-all-files-in-folder-for-strings

6) When counting mbh-CVC-SLP1, the script got stuck: it showed only Occurrence of vowels - 3544615 and, below that, Fatal error: Allowed memory size of 1048576000 bytes exhausted (tried to allocate 36 bytes) in C:\xampp\htdocs\countvowels.php on line 31

The initial setting was memory_limit=128M; when I changed it to memory_limit=-1, the Apache server did not launch. So $split1=preg_split('/([kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzshMH])/',$file,0,PREG_SPLIT_DELIM_CAPTURE); is what crashes on the MBh file :) After that the counting went on for the older files, but for all the newer ones I get only Warning: file_get_contents(m.txt): failed to open stream: No such file or directory in C:\xampp\htdocs\countvowels.php on line 13

ini_set("memory_limit","10000M"); helped me to get Occurrence of vowels - 3544615 Occurrence of consonants - 4897052 so I opened my calculator and got 1.38, tada.
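Raising memory_limit works, but the count itself does not need the whole file in memory at once. Presumably the crash comes from splitting the entire Mahabharata into an array of pieces in one go. As a hedged alternative, this Python sketch (not the repository's PHP) counts line by line, so memory use stays roughly constant whatever the file size. The vowel set is an assumption (standard SLP1 vowels); the consonant set is the one from the preg_split pattern above; the function names are hypothetical.

```python
# Consonants as in the preg_split pattern quoted above; vowels are an assumption.
SLP1_CONSONANTS = set("kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzshMH")
SLP1_VOWELS = set("aAiIuUfFxXeEoO")


def count_cv_stream(lines):
    """Count vowels and consonants over any iterable of lines, one line at a time."""
    vowels = 0
    consonants = 0
    for line in lines:
        for ch in line:
            if ch in SLP1_VOWELS:
                vowels += 1
            elif ch in SLP1_CONSONANTS:
                consonants += 1
    return vowels, consonants


def count_cv_file(path):
    """Stream a file from disk; only one line is held in memory at a time."""
    with open(path, encoding="utf-8") as handle:
        return count_cv_stream(handle)


if __name__ == "__main__":
    print(count_cv_stream(["kaScit\n", "kAntA\n"]))
```

Because a file object yields its lines lazily, `count_cv_file("mbh-CVC-SLP1.txt")` would never build the large intermediate array that a split of the full text does.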

drdhaval2785 commented 10 years ago

It seems more of documentation. Anything which remains for me to do? If you have already done some corrections for your needs - pushing it on github may help. Otherwise I treat this issue as closed.

gasyoun commented 10 years ago

@drdhaval2785 right, documentation is there, but 5) & 2) are wanting. 6) is partly fixed with ini_set("memory_limit","100000M"); so I'll push it for Mahabharata.

drdhaval2785 commented 9 years ago

@gasyoun

Point 2) is done in the latest commit. Sample output with meghaduta-cvc-slp1 file is

Occurrence of vowels - 8304
Occurrence of consonants - 11519
Ratio of consonants per 100 vowels -138.71

Points 5) and 6) are not clear to me. I will let you close this issue if these two are not that important.

gasyoun commented 9 years ago

Let's close it.