Closed gasyoun closed 9 years ago
Done. Input file: https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/meghadhuta.txt
Script: https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/countvowels.php
Output:
Occurrence of vowels - 8304
Occurrence of consonants - 11519
Ratio of consonants per 100 vowels -138.71395026793
Fantastic job, well done. I'm now looking at bigger text samples:
13,119.008a sa tatheti pratiśrutya kīṭo vartmany atiṣṭhata
13,119.008b_0599_01 śakaṭavrajaś ca sumahān āgataś ca yadṛcchayā
13,119.008b_0599_02 cakrākrameṇa bhinnaś ca kīṭaḥ prāṇān mumoca ha
13,119.008b*0599_03 saṃbhūtaḥ kṣatriyakule prasādād amitaujasaḥ
13,119.008c tam ṛṣiṃ draṣṭum agamat sarvāsv anyāsu yoniṣu
I'll have to find a way to weed out the reference numbers, otherwise the a/b/c/d letters at the end of the numbers would be counted as text.
Bad luck. Regex ahead. Beware
Regex finished.
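A minimal sketch of the kind of regex cleanup meant here, assuming every line starts with a reference such as 13,119.008a or 13,119.008b*0599_03 followed by whitespace. The helper name stripReference and the pattern itself are illustrative assumptions, not the regex that was actually run.

```php
<?php
// Hypothetical helper: remove a leading verse reference from one line.
// Assumed reference shape: digits, comma, digits, dot, digits, one letter,
// then an optional suffix that starts with * or _ and contains digits/underscores.
function stripReference(string $line): string
{
    $pattern = '/^\s*\d+,\d+\.\d+[a-z](?:[*_][0-9_]+)?\s+/u';
    // preg_replace returns null on error; fall back to the untouched line.
    return preg_replace($pattern, '', $line) ?? $line;
}

$sample = "13,119.008c tam ṛṣiṃ draṣṭum agamat sarvāsv anyāsu yoniṣu";
echo stripReference($sample), "\n";
```

Lines without a leading reference pass through unchanged, so the helper can be applied to every line of a file before counting.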
Atharvaveda
Occurrence of vowels - 207230
Occurrence of consonants - 275159
Ratio of consonants per 100 vowels -132.7794412502
Meghadhuta
Occurrence of vowels - 8304
Occurrence of consonants - 11519
Ratio of consonants per 100 vowels -138.71395026793
Ramayana
Occurrence of vowels - 620468
Occurrence of consonants - 853343
Ratio of consonants per 100 vowels -137.5321229039
Mahabharata
Occurrence of vowels - 3544615
Occurrence of consonants - 4897052
Ratio of consonants per 100 vowels -138.15
1) Can we take the atharvaveda-CVC-SLP1 part from $file=file_get_contents("atharvaveda-CVC-SLP1.txt"); and add it to the output, as in "Ratio of consonants per 100 vowels (per atharvaveda-CVC-SLP1) -132.7794412502"? But 5) would make that unnecessary.
2) Can we have 132.78 instead of 132.7794412502, please?
3) "Occurrence of vowels" -> "Occurrence of vowels (V)"
4) "Occurrence of consonants" -> "Occurrence of consonants (C)"
5) Run $file=file_get_contents for all files in a folder, like http://stackoverflow.com/questions/15041608/searching-all-files-in-folder-for-strings
6) When counting mbh-CVC-SLP1, the script got stuck. It showed only "Occurrence of vowels - 3544615" and below that:
Fatal error: Allowed memory size of 1048576000 bytes exhausted (tried to allocate 36 bytes) in C:\xampp\htdocs\countvowels.php on line 31
The initial setting was memory_limit=128M; when I changed it to memory_limit=-1, the Apache server did not launch. So
$split1=preg_split('/([kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzshMH])/',$file,0,PREG_SPLIT_DELIM_CAPTURE);
is crashing it on the MBh file :) After that the counting went on for the older files, but for all newer ones I get only:
Warning: file_get_contents(m.txt): failed to open stream: No such file or directory in C:\xampp\htdocs\countvowels.php on line 13
ini_set("memory_limit","10000M"); helped me to get
Occurrence of vowels - 3544615
Occurrence of consonants - 4897052
so I opened my calculator and got 1.38, tada.
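As an alternative to raising memory_limit, here is a sketch that counts letters while reading the file in chunks, so memory use stays flat even for the Mahabharata. The consonant set is copied from the preg_split call quoted above. The vowel set (the usual SLP1 vowel letters), the function name and the chunk size are assumptions, and this is not how countvowels.php itself works.

```php
<?php
// Consonant letters copied from the preg_split pattern in the thread.
const CONSONANTS = 'kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzshMH';
// Assumption: the usual SLP1 vowel letters.
const VOWELS = 'aAiIuUfFxXeEoO';

// Count vowels and consonants in an SLP1 (plain ASCII) file.
// The file is read in fixed-size chunks and only 256 byte counters are kept.
function countVowelsAndConsonants(string $path, int $chunkSize = 1048576): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $totals = array_fill(0, 256, 0);
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        // count_chars mode 1 returns byte value => count for bytes present.
        foreach (count_chars($chunk, 1) as $byte => $n) {
            $totals[$byte] += $n;
        }
    }
    fclose($handle);

    // Add up the counters for every letter of a given set.
    $sum = function (string $letters) use ($totals): int {
        $total = 0;
        foreach (str_split($letters) as $letter) {
            $total += $totals[ord($letter)];
        }
        return $total;
    };
    return ['V' => $sum(VOWELS), 'C' => $sum(CONSONANTS)];
}
```

Because every SLP1 letter is a single ASCII byte, a chunk boundary can never split a letter. Digits, spaces, punctuation and any non-ASCII bytes are simply ignored.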
This seems to be more a matter of documentation. Is there anything left for me to do? If you have already made some corrections for your own needs, pushing them to GitHub may help. Otherwise I will treat this issue as closed.
@drdhaval2785 right, the documentation is there, but 5) and 2) are still missing. 6) is partly fixed with ini_set("memory_limit","100000M"); so I'll push it for the Mahabharata.
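For point 5), a rough sketch of looping over every .txt file in a folder with glob. The folder name "texts" and the helper name listTextFiles are assumptions; the existing counting code would go where the comment indicates. The file name without its extension also gives the label asked for in point 1).

```php
<?php
// Hypothetical helper: list the .txt files of a folder (glob sorts them by name).
function listTextFiles(string $folder): array
{
    $files = glob(rtrim($folder, '/\\') . DIRECTORY_SEPARATOR . '*.txt');
    return $files === false ? [] : $files;
}

// Assumption: the SLP1 files sit in a "texts" folder next to the script.
foreach (listTextFiles(__DIR__ . '/texts') as $path) {
    $file = file_get_contents($path);
    if ($file === false) {
        continue; // skip unreadable files instead of stopping the run
    }
    // e.g. "atharvaveda-CVC-SLP1" for ".../atharvaveda-CVC-SLP1.txt"
    $label = pathinfo($path, PATHINFO_FILENAME);
    echo $label, ' - ', strlen($file), " bytes\n";
    // ... run the existing counting code on $file here ...
}
```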
@gasyoun Point 2) is done in the latest commit. Sample output with the meghaduta-cvc-slp1 file:
Occurrence of vowels - 8304
Occurrence of consonants - 11519
Ratio of consonants per 100 vowels -138.71
Points 5) and 6) are not clear to me. I will let you close this issue if these two are not that important.
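For reference, points 1) to 4) could be combined in one formatting step, roughly as sketched below. The function name formatReport and the exact line layout are assumptions and not the code from the commit mentioned above; the numbers are the Meghaduta counts from this thread.

```php
<?php
// Hypothetical helper: build the three report lines for one text.
function formatReport(string $label, int $vowels, int $consonants): string
{
    if ($vowels === 0) {
        throw new InvalidArgumentException('Vowel count must be non-zero');
    }
    // Consonants per 100 vowels, fixed to two decimals (point 2).
    $ratio = number_format($consonants / $vowels * 100, 2, '.', '');
    return "Occurrence of vowels (V) - $vowels\n"
         . "Occurrence of consonants (C) - $consonants\n"
         . "Ratio of consonants per 100 vowels ($label) - $ratio\n";
}

echo formatReport('meghadhuta-CVC-SLP1', 8304, 11519);
// Last line printed: Ratio of consonants per 100 vowels (meghadhuta-CVC-SLP1) - 138.71
```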
Let's close it.
Can we take some GRETIL texts (GRETIL_ALL_2013-10-09_UTF8_FOR_PERSONAL_USE_ONLY.zip) and count how many V there are per 100 occurrences of C in a Sanskrit text? Like 100 vowels vs 130 consonants, please. I managed only the Rigveda, and even that I'm unsure about: I used HK instead of SLP1, so the results are dirty. What is the real ratio? Sample SLP1 text: https://github.com/drdhaval2785/SanskritSpellCheck/blob/master/meghadhuta-CVC-SLP1.txt