drdhaval2785 / siddhantakaumudi

siddhAntakaumudI .txt, .xml, .html, .epub files - Periodically updated and corrected.

Use ngrams for error identification in sk.xml #3

Open drdhaval2785 opened 7 years ago

drdhaval2785 commented 7 years ago

Use the bigrams and trigrams of 'kAzikA', 'bAlamanoramA', or another grammar text as a baseline, and flag the odd bigrams and trigrams in sk.xml as candidates for correction.
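A minimal sketch of the idea, assuming character n-grams taken within each whitespace-separated word of transliterated text (the issue does not say whether character or word n-grams are meant). The reference text stands in for a grammar work such as 'kAzikA'; any word in the target whose n-grams are rare or unseen in the reference is flagged. The function names and the `min_ref_count` threshold are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams inside each whitespace-separated word."""
    counts = Counter()
    for word in text.split():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def suspect_words(reference, target, n, min_ref_count=1):
    """Return {word: [odd n-grams]} for words in `target` that contain an
    n-gram seen fewer than `min_ref_count` times in `reference`.
    The threshold is an assumption; raise it to flag rare n-grams too."""
    ref = char_ngrams(reference, n)
    suspects = {}
    for word in target.split():
        odd = [word[i:i + n] for i in range(len(word) - n + 1)
               if ref[word[i:i + n]] < min_ref_count]
        if odd:
            suspects[word] = odd
    return suspects


if __name__ == "__main__":
    # Toy reference and target strings, not real data from the repository.
    print(suspect_words("rAma rAmaH gacCati", "rAma rAmq", 2))
    # prints {'rAmq': ['mq']}
```

Running the same function with `n=2` and then `n=3` mirrors the two passes discussed in this thread. A larger reference corpus reduces false positives from legitimate but rare letter combinations.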

gasyoun commented 7 years ago

bigrams and trigrams for corrections

Is the text that dirty? Which .pdf is it being compared against?

drdhaval2785 commented 7 years ago

I have no idea what the base of this digitization is. It seems to be the 1911 edition of SK with bAlamanoramA, from Trichinopoly.

drdhaval2785 commented 7 years ago

https://github.com/drdhaval2785/siddhantakaumudi/blob/master/sk_2gram_suspect.txt
https://github.com/drdhaval2785/siddhantakaumudi/blob/master/sk_3gram_suspect.txt

In these files, each odd line holds the base text and the even line after it lists the abnormal words and n-grams found in it.
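For reviewing such a file, a small reader can pair each odd line with the even line that follows it. This is a sketch that assumes the flagged items on the even line are whitespace-separated; the actual layout of `sk_2gram_suspect.txt` may differ. The function name is hypothetical, and a trailing unpaired line is ignored.

```python
from typing import Iterable, Iterator, List, Tuple


def read_suspect_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (base_text, flagged_items) for each odd/even line pair.

    Assumes the even line's items are separated by whitespace.
    A final odd line with no partner is skipped."""
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 1, 2):
        yield stripped[i], stripped[i + 1].split()


if __name__ == "__main__":
    # Toy input standing in for the contents of a suspect file.
    sample = ["text one\n", "w1 ab\n", "text two\n", "w2 cd\n"]
    for text, flagged in read_suspect_pairs(sample):
        print(text, "->", flagged)
```

With a downloaded copy, the same function can be fed an open file handle, e.g. `open("sk_2gram_suspect.txt", encoding="utf-8")`, since iterating a file yields its lines.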

drdhaval2785 commented 7 years ago

2-gram checking is complete. 3-gram checking has started.

gasyoun commented 3 years ago

3 grams started.

Finalised?