AI4Bharat / indicnlp_catalog

A collaborative catalog of NLP resources for Indic languages
https://ai4bharat.github.io/indicnlp_catalog

Various Sanskrit text sources #107

Open anoopkunchukuttan opened 3 years ago

anoopkunchukuttan commented 3 years ago

https://www.hinduscriptures.in/ (contains translations, requires OCR)
https://upanishads.org.in/ (contains translations)
https://www.wisdomlib.org/ (scattered)
https://sa.wikisource.org/ (source shlokas only)
http://www.cc.kyoto-su.ac.jp/~yanom/sanskrit/ (transliterated)
http://www.sanskrit-linguistics.org/dcs/index.php (contains annotations)
http://gretil.sub.uni-goettingen.de/gretil.html (transliterated)
https://www.valmiki.iitk.ac.in/ (translations and commentaries)
https://www.upanishads.iitk.ac.in/ (translations and commentaries)
http://sanskrit.jnu.ac.in/index.jsp
https://www.gitasupersite.iitk.ac.in/
http://sanskrit.uohyd.ac.in/Corpus/
http://vedicheritage.gov.in/
http://oaks.nvg.org/
http://oaks.nvg.org/pega12.html

Thanks to @rahular for pointing out many of these sources

rahular commented 3 years ago

Here are some links to already scraped, ready-to-use data:

rahular commented 3 years ago

Govt. of India has digitized translations of various texts: http://vedicheritage.gov.in/flipbook/

Edit: Just saw this is already present in the list above. For my personal reading, I have written some code to crawl this site. I can give access to it, if required.
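
A crawler for a site like this typically starts by collecting the links on an index page. The following is a minimal Python sketch of that step, not the code mentioned above. It uses only the standard library, and the names `LinkCollector` and `extract_links` are hypothetical. It assumes the pages expose their content as ordinary `<a href>` links, which may not hold for the flipbook viewer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return same-site links found in html, de-duplicated, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen = set()
    result = []
    for link in parser.links:
        # Keep only links on the same host, and skip repeats.
        if urlparse(link).netloc == host and link not in seen:
            seen.add(link)
            result.append(link)
    return result
```

The page HTML itself could be fetched with `urllib.request.urlopen` and passed to `extract_links` together with the page URL. A real crawl would also need rate limiting and a check of the site's terms and robots.txt.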