Kalebu / kamusi

JSON and CSV data for a Swahili dictionary with over 16,600 words
MIT License
# [kamusi](https://github.com/Kalebu/kamusi)

This repo contains data from a **Swahili dictionary**, available as JSON and CSV: about 16683 words together with their meanings, synonyms and conjugations.

This repo couldn't exist without [Kamusi-Mobile](https://github.com/jacksiro254/Kamusi-Mobile/), thanks to the great effort of [Jack Siro](https://github.com/jacksiro254).

## So how was this data generated?

This data is the result of web scraping [kamusi](http://kamusi.appsmata.com) with the help of Selenium and BeautifulSoup. There are two scripts: **app.py** does the scraping, while **to_json** converts the scraped CSV data into JSON that others can easily use.

## Gathering data

I'm currently gathering and organizing Swahili data, mainly for NLP purposes. If you know of any other places where we can scrape useful Swahili data, please raise an issue for it. Looking forward to seeing what you're going to build with it.

## Give it a star

Was this useful to you? Then give it a star so that more people can make use of it.

## Credits

All the credits to:

- [kalebu](https://github.com/kalebu)
- [Jack Siro](https://github.com/jacksiro254)
- and all the contributors
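
## Example: CSV to JSON (sketch)

The conversion step mentioned above (scraped CSV turned into JSON) could look roughly like the sketch below. This is not the repo's actual **to_json** script; the function name `csv_to_json` and the sample columns `word` and `meaning` are assumptions made for illustration, and the real column names may differ. It only relies on Python's standard `csv` and `json` modules.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    Hypothetical helper for illustration; not the repo's to_json script.
    """
    # DictReader uses the first row as keys, so no column names are hard-coded.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [dict(row) for row in reader]
    # ensure_ascii=False keeps any non-ASCII characters readable in the output.
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical sample data; the real CSV columns in this repo may differ.
sample = "word,meaning\nkamusi,dictionary\n"
converted = csv_to_json(sample)
print(converted)
```

Because the keys come from the CSV header row, the same function should work on whichever columns the CSV file actually has. To convert a file on disk, read its contents with `open(path, encoding="utf-8", newline="")` and pass the text to `csv_to_json`.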