Adding a text file of Turkish stop words, along with an option to remove them during text preprocessing, would be useful. It would also bring sadedegel closer to state-of-the-art NLP libraries for English.
I found existing work on Turkish stop words (https://github.com/ahmetax/trstop). We can use the Turkish stop word text file from that repository. Using that list, we can then change the code to give the user an option to remove stop words during the preprocessing stage.
Closing this issue as it was already implemented by:
Adding a text file of Turkish stop words (stop-words.txt) under "sadedegel/bblock/data/".
Implementing a method for loading the stop words as load_stopwords() under "sadedegel/bblock/util.py".
Implementing a check for the stop word status of a Token object, exposed as the is_stopword() property under "sadedegel/bblock/token.py".
Using this property of the Token object throughout "sadedegel/bblock/token.py" and "sadedegel/bblock/doc.py".
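
For readers who want to see the shape of the approach listed above, here is a minimal, self-contained sketch. It is not sadedegel's actual code: the names load_stopwords and is_stopword come from this comment, but the function signature, the simplified Token class, and the helpers tr_lower and remove_stopwords are assumptions made for illustration. It also assumes the stop word file holds one word per line.

```python
from dataclasses import dataclass
from pathlib import Path


def tr_lower(text: str) -> str:
    """Lowercase with Turkish casing rules (hypothetical helper).

    Plain str.lower() maps "I" to "i", but Turkish expects "ı",
    so the two dotted/dotless capitals are handled first.
    """
    return text.replace("I", "ı").replace("İ", "i").lower()


def load_stopwords(path) -> frozenset:
    """Read stop words from a UTF-8 file, assuming one word per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Skip blank lines and normalize case so lookups are consistent.
    return frozenset(tr_lower(line.strip()) for line in lines if line.strip())


@dataclass
class Token:
    """Simplified stand-in for a token; not the sadedegel Token class."""
    word: str
    stopwords: frozenset

    @property
    def is_stopword(self) -> bool:
        # Case-insensitive membership test against the loaded list.
        return tr_lower(self.word) in self.stopwords


def remove_stopwords(words, stopwords):
    """Return the words that are not stop words, preserving order."""
    return [w for w in words if not Token(w, stopwords).is_stopword]
```

With a file such as stop-words.txt loaded through load_stopwords, calling remove_stopwords on a list of words drops every word in the list while leaving the rest untouched. Keeping the check on the token as a property lets the preprocessing code decide per call whether to filter, which matches the optional removal requested in this issue.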