-
Preprocessing currently takes a long time for large datasets. One way to speed it up is to use [spaCy pipes](https://spacy.io/usage/processing-pipelines), particularly for lemmatization. Preproc…
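
As a rough illustration of the batching idea, here is a minimal sketch that streams texts through `nlp.pipe` instead of calling `nlp(text)` once per document. The helper name `lemmatize_corpus` and the default `batch_size` are assumptions made for this example, not part of the project.

```python
from typing import Iterable, List


def lemmatize_corpus(nlp, texts: Iterable[str], batch_size: int = 256) -> List[List[str]]:
    """Return one list of lemmas per input text.

    `nlp` is expected to be a loaded spaCy pipeline; `nlp.pipe` processes
    the texts in batches, which is usually faster than calling `nlp(text)`
    in a Python loop.
    """
    lemmas: List[List[str]] = []
    for doc in nlp.pipe(texts, batch_size=batch_size):
        # Keep the lemma of every token that is not punctuation or whitespace.
        lemmas.append(
            [tok.lemma_ for tok in doc if not (tok.is_punct or tok.is_space)]
        )
    return lemmas
```

Assuming the `en_core_web_sm` model is installed, the pipeline could be loaded with `spacy.load("en_core_web_sm", disable=["parser", "ner"])` so that components the lemmatizer does not need are skipped, and then passed in as `lemmatize_corpus(nlp, texts)`.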
-
Task: Perform initial data cleaning, including handling missing values (if any), normalizing text (lowercasing, removing punctuation, etc.), and preliminary data exploration (e.g., distribution of cla…
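
The cleaning steps named in the task could be sketched roughly as follows, using only the standard library. The function names are hypothetical, and the choice to drop missing or empty entries (rather than impute them) is an assumption made for this example.

```python
import string
from typing import Iterable, List, Optional

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_text(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower().translate(_PUNCT_TABLE)
    return " ".join(text.split())


def clean_records(texts: Iterable[Optional[str]]) -> List[str]:
    """Normalize each text, dropping missing (None) or empty entries."""
    cleaned: List[str] = []
    for text in texts:
        if text is None:
            continue  # assumption: missing values are dropped, not imputed
        normalized = normalize_text(text)
        if normalized:
            cleaned.append(normalized)
    return cleaned
```

Note that `string.punctuation` covers ASCII punctuation only, so text in other scripts may need a Unicode-aware filter instead.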
-
I've tried `segmentation-lemma-tagging/run_inf.py` with various modes on the following sentence:
> āsīdaśeṣanarapatiśiraḥsamarcitaśāsanaḥ pākaśāsana ivāparacaturudadhimālāmekhalāyā bhuvo bhartā pra…
-
The service must support non-Western languages and scripts, including, but not limited to:
* [ ] Arabic
* [ ] Chinese
* [ ] Cyrillic
-
Now that our preprocessing steps (lemmatization, punctuation removal, etc.) are complete, we need to apply them to all of our input data. The code for this is simply `data['reviewText'] = data['reviewText']…
-
When the sentence tokens are lemmatized, the lemmatizer receives both the token and its POS tag. However, when the list of keywords is lemmatized, no POS tag is available. Sometimes the two do not matc…
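
To make the problem concrete, here is a toy sketch. The lookup table `TOY_LEMMAS` stands in for a real lemmatizer and is not the project's actual one, and the noun default for untagged words is an assumption. The `keyword_matches` helper shows one possible workaround (comparing against the keyword's lemmas under every POS tag), offered as an idea rather than the intended fix.

```python
from typing import Optional, Set

# Toy (form, POS) -> lemma table; a real lemmatizer would replace this.
TOY_LEMMAS = {
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
}


def lemmatize(word: str, pos: Optional[str] = None) -> str:
    """Lemmatize with a POS tag; fall back to NOUN when none is given."""
    if pos is None:
        pos = "NOUN"  # assumption: untagged words are treated as nouns
    return TOY_LEMMAS.get((word.lower(), pos), word.lower())


def candidate_lemmas(word: str) -> Set[str]:
    """All lemmas the word can take under any POS tag in the table."""
    form = word.lower()
    found = {lemma for (f, _), lemma in TOY_LEMMAS.items() if f == form}
    return found or {form}


def keyword_matches(keyword: str, token: str, pos: str) -> bool:
    """True if the tagged token's lemma is among the keyword's candidates."""
    return lemmatize(token, pos) in candidate_lemmas(keyword)


# The mismatch: the same surface form yields different lemmas.
print(lemmatize("leaves", "VERB"), lemmatize("leaves"))
```

With this table, the tagged token "leaves" (VERB) lemmatizes to "leave" while the untagged keyword "leaves" lemmatizes to "leaf", so a direct comparison fails even though `keyword_matches("leaves", "leaves", "VERB")` succeeds.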
-
Hi, nice project. I had actually begun working on something similar when I came across your project, and now I'm unsure whether to continue on my own or join forces with you. Can you please explain how I am supposed to tr…
-
I've posted this feature request to ASBplayer's GitHub as well. [Link to open issue](https://github.com/killergerbah/asbplayer/issues/428)
**Is your feature request related to a problem? Please des…
-
Hi,
I'm the author of [SSM](https://github.com/FreeLanguageTools/ssmtool) which is a language learning utility for quickly making vocabulary flashcards. Thanks for this project! Without this it would…
-
**Background**
- The literature seems unclear on which similarity metrics perform best for diversity and relevance. (If anyone has found a good analysis of this, it would be great to see.)
- bm25 wor…