-
#### Learning Goals
- Word embeddings
- Stemming vs. Lemmatisation
- TF-IDF Algorithm
- Sentiment Classification
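To make the TF-IDF goal concrete, here is a minimal pure-Python sketch. It assumes naive whitespace tokenisation. Each weight is a raw term count multiplied by the smoothed IDF `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, and each document vector is then L2-normalised. This matches the default weighting (but not the default tokenisation) of scikit-learn's `TfidfVectorizer`. The function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (rows are L2-normalised)."""
    tokenised = [doc.lower().split() for doc in docs]  # naive whitespace tokens
    n = len(tokenised)
    # df(t): number of documents containing term t at least once
    df = Counter(term for tokens in tokenised for term in set(tokens))
    # smoothed idf, the same formula as scikit-learn's default (smooth_idf=True)
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)  # raw term counts
        weights = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0  # empty doc guard
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


docs = ["the movie was great", "the movie was terrible", "great acting"]
for doc, vec in zip(docs, tfidf(docs)):
    print(doc, {t: round(w, 3) for t, w in vec.items()})
```

Terms that occur in many documents (`the`, `movie`) end up with lower weights than rarer, more distinctive ones (`terrible`, `acting`).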
#### Exercise Statement
Use the nltk and scikit-learn libraries to learn basi…
-
- [x] Create a set of data cleaning methods
- [x] Set to lowercase
- [x] Change `á é í ó ú` -> `aeiou` and `ñ` -> `gn`
- [x] Remove Emojis
- [x] Remove mentions
- [x] Remove hashta…
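The checked cleaning steps could look roughly like the following sketch. It is an illustration under stated assumptions, not the exercise's own code:

- the emoji character ranges are approximate;
- a mention is assumed to be `@` followed by word characters;
- the truncated last item is not covered;
- the function names (`clean`, `remove_emojis`, …) are hypothetical.

```python
import re
import unicodedata

# Character mapping from the checklist: á é í ó ú -> a e i o u, ñ -> gn
ACCENT_MAP = str.maketrans({"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u", "ñ": "gn"})

# Approximate emoji coverage (assumption): flag letters, the main pictograph
# blocks, misc symbols/dingbats, the variation selector and zero-width joiner
EMOJI_RE = re.compile("[\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]+")

# Assumption: a mention is "@" followed by one or more word characters
MENTION_RE = re.compile(r"@\w+")


def to_lowercase(text):
    return text.lower()


def replace_accents(text):
    # NFC first so a decomposed "e + combining acute" also matches "é"
    return unicodedata.normalize("NFC", text).translate(ACCENT_MAP)


def remove_emojis(text):
    return EMOJI_RE.sub("", text)


def remove_mentions(text):
    return MENTION_RE.sub("", text)


def clean(text):
    # lowercase first so that "Á" and "Ñ" are also caught by the accent map
    for step in (to_lowercase, replace_accents, remove_emojis, remove_mentions):
        text = step(text)
    return " ".join(text.split())  # collapse whitespace left behind by removals


print(clean("@ana ¡Qué AÑO tan bueno! 😀"))  # ¡que agno tan bueno!
```

Keeping each step as a small function makes it easy to test the steps one at a time and to reorder or drop them.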
-
-
Integrate AraNLP.
---
https://sites.google.com/site/mahajalthobaiti/resources
The AraNLP library is a Java-based toolkit for processing Arabic text. It
supports the most important preproces…
-
Currently, TC's text capabilities are limited to logistic regression on top of bag-of-words (BOW) encoded text.
While this is suitable for some cases, many use cases require more sophisticated/modern NLP me…
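For reference, "logistic regression on top of BOW-encoded text" amounts to something like this pure-Python sketch:

- each document becomes a vector of token counts, with word order discarded;
- a logistic regression is fitted by batch gradient descent on the log-loss.

This is not TC's implementation or API. The helper names, learning rate and epoch count are illustrative assumptions.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Map each distinct lower-cased token to a column index."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def bow(doc, vocab):
    """Bag-of-words vector: token counts, word order discarded, unknown tokens dropped."""
    vec = [0.0] * len(vocab)
    for token, count in Counter(doc.lower().split()).items():
        if token in vocab:
            vec[vocab[token]] = float(count)
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias with batch gradient descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(doc, vocab, w, b):
    x = bow(doc, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


docs = ["good great fun", "great movie", "bad boring", "awful bad movie"]
labels = [1, 1, 0, 0]
vocab = build_vocab(docs)
w, b = train_logreg([bow(d, vocab) for d in docs], labels)
print(predict_proba("great fun", vocab, w, b))     # above 0.5
print(predict_proba("boring awful", vocab, w, b))  # below 0.5
```

Because BOW discards word order, "not good" and "good" share the feature `good`. This is one concrete limitation of the current approach.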
-
E.g. "0 to 1" and "zero to one" should be equivalent in the NLP machinery. In some places I use written-out numerals and in others digits.
It's probably fine to handle only numbers up to 100 (as well as thousand, million, hun…
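A minimal sketch of this kind of normalisation, under these assumptions:

- whitespace tokenisation and lower-cased output;
- hyphenated compounds such as `twenty-one` are split into number words;
- `hundred`, `thousand` and `million` are the scale words.

It does not handle "and" (as in "one hundred and five"), ordinals, or punctuation attached to number words. Adjacent separate numbers ("one two") would be merged. All names are hypothetical.

```python
UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
         "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
         "seventeen", "eighteen", "nineteen"]
TENS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

SMALL = {word: i for i, word in enumerate(UNITS)}
SMALL.update({word: 10 * (i + 2) for i, word in enumerate(TENS)})
SCALES = {"hundred": 100, "thousand": 1_000, "million": 1_000_000}  # assumed scale words
NUMBER_WORDS = set(SMALL) | set(SCALES)


def words_to_number(words):
    """Parse a run of number words, e.g. ["two", "thousand", "five"] -> 2005."""
    total, current = 0, 0
    for word in words:
        if word in SMALL:
            current += SMALL[word]
        elif word == "hundred":
            current = (current or 1) * 100  # a bare "hundred" means 100
        else:  # "thousand" / "million" close off the current group
            total += (current or 1) * SCALES[word]
            current = 0
    return total + current


def normalise_numbers(text):
    """Rewrite runs of number words as digits, so "zero to one" -> "0 to 1"."""
    out, run = [], []
    for token in text.lower().split():
        parts = token.split("-")  # "twenty-one" -> ["twenty", "one"]
        if all(part in NUMBER_WORDS for part in parts):
            run.extend(parts)
            continue
        if run:
            out.append(str(words_to_number(run)))
            run = []
        out.append(token)
    if run:
        out.append(str(words_to_number(run)))
    return " ".join(out)


print(normalise_numbers("Twenty-one to one hundred"))  # 21 to 100
```

Mapping words to digits (rather than the reverse) means text that already uses digits passes through unchanged.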
-
- [ ] Distress score prediction
- [ ] Topic modelling
  - unsupervised clustering, then manually tag based on keyword frequencies
  - or use i-carol tagged categories
- [ ] Sentiment analysis …
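The first topic-modelling option (cluster, then tag clusters by keyword frequency) could be prototyped roughly as follows. This sketch swaps in spherical k-means over L2-normalised term counts. It uses a naive deterministic initialisation: the first `k` documents become the starting centroids. The stop-word list, example texts and function names are hypothetical. In practice, a library implementation such as scikit-learn's `KMeans` would likely be used for the clustering step.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real list would be much longer
STOP_WORDS = {"a", "and", "at", "i", "is", "me", "my", "the", "to"}


def tokens(doc):
    return [t for t in doc.lower().split() if t not in STOP_WORDS]


def unit_vector(counts):
    # L2-normalise so that a dot product equals cosine similarity
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}


def cosine(u, v):
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def kmeans(docs, k, iters=10):
    """Spherical k-means; naive deterministic init with the first k documents."""
    vectors = [unit_vector(Counter(tokens(d))) for d in docs]
    centroids = vectors[:k]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # assignment step: nearest centroid by cosine similarity
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # update step: centroid = normalised sum of its member vectors
        for c in range(k):
            summed = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    summed.update(v)
            if summed:
                centroids[c] = unit_vector(summed)
    return labels


def top_keywords(docs, labels, n=3):
    """Most frequent non-stop-word tokens per cluster, to help tag clusters by hand."""
    per_cluster = {}
    for doc, label in zip(docs, labels):
        per_cluster.setdefault(label, Counter()).update(tokens(doc))
    return {label: [w for w, _ in counts.most_common(n)] for label, counts in per_cluster.items()}


docs = [
    "i lost my job and rent is due",
    "my partner shouts at me",
    "rent is late and the job search is hard",
    "partner threatens me at home",
]
labels = kmeans(docs, k=2)
print(labels)
print(top_keywords(docs, labels, n=2))
```

The keyword lists are only an aid: a person still reads them and picks the tag for each cluster.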
-
**Is your feature request related to a problem? Please describe.**
Pluralizing English words is useful for many things:
- Mapping names in a codebase to collections, database tables, etc. more easily
- e.g.…
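A naive rule-based pluralizer covering the common English patterns might look like this sketch. The irregular and uncountable word lists are small illustrative assumptions. Rules for `-f`/`-fe` and `-o` endings (e.g. `leaf`, `potato`) are deliberately left out because they have many exceptions.

```python
# Small illustrative lists (assumption); real inflection libraries ship far more
IRREGULAR = {
    "child": "children", "foot": "feet", "goose": "geese", "man": "men",
    "mouse": "mice", "person": "people", "tooth": "teeth", "woman": "women",
}
UNCOUNTABLE = {"equipment", "fish", "information", "series", "sheep", "species"}


def pluralize(word):
    """Naive English pluralization for identifiers such as table or collection names."""
    lower = word.lower()
    if lower in UNCOUNTABLE:
        return word
    if lower in IRREGULAR:
        plural = IRREGULAR[lower]
        return plural.capitalize() if word[0].isupper() else plural
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"  # box -> boxes, address -> addresses
    if lower.endswith("y") and len(lower) > 1 and lower[-2] not in "aeiou":
        return word[:-1] + "ies"  # category -> categories (but key -> keys)
    return word + "s"


for name in ["User", "category", "address", "Person", "sheep", "key"]:
    print(name, "->", pluralize(name))
```

Checking exceptions before the suffix rules keeps irregular words from being mangled by the generic `+s`/`+es` cases.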
-
I assume this would still be useful in this implementation: stop-word filtering and stemming would help, and this abstraction works well in the [JavaScript implementation]…
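A minimal sketch of such an analysis step: tokenise, drop stop words, then stem. The stop-word list is a small illustrative assumption. `stem` here is a deliberately crude suffix-stripper, not the Porter algorithm; NLTK's `nltk.stem.PorterStemmer` is a real stemmer that could be substituted. The function names are hypothetical.

```python
import re

# Illustrative stop-word list (assumption)
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "or", "the", "to", "was"})
# Crude suffix list, longest first so "ingly" is tried before "ly"
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def stem(token):
    # strip the first matching suffix, keeping a stem of at least 3 characters
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyse(text):
    # tokenize -> drop stop words -> stem
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(analyse("The cats are jumping and the dog jumped"))  # ['cat', 'jump', 'dog', 'jump']
```

Filtering stop words before stemming avoids stemming tokens that will be discarded anyway. Each stage can also be swapped independently, for example replacing `stem` with a real stemmer.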
-
Hey man, sorry to open an issue here, but I saw your commit on the [spaCy repo](https://github.com/explosion/spaCy/commit/1a00bff06d7e9632fc5a647265cf70acaea73a6d).
I was trying to use spaCy to do s…