-
"contact the me"
-
It seems the same slide appears twice, at least in the version I downloaded today.
-
I believe it should be "wide range _of_ tasks".
-
Djokovic is misspelt. This was pointed out in class, but I wanted to add it here as a reminder; don't worry about points for this one :P
-
The sentence "A word may be broken into separate tokens if it meaningful makes sense" seems to be missing an "ly": it should read "A word may be broken into separate tokens if it meaningful**ly** makes sense".
See att…
-
## 2.1 Preprocess the data by removing stopwords, punctuation, and non-alpha words (5 points)
A. Write a function that:
- Takes in a single raw string in the `contents` column from that datafram…
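For reference, here is a minimal pure-Python sketch of the kind of preprocessing function 2.1 describes. It is not the course's reference solution: the name `preprocess`, the tiny hypothetical `STOPWORDS` set, and the choice to return a space-joined string are all assumptions (the assignment's expected return type is cut off above). A real solution would likely use a fuller stopword list, such as the one shipped with NLTK or spaCy.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration only;
# the assignment probably expects a fuller list (e.g. from NLTK or spaCy).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is", "was", "by", "with"}

# Translation table that maps every ASCII punctuation character to a space
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(raw: str) -> str:
    """Lowercase the text, then drop punctuation, stopwords, and non-alphabetic tokens."""
    # Replacing punctuation with spaces turns "executives," into "executives"
    # (note it also splits "don't" into "don" and "t")
    no_punct = raw.lower().translate(PUNCT_TO_SPACE)
    tokens = no_punct.split()
    # Keep only purely alphabetic tokens that are not stopwords
    kept = [t for t in tokens if t.isalpha() and t not in STOPWORDS]
    return " ".join(kept)
```

For example, `preprocess("The Department of Justice charged 3 executives, in 2017.")` gives `"department justice charged executives"`: the numbers fail `isalpha()` and "the", "of", "in" are stopwords.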
-
## 2.2 Create a document-term matrix from the preprocessed press releases and explore top words (5 points)
A. Use the `create_dtm` function I provide (alternately, feel free to write your own!) …
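The signature of the provided `create_dtm` isn't shown here, so this is only a hedged pure-Python stand-in showing what a document-term matrix is and how "top words" fall out of it. `build_dtm` and `top_words` are hypothetical names, and the input is assumed to be whitespace-tokenized strings like those from 2.1. In practice scikit-learn's `CountVectorizer` does the same job far more efficiently, with a sparse matrix.

```python
from collections import Counter


def build_dtm(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i].

    `docs` are preprocessed strings whose tokens are separated by whitespace.
    """
    counts = [Counter(doc.split()) for doc in docs]
    # Sorted vocabulary of every term that appears in any document
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for missing keys, so absent terms get a 0 count
    matrix = [[c[word] for word in vocab] for c in counts]
    return vocab, matrix


def top_words(vocab, matrix, n=5):
    """Sum each column over all documents and return the n most frequent terms."""
    totals = [sum(row[j] for row in matrix) for j in range(len(vocab))]
    # Sort by descending count, breaking ties alphabetically
    ranked = sorted(zip(vocab, totals), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]
```

With `build_dtm(["fraud kickback fraud", "kickback opioid"])` the vocabulary is `["fraud", "kickback", "opioid"]` and the rows are `[2, 1, 0]` and `[0, 1, 1]`, so the top two words are `("fraud", 2)` and `("kickback", 2)`.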
-
## 2.4 Add topics back to main data and explore correlation between manual labels and our estimated topics (10 points)
A. Extract the document-level topic probabilities. Within `get_document_topics…
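gensim's `LdaModel.get_document_topics` returns a list of `(topic_id, probability)` pairs and can omit topics below its `minimum_probability` threshold. Because of that, one common step before adding topics back to a dataframe is padding each document's output to a fixed length. The sketch below shows that step plus a plain Pearson correlation. It makes three assumptions: the inputs are hand-written hypothetical values rather than real model output, the manual label has already been turned into a 0/1 indicator for one category, and `to_dense` and `pearson` are hypothetical helper names. With pandas, `Series.corr` would compute the same correlation.

```python
def to_dense(doc_topics, num_topics):
    """Turn a sparse [(topic_id, prob), ...] list into a length-num_topics list.

    Topics missing from the list (e.g. dropped for falling below a
    probability threshold) are filled with 0.0.
    """
    dense = [0.0] * num_topics
    for topic_id, prob in doc_topics:
        dense[topic_id] = float(prob)
    return dense


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical topic output for three documents and a 0/1 manual label
doc_topics = [[(0, 0.9), (1, 0.1)], [(1, 0.8), (2, 0.2)], [(0, 0.6), (2, 0.4)]]
labels = [1, 0, 1]
rows = [to_dense(dt, num_topics=3) for dt in doc_topics]
topic0 = [row[0] for row in rows]  # probability of topic 0 per document
print(pearson(topic0, labels))
```

Padding to a dense vector keeps every document's row the same length, so each topic maps cleanly onto one new column.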
-
## 1. NLP on one press release (10 points)
Focus on the following press release: `id` == "17-1204" about this pharmaceutical kickback prosecution: https://www.forbes.com/sites/michelatindera/2017/1…
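To pull out just that one release, a pandas filter like `df.loc[df["id"] == "17-1204", "contents"].iloc[0]` is one option. The dependency-free sketch below does the same thing over a list of row dicts. It assumes the data has `id` and `contents` fields, as the assignment describes. `get_release` is a hypothetical helper, and the rows are placeholders, not real press-release text.

```python
def get_release(rows, release_id):
    """Return the `contents` of the first row whose `id` matches, else None."""
    for row in rows:
        # ids are compared as strings, e.g. "17-1204"
        if row.get("id") == release_id:
            return row.get("contents")
    return None


# Placeholder rows for illustration only
rows = [
    {"id": "16-0001", "contents": "placeholder text for another release"},
    {"id": "17-1204", "contents": "placeholder text for the kickback release"},
]
print(get_release(rows, "17-1204"))
```

Returning `None` when no row matches makes a mistyped id easy to spot, where indexing an empty pandas selection would raise instead.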
-
Namely, this one https://github.com/textasdata/textasdata.github.io/issues/6