-
I noticed that the documentation of `docid()` says that the returned value is a character vector, but it is actually a factor. I couldn't perform some operations, and now I understand why.
Can you fix the doc…
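
To illustrate why the distinction matters, here is a minimal base R sketch. The `ids` vector is a hypothetical stand-in for a factor of document IDs, not actual `docid()` output, and the document names are made up.

```r
# Hypothetical stand-in for a factor of document IDs (not real docid() output).
ids <- factor(c("doc1", "doc2", "doc1"))

# A factor is not a character vector, even though it prints like one.
is.character(ids)
is.factor(ids)

# String functions such as nchar() reject factors, so we capture the error here.
res <- tryCatch(nchar(ids), error = function(e) "error")

# Converting explicitly restores character semantics.
ids_chr <- as.character(ids)
nchar(ids_chr)
```

Wrapping the result in `as.character()` is a possible workaround until the documentation and the return type agree.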
-
## Describe the bug
`convert(., to = "stm")` correctly drops empty documents, but the warning message suggests that all documents are dropped.
## Reproducible code
```r
# documents with one…
-
In the general workflow, DBpedia Spotlight returns the types of the entities. These are stored in a column `types` which contains a list of lists for each entity. This column can be resolved to indivi…
-
This is fine:
```r
> library("spacyr")
> sp
-
**Description**
I would like to cite papers from JSTOR
**Similar Features**
Similar to NCBI and other sources.
**Feature Details**
Full Harvard citation as required.
**Proposed Im…
-
The current implementation can't create n-grams from more than 0.09% of the corpora.
This is because tokenizing the 4-grams using tm requires 1.5 Gb of memory allocation.
-
These would include:
* (vocd-)D
* HD-D
See McCarthy, Philip M, and Scott Jarvis. 2010. “MTLD, Vocd-D, and HD-D: a Validation Study of Sophisticated Approaches to Lexical Diversity Assessment.” _B…
-
First of all: thanks for this great package! Since `RTextTools` was recently removed from CRAN I was trying to find a good solution for SML on text data in `R` and was a bit frustrated by `caret` whi…
-
This comes from https://github.com/quanteda/quanteda.sentiment/issues/11, which is a more general question about how a function can return the set of original tokens matching a dictionary lookup, not …
-
I recently learned that the TEI XML format is becoming popular in the linguistics community. In this format, texts are saved in small chunks with associated meta information (e.g. speaker), and, sometim…