Own class: `tokens_with_tokenvars`

chainsawriot commented 1 year ago

Given gesistsa/quanteda.proximity#35, and given that the quanteda::tokens_*() functions will not respect tokenvars, it would be better to make this a new class for now.

[x] Create a new class tokens_with_tokenvars
[x] as.tokens.tokens_with_tokenvars()
[x] docvars.tokens_with_tokenvars()
[ ] meta.tokens_with_tokenvars()
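
For illustration, here is a minimal sketch of what such a class could look like. It uses only base R, so it does not depend on quanteda internals. The constructor `new_tokens_with_tokenvars()`, the `tokenvars` attribute, and the helper `drop_tokenvars()` are hypothetical names chosen for this sketch, not the package's actual API.

```r
# Hypothetical constructor: a named list of character vectors (one per
# document) plus a parallel list of data frames holding the token variables.
new_tokens_with_tokenvars <- function(tokens, tokenvars) {
    stopifnot(is.list(tokens), is.list(tokenvars),
              length(tokens) == length(tokenvars))
    for (i in seq_along(tokens)) {
        # Each document needs exactly one row of token variables per token
        stopifnot(nrow(tokenvars[[i]]) == length(tokens[[i]]))
    }
    structure(tokens, tokenvars = tokenvars, class = "tokens_with_tokenvars")
}

# Hypothetical helper: drop the extra attribute and the class to get a plain
# named list back, roughly what an as.tokens() method would have to start from.
drop_tokenvars <- function(x) {
    attr(x, "tokenvars") <- NULL
    unclass(x)
}

x <- new_tokens_with_tokenvars(
    tokens = list(d1 = c("spacy", "is")),
    tokenvars = list(d1 = data.frame(tag = c("NNP", "VBZ"), lemma = c("spaCy", "be")))
)
class(x)
str(drop_tokenvars(x))
```

Keeping the token variables in an attribute means any function that rebuilds the object without copying attributes will silently lose them, which is one reason a separate class with its own methods seems safer here.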

chainsawriot commented 1 year ago

tokens_with_tokenvars_VERB() is annoying to type (if we are going to follow the style guide), but this is an experiment anyway.

chainsawriot commented 1 year ago

[x] print.tokens_with_tokenvars()

xtokenid <- c("t1", "t2")
xtoken <- c("spacy", "is")
xtokenvars <- data.frame(tag = c("NNP", "VBZ"), lemma = c("spaCy", "be"))

# Mock-up of the planned print output for a single document ("d1").
mockup <- function(xtokenid, xtoken, xtokenvars) {
    # Collapse each row of token variables into one string, e.g. "NNP/spaCy"
    ugly <- vapply(seq_len(nrow(xtokenvars)), function(y) paste(as.character(xtokenvars[y, ]), collapse = "/"), character(1))
    # The document count is hard-coded in this mock-up
    cat("Tokens (with token variables) consisting of 2 documents.\n")
    cat("Token variables: (", paste(names(xtokenvars), collapse = "/"), ").\n", sep = "")
    cat("d1:\n")
    # One "[id]: token (vars)" entry per token, all on a single line
    for (i in seq_along(xtoken)) {
        cat("[", xtokenid[i], "]: ", xtoken[i], " (", ugly[i], ") ", sep = "")
    }
    cat("\n")
}

mockup(xtokenid, xtoken, xtokenvars)
#> Tokens (with token variables) consisting of 2 documents.
#> Token variables: (tag/lemma).
#> d1:
#> [t1]: spacy (NNP/spaCy) [t2]: is (VBZ/be)

Created on 2023-11-26 with reprex v2.0.2
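
The mock-up above could be turned into an actual S3 print method along the following lines. This is only a sketch under assumptions: it supposes that the object is a named list of character vectors with a parallel `tokenvars` attribute (a list of data frames, one per document), that there is at least one document, and that token IDs are simply `t1`, `t2`, ... within each document. None of this is confirmed as the package's real internal layout.

```r
# Hypothetical print method for the assumed object layout described above.
print.tokens_with_tokenvars <- function(x, ...) {
    tokenvars <- attr(x, "tokenvars")
    ndoc <- length(x)
    cat("Tokens (with token variables) consisting of ", ndoc,
        if (ndoc == 1L) " document" else " documents", ".\n", sep = "")
    # Variable names are taken from the first document (assumes ndoc >= 1)
    cat("Token variables: (", paste(names(tokenvars[[1]]), collapse = "/"), ").\n", sep = "")
    for (doc in names(x)) {
        tv <- tokenvars[[doc]]
        # Paste the columns together row by row, e.g. "NNP/spaCy"
        vars <- do.call(paste, c(unname(as.list(tv)), sep = "/"))
        cat(doc, ":\n", sep = "")
        cat(paste0("[t", seq_along(x[[doc]]), "]: ", x[[doc]], " (", vars, ")"), "\n")
    }
    invisible(x)
}

# Build a small two-document example by hand and print it
x <- structure(
    list(d1 = c("spacy", "is"), d2 = c("hello", "world")),
    tokenvars = list(
        d1 = data.frame(tag = c("NNP", "VBZ"), lemma = c("spaCy", "be")),
        d2 = data.frame(tag = c("UH", "NN"), lemma = c("hello", "world"))
    ),
    class = "tokens_with_tokenvars"
)
print(x)
```

Unlike the mock-up, the document count and document names are read from the object rather than hard-coded, and the method returns its input invisibly, as print methods conventionally do.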
