gesistsa / tokenvars

🔬 Add token-level metadata to `quanteda` (An Experiment)
GNU General Public License v3.0

Own class: `tokens_with_tokenvars` #2

Open chainsawriot opened 1 year ago

chainsawriot commented 1 year ago

Given gesistsa/quanteda.proximity#35, and given that the quanteda::tokens_*() functions will not respect tokenvars, it would be better to make this a new class for now.
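
A minimal base R sketch of what a separate S3 class could look like. The constructor name `new_tokens_with_tokenvars()` and the list-based layout (tokens and one data frame of token variables per document) are assumptions for illustration only; this is not the package's actual design and it does not touch quanteda's internal representation.

```r
# Hypothetical constructor; name and object layout are assumptions.
# `tokens`: named list of character vectors, one per document.
# `tokenvars`: named list of data frames, one row per token.
new_tokens_with_tokenvars <- function(tokens, tokenvars) {
    stopifnot(is.list(tokens), is.list(tokenvars),
              length(tokens) == length(tokenvars))
    for (i in seq_along(tokens)) {
        # Each token needs exactly one row of token variables.
        stopifnot(is.data.frame(tokenvars[[i]]),
                  nrow(tokenvars[[i]]) == length(tokens[[i]]))
    }
    structure(list(tokens = tokens, tokenvars = tokenvars),
              class = "tokens_with_tokenvars")
}

x <- new_tokens_with_tokenvars(
    tokens = list(d1 = c("spacy", "is")),
    tokenvars = list(d1 = data.frame(tag = c("NNP", "VBZ"),
                                     lemma = c("spaCy", "be")))
)
inherits(x, "tokens_with_tokenvars")
#> [1] TRUE
inherits(x, "tokens")
#> [1] FALSE
```

Because the object in this sketch does not inherit from `tokens`, any function that should work on it has to be written (or dispatched) for the new class explicitly.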

chainsawriot commented 1 year ago

tokens_with_tokenvars_VERB() is annoying to type (if we are going to follow the style guide), but this is an experiment anyway.

chainsawriot commented 1 year ago
# Example inputs: token IDs, tokens, and one row of token variables per token.
xtokenid <- c("t1", "t2")
xtoken <- c("spacy", "is")
xtokenvars <- data.frame(tag = c("NNP", "VBZ"), lemma = c("spaCy", "be"))

# Mock-up of the printed output; the document count and "d1" are hard-coded.
mockup <- function(xtokenid, xtoken, xtokenvars) {
    # Collapse each row of token variables into one string, e.g. "NNP/spaCy".
    ugly <- vapply(seq_len(nrow(xtokenvars)), function(y) paste(as.character(xtokenvars[y, ]), collapse = "/"), character(1))
    cat("Tokens (with token variables) consisting of 2 documents.\n")
    cat("Token variables: (", paste(names(xtokenvars), collapse = "/"), ").\n", sep = "")
    cat("d1:\n")
    # Print every token as "[id]: token (var1/var2) ".
    for (i in seq_along(xtoken)) {
        cat("[", xtokenid[i], "]: ", xtoken[i], " (", ugly[i], ") ", sep = "")
    }
    cat("\n")
}

mockup(xtokenid, xtoken, xtokenvars)
#> Tokens (with token variables) consisting of 2 documents.
#> Token variables: (tag/lemma).
#> d1:
#> [t1]: spacy (NNP/spaCy) [t2]: is (VBZ/be)

Created on 2023-11-26 with reprex v2.0.2
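
The mockup above hard-codes the document count and the document name. As a rough sketch of how the same output could come from an S3 `print()` method, the following uses base R only. The object layout (a list with `tokens` and `tokenvars`, both named by document), the generated `t1`, `t2`, ... IDs, and the requirement of at least one document are all assumptions made for illustration, not the package's actual implementation.

```r
# Hypothetical print method; object layout is an assumption (see above).
print.tokens_with_tokenvars <- function(x, ...) {
    ndoc <- length(x$tokens)
    cat("Tokens (with token variables) consisting of ", ndoc, " document",
        if (ndoc != 1L) "s", ".\n", sep = "")
    # Assumes at least one document; variable names come from the first one.
    cat("Token variables: (", paste(names(x$tokenvars[[1]]), collapse = "/"),
        ").\n", sep = "")
    for (doc in names(x$tokens)) {
        tv <- x$tokenvars[[doc]]
        # Paste the variables of each token row-wise, e.g. "NNP/spaCy".
        vars <- do.call(paste, c(unname(as.list(tv)), sep = "/"))
        # Token IDs are generated per document here (an assumption).
        ids <- paste0("t", seq_along(x$tokens[[doc]]))
        cat(doc, ":\n", sep = "")
        cat(paste0("[", ids, "]: ", x$tokens[[doc]], " (", vars, ")"), "\n")
    }
    invisible(x)
}

# Build an example object directly with structure() and print it.
x <- structure(
    list(tokens = list(d1 = c("spacy", "is"), d2 = c("hello")),
         tokenvars = list(
             d1 = data.frame(tag = c("NNP", "VBZ"), lemma = c("spaCy", "be")),
             d2 = data.frame(tag = "UH", lemma = "hello"))),
    class = "tokens_with_tokenvars")
print(x)
```

Returning `invisible(x)` follows the usual convention for print methods, so the object can still be piped or assigned after printing.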