PolMine / RcppCWB

'Rcpp' Bindings for the 'Corpus Workbench' (CWB)

subcorpus pointer #53

Closed ablaette closed 2 years ago

ablaette commented 2 years ago

I am not sure whether a feature request for this already exists. This is a sketch of how the CWB-internal representation of a subcorpus could be used.

library(RcppCWB)

p <- cqp_query("REUTERS", query = '<id>[]*</id>::match.id="127"')
q <- RcppCWB:::.cqp_subcorpus_query(p, "REUTERS:FOO", 'FOO = "crude" "oil";')
cpos <- RcppCWB:::.cqp_subcorpus_regions(q)

library(polmineR)
corpus("REUTERS") %>%
  subset(id == "127") %>%
  cpos(query = '"crude" "oil"', cqp = TRUE)

ablaette commented 2 years ago

Note that this works, but .cqp_subcorpus_query() is not yet exported. This is not a high-priority issue, but I would like to keep it open so that I can pick up the idea later.

ablaette commented 2 years ago

This is more relevant than I had thought. The best approach so far is to integrate it into cqp_query():

library(RcppCWB)

q <- cqp_query("GERMAPARL", subcorpus = "CONTEXT", query = '"Integration" []{3}')
m <- RcppCWB:::.cqp_subcorpus_regions(q)

gparl <- cl_find_corpus("GERMAPARL", registry = Sys.getenv("CORPUS_REGISTRY"))
p <- RcppCWB:::.cqp_subcorpus(name = "FOO2", corpus = gparl, region_matrix = m)
cqp_list_subcorpora("GERMAPARL")

x <- cqp_query("GERMAPARL:FOO2", query = '"gescheitert";', subcorpus = "TTT")
cqp_list_subcorpora("GERMAPARL")
cqp_dump_subcorpus("GERMAPARL", "TTT")
RcppCWB:::.cqp_subcorpus_regions(x)
RcppCWB:::.cqp_drop_subcorpus("GERMAPARL:CONTEXT")
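
For readers following along without a CWB installation, the effect of running a query inside a subcorpus can be pictured in base R. This is only a sketch of the idea, not what CQP does internally: it assumes regions are held as a two-column integer matrix of start and end corpus positions, the data are made up, and within_regions() is a hypothetical helper that is not part of RcppCWB.

```r
# Hypothetical data: each row is one region (start cpos, end cpos).
subcorpus_regions <- matrix(c(100L, 199L, 500L, 549L), ncol = 2, byrow = TRUE)
match_regions <- matrix(c(120L, 121L, 300L, 301L, 548L, 549L), ncol = 2, byrow = TRUE)

# Hypothetical helper (not an RcppCWB function): keep only those matches
# that lie completely inside at least one subcorpus region.
within_regions <- function(matches, regions) {
  keep <- vapply(
    seq_len(nrow(matches)),
    function(i) any(regions[, 1] <= matches[i, 1] & matches[i, 2] <= regions[, 2]),
    logical(1)
  )
  matches[keep, , drop = FALSE]
}

restricted <- within_regions(match_regions, subcorpus_regions)
print(restricted)
```

Here the match at corpus positions 300-301 is dropped because it falls outside both regions. Keeping the subcorpus inside CQP, as in the calls above, avoids moving region matrices back and forth between R and CWB for this kind of filtering.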