I suggest using index() to find the positions of keywords, including phrases.
library(quanteda.proximity)
library(quanteda)
#> Package version: 4.0.0
#> Unicode version: 13.0
#> ICU version: 69.1
#> Parallel computing: 16 of 16 threads used.
#> See https://quanteda.io for tutorials and examples.
txt <-
c("Turkish President Tayyip Erdogan, in his strongest comments yet on the Gaza conflict, said on Wednesday the Palestinian militant group Hamas was not a terrorist organisation but a liberation group fighting to protect Palestinian lands.",
"EU policymakers proposed the new agency in 2021 to stop financial firms from aiding criminals and terrorists. Brussels has so far relied on national regulators with no EU authority to stop money laundering and terrorist financing running into billions of euros.")
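toks <- tokens(txt)
len <- ntoken(toks)  # number of tokens in each document
# locate the phrase; idx$from and idx$to give the first and last token positions of the match
idx <- index(toks, pattern = phrase("Tayyip Erdogan"))
# distance from every token position to the nearer end of the match (a single match is assumed here)
pmin(abs(seq_len(len[idx$docname]) - idx$from), abs(seq_len(len[idx$docname]) - idx$to))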
#> [1] 2 1 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
#> [26] 22 23 24 25 26 27 28 29 30 31 32 33 34
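The one-liner above handles a single match. If a document contains several matches, the same calculation could be repeated per match and reduced with pmin(). Here is a minimal base-R sketch, assuming `from` and `to` are the columns returned by index() for one document; `nearest_dist()` is a hypothetical helper, not part of quanteda or quanteda.proximity.

```r
# Hypothetical helper: for a document of n tokens, return the distance from
# each token position to the nearest match, given the match start (from) and
# end (to) positions. Uses the same per-match formula as the one-liner above.
nearest_dist <- function(n, from, to) {
  pos <- seq_len(n)
  # one distance vector per match
  per_match <- lapply(seq_along(from), function(i) {
    pmin(abs(pos - from[i]), abs(pos - to[i]))
  })
  # element-wise minimum across all matches (NULL if there are no matches)
  Reduce(pmin, per_match)
}

# two matches in a 6-token document: token 2, and tokens 5 to 6
nearest_dist(6, from = c(2, 5), to = c(2, 6))
```

With a single match, nearest_dist(len[idx$docname], idx$from, idx$to) should give the same vector as the expression above.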
More generally, pattern2fixed() can be used to parse patterns in the same way as in quanteda.
https://github.com/gesistsa/quanteda.proximity/blob/dbd414cc7d52d389105f2b8b997c1af912ead4f9/R/get_dist.R#L28-L39