R 4.3, arrow 14.0.0.2 (most recent macOS binary; apologies in advance if this is already supported in source)
arrow can't handle Perl operators, such as negative lookaheads, in regular expressions, at least via dplyr and stringr:
library(arrow)
library(dplyr)
library(stringr)
ar = data.frame(text = c('Lorem ipsum dolor sit amet',
'Lorem dolor ipsum sit amet')) |>
as_arrow_table()
## Works, returns both rows
ar |>
filter(str_detect(text, 'Lorem [^(ipsum)]')) |>
collect()
## Should only return the second row
## Error in `compute.arrow_dplyr_query()`:
## ! Invalid: Invalid regular expression: invalid perl operator: (?!
ar |>
filter(str_detect(text, regex('Lorem(?! ipsum)'))) |>
collect()
I haven't looked at the code, so this isn't a definitive answer, but I'm fairly sure that RE2, the C++ regex library used by Acero, doesn't support lookaheads, so this is probably not something that can be added.
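In the meantime, a possible workaround (a sketch, not an official recommendation) is either to express the condition as two patterns without lookaheads, or to collect() first and let stringr evaluate the lookahead in R. The data below just recreates the example from the report. The two-pattern version is an assumption that fits this example only: it is not equivalent to a true lookahead for a string that contains both 'Lorem ipsum' and another 'Lorem' not followed by ' ipsum'.

```r
library(arrow)
library(dplyr)
library(stringr)

ar <- data.frame(text = c('Lorem ipsum dolor sit amet',
                          'Lorem dolor ipsum sit amet')) |>
  as_arrow_table()

## Option 1: stay in arrow, combining two patterns that need no lookahead
res_arrow <- ar |>
  filter(str_detect(text, 'Lorem'), !str_detect(text, 'Lorem ipsum')) |>
  collect()

## Option 2: pull the data into R first, then filter with stringr,
## whose regex engine (ICU) supports negative lookaheads
res_r <- ar |>
  collect() |>
  filter(str_detect(text, 'Lorem(?! ipsum)'))
```

Option 1 keeps the filter inside the arrow query, which matters for larger-than-memory data; Option 2 is simpler but materializes the whole table in R before filtering.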
Component(s): R