rtbs-dev opened this issue 4 months ago
I don't believe polars uses arrow-rs; it relies on its own implementation instead.
That being said, if you want to do advanced regex matching, have you considered iterating over the arrays manually and applying the regex yourself?
Edit: I've filed https://github.com/apache/arrow-rs/issues/5991 to track this further
What are you trying to do?
I would like to extract `regex::Captures` structs directly, rather than the already-unwrapped string values, because I need the byte offsets themselves (e.g. to implement ISO 24612, which gives primacy to the locations of string spans in a document, not to their contents).
Describe the solution you'd like
Either a new function that extracts the `Captures` structs directly, or a mode for `compute::regexp_match` that provides the offset anchors for each match.
Describe alternatives you've considered
Retrieving the strings and then searching for their locations one by one wastes resources, and I can't find a flag that enables the desired behavior :)
Additional context
I'm coming here from polars#16341; if I understand their codebase correctly, they use this backend as an intermediary to the `regex` Rust crate.