asg017 / sqlite-regex

A fast regular expression SQLite extension, written in Rust
Apache License 2.0
166 stars 7 forks source link

Add new `regex_captures_template` virtual table? #9

Open asg017 opened 1 year ago

asg017 commented 1 year ago

The new regex_capture() and regex_captures() functions are great, but often you have to call regex_capture() multiple times to extract different capture group from the same capture.

For example:

select
  regex_capture("('ID(?P<id>\d+)Y(?P<year>\d+)ABC')", "ID123Y2023ABC", "id") as id,
  regex_capture("('ID(?P<id>\d+)Y(?P<year>\d+)ABC')", "ID123Y2023ABC", "year") as year;

The regex_captures make this a bit better, but it's still awkward and you repeat yourself a lot.

Possible solution: A "template" virtual table

A new virtual table that can take in a regex pattern as input, and infer column names based on the pattern's capture group names.

For example:

create virtual table parse_id using regex_captures_template(
  'ID(?P<id>\d+)Y(?P<year>\d+)ABC'
);

Would create a parse_id table function with id and year as columns. Then it can be used like so:

select id, year from parse_id('ID123Y2023ABC');

select 
  items.rowid, 
  id, 
  year 
from items join parse_id(items.code);

This is the first "template" virtual table that I have seen, besides maybe the define function (but isn't arbitrary code execution).

Also not sure what to do with capture groups without names. Maybe c0, c1 as column names for those? What about if a capture group has c0 as a name? Probably not too terrible tbh...