exaloop / codon

A high-performance, zero-overhead, extensible Python compiler using LLVM
https://docs.exaloop.io/codon

re.compile() cannot identify operator "(?<" #254

Open xiaxinmeng opened 1 year ago

xiaxinmeng commented 1 year ago

In the following example, re.compile rejects a pattern containing the lookbehind operator "(?<":

test.py:

import re
p=re.compile('(?<!abc)(def)')
p.search('abcddef') 

Error message:

error: invalid perl operator: (?<

Raised from: std.re.compile:0
/home/xxm/Desktop/IFuzzer/experiment_on_different_interpreter/codon/codon-linux-x86_64/codon-deploy/lib/codon/stdlib/re.codon:104:9
Aborted (core dumped)

Reproduce: 'codon/codon-linux-x86_64/codon-deploy/bin/codon' run -release test.py

Behavior on CPython 3.10.8: works well

Environment: codon v0.15.5 on Feb 6, Ubuntu 18.04
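For comparison, here is a minimal sketch of what the same pattern does under CPython's built-in re module. It is meant to run under CPython only, since the report above shows Codon rejecting the pattern at compile time.

```python
import re

# Same pattern and input as test.py above.
p = re.compile('(?<!abc)(def)')
m = p.search('abcddef')

# 'def' starts at index 4 and is preceded by 'bcd', not 'abc',
# so the negative lookbehind succeeds and the match is found.
print(m.span(), m.group(1))  # prints: (4, 7) def
```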

xiaxinmeng commented 1 year ago

The backreference "\2" also fails. For example:

import re
match = re.compile('((a)|b\\2)*')

output:

error: invalid escape sequence: \2

Raised from: std.re.compile:0
/home/xxm/Desktop/IFuzzer/experiment_on_different_interpreter/codon/codon-linux-x86_64/codon-deploy/lib/codon/stdlib/re.codon:104:9
Aborted (core dumped)

arshajii commented 1 year ago

This is because Codon uses Google's RE2 under the hood for regular expressions, which is faster but has some limitations: it does not support lookaround assertions such as "(?<!" or backreferences such as "\2". What we want to do at some point is also incorporate Python's regex engine and choose at compile time whether to go with Python's engine or RE2 (since the regex is normally known at compile time anyway).
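The engine selection described above could look roughly like the following. This is only an illustration in plain Python, not Codon's implementation: the names `needs_backtracking_engine` and `choose_engine` are hypothetical, and the check is a heuristic that looks only for lookaround assertions and backreferences, the two constructs that fail in this issue.

```python
def needs_backtracking_engine(pattern: str) -> bool:
    """Heuristically report whether `pattern` uses lookaround or backreferences."""
    i, n = 0, len(pattern)
    in_class = False  # True while scanning inside a [...] character class
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            # Outside a class, a backslash followed by 1-9 is treated as a backreference.
            if not in_class and i + 1 < n and pattern[i + 1] in "123456789":
                return True
            i += 2  # skip the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            # A ']' directly after '[' or '[^' is a literal, so step over it.
            if pattern.startswith("^", i + 1):
                i += 1
            if pattern.startswith("]", i + 1):
                i += 1
        elif pattern.startswith(("(?=", "(?!", "(?<=", "(?<!", "(?P="), i):
            # Lookahead, lookbehind, or a named backreference.
            return True
        i += 1
    return False


def choose_engine(pattern: str) -> str:
    """Hypothetical selector: fall back to a Python-style engine only when needed."""
    return "python" if needs_backtracking_engine(pattern) else "re2"


print(choose_engine('(?<!abc)(def)'))  # python
print(choose_engine('((a)|b\\2)*'))    # python
print(choose_engine('a+b*'))           # re2
```

Because the pattern is usually a string literal, a check like this could in principle run once at compile time. A real implementation would more likely reuse the regex parser than scan the string by hand.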