Open kecookier opened 2 weeks ago
thanks @kecookier do you will help fix the issue ?
thanks @kecookier do you will help fix the issue ?
Yes, I will work on this issue.
@kecookier I also encountered this issue, the reason is differences in handling \\
--- Query
with tb as (
select
name
FROM Values
('12234') people (name)
)
select regexp_replace(name, '22', '\\|') c1, 'gluten-spark' c2 from (select /*+ repartition(2) */ * from tb)
union
select regexp_replace(name, '22', '\\|') c1, 'vanilla-spark' c2 from tb
-- Result
c1 c2
1|34 vanilla-spark
134 gluten-spark
1. code
#include <iostream>
#include <re2/re2.h>
int main() {
std::string input = "12234";
RE2 pattern("22");
std::string replacement = "\\|";
int replacements = RE2::GlobalReplace(&input, pattern, replacement);
std::cout << "Modified string: " << input << std::endl;
return 0;
}
2. output:
re2/re2.cc:1059: invalid rewrite pattern: \|
Modified string: 134
3. compile:
g++ -o example example.cpp -L/usr/local/lib -lre2 -Wl,-rpath,/usr/local/lib
1. code
import java.util.regex.Matcher
import java.util.regex.Pattern
val input = "12234"
val pattern: Pattern = Pattern.compile("22")
val replacement = "\\|"
val matcher: Matcher = pattern.matcher(input)
val result = matcher.replaceAll(replacement)
println(s"Modified string: $result")
2. output:
Modified string: 1|34
@wang-zhun Sorry for the late reply, and thank you for the information. I have a work-in-progress PR that will fix this issue.
Posting code where re2 error happens for replacement string with "\\": https://github.com/google/re2/blob/6dcd83d60f7944926bfd308cc13979fc53dd69ca/re2/re2.cc#L1053.
While java.util.regex.Matcher views characters after "\\" as literal, i.e., using what it is after "\\".
Can we just revise the replacement string to let Re2 do the correct replacement?
Backend
VL (Velox)
Bug description
Spark version
Spark-3.2.x
Spark configurations
No response
System information
Velox System Info v0.0.2 Commit: 3a459abad5af1a8cc268c349bc3813249511768c CMake Version: 3.22.0 System: Linux-3.10.0-862.mt20190308.130.el7.x86_64 Arch: x86_64 CPU Name: Model name: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz C++ Compiler: /opt/rh/devtoolset-9/root/usr/bin/c++ C++ Compiler Version: 9.3.1 C Compiler: /opt/rh/devtoolset-9/root/usr/bin/cc C Compiler Version: 9.3.1 CMake Prefix Path: /usr/local;/usr;/;/usr/local/cmake;/usr/local;/usr/X11R6;/usr/pkg;/opt
Relevant logs
No response