HaxeFoundation / haxe

Haxe - The Cross-Platform Toolkit
https://haxe.org

Null character breaks regex patterns #10592

Open tobil4sk opened 2 years ago

tobil4sk commented 2 years ago

When a regex pattern containing a null character is created directly using the constructor, it does not match properly on Eval, Neko, C++, Lua, and HashLink. PHP throws an error on the first match call because of the null character (Null byte in regex).

final containingNull = new EReg("abc\x00def", "");

trace(containingNull.match("abc")); // true, should be false
trace(containingNull.match("abc\x00def")); // true
trace(containingNull.match("abc\x00fed")); // true, should be false

On all other targets it works as expected.

On the other hand, the following, which uses the regex literal syntax, works fine on (almost) all targets.

final containingNull = ~/abc\x00def/;

trace(containingNull.match("abc")); // false
trace(containingNull.match("abc\x00def")); // true (apart from on hashlink)
trace(containingNull.match("abc\x00fed")); // false

Targets affected:

tobil4sk commented 2 years ago

The PHP issue is a PHP bug, so I think it's fine to leave it. There is already a way to avoid the bug: escape the null character in the pattern, or use Haxe's regex literal syntax, which escapes it automatically.
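For illustration, a minimal sketch of the escaping workaround with the constructor. It assumes that doubling the backslash makes Haxe pass the four characters `\x00` to the regex engine rather than a raw null byte, which should make it behave like the literal form `~/abc\x00def/`. The class name `Main` and the variable name `escaped` are only for this example.

```haxe
class Main {
	static function main() {
		// Assumption: "\\x00" reaches the regex engine as the escape sequence \x00,
		// so the pattern string itself contains no raw null byte.
		final escaped = new EReg("abc\\x00def", "");

		// Expected to follow the literal-syntax results reported earlier in this issue
		// (including the HashLink exception for the second call).
		trace(escaped.match("abc"));
		trace(escaped.match("abc\x00def"));
		trace(escaped.match("abc\x00fed"));
	}
}
```

Only the pattern is escaped here; the subject strings still contain a real null character, as in the original report.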

On HashLink it is a bit more complicated, however. According to this page: https://haxe.org/manual/std-String-encoding.html, HashLink does not support null bytes, so I'm not sure whether it makes sense to fix this there. A fix could also require changing the HashLink API so that the length of the pattern is passed into the constructor.