firasdib / Regex101

This repository is currently only used for issue tracking for www.regex101.com

Negative lookahead incorrectly excludes the latest matching character #2023

Closed: naftalmm closed this issue 1 year ago

naftalmm commented 1 year ago

Bug Description

Match information is incorrect when the test string contains the character specified in the negative lookahead. Reproducible on all flavours.

Reproduction steps

Regexp: ^0+(?!\$)
Test string: 000$
Match: 00

Expected Outcome

Match: 000

Browser

Chrome 109.0.5414.120

OS

Win7

working-name commented 1 year ago

@naftalmm try \$ inside the negative lookahead if you want to match the dollar sign character. $ by itself means end of input, or end of a line if you have the /m flag turned on.

naftalmm commented 1 year ago

Yes, sorry, that's what I meant originally. Anyway, if we use $ by itself, it's also reproducible with another input:

Regexp: ^0+(?!$)
Test string: 000
Match: 00

Expected Outcome

Match: 000

working-name commented 1 year ago

Yes, $ by itself behaves as expected because it's a metacharacter: it has special meaning for the regex engine. If you want the regex engine to treat it as just a dollar sign, you need to escape it by putting a \ in front of it, like so: \$.

In other words, this is not a bug, it's a feature of the regex engine. There are plenty more metacharacters, including ., ^, +, *, ? and so on.

I hope this makes sense.
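
For readers who want to see the difference between the anchor and the literal character outside the site, here is a minimal sketch using Python's `re` module. The choice of Python and the variable names are assumptions for illustration; the thread itself only uses regex101.

```python
import re

text = "000$"

# Unescaped: $ is an anchor, so this asks for zeros that run to the end
# of the input. The input ends with a literal "$", so nothing matches.
anchored = re.search(r"0+$", text)

# Escaped: \$ is a literal dollar sign, so the whole string matches.
literal = re.search(r"0+\$", text)

# re.escape produces the escaped form of a literal string.
escaped_pattern = re.escape("$")

print(anchored)         # None
print(literal.group())  # 000$
print(escaped_pattern)  # \$
```

Other flavours may differ in details such as how $ treats a trailing newline, so treat this as an illustration of escaping rather than a statement about every engine.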

naftalmm commented 1 year ago

Yes, I'm aware of regex metacharacters and how they work. OK, forget about $, it's reproducible with ANY character. For instance, a:

Regexp: 0+(?!a)
Test string: 000a
Match: 00

Expected Outcome

Match: 000

https://regex101.com/r/dtqQ5v/1

working-name commented 1 year ago

My bad, I misunderstood. If you open up the debugger you'll see what the engine does and why it would stop at the second 0 rather than the third.
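
What the debugger shows is backtracking: 0+ is greedy and first takes all three zeros, the lookahead then fails because the next character is the forbidden one, and the engine gives back one zero and tries again. The sketch below reproduces this with Python's `re` module. The two alternative patterns at the end are assumptions about what the reporter may have wanted; neither is proposed in the thread.

```python
import re

# 0+ first takes "000"; the next character is "a", so (?!a) fails.
# The engine backtracks to "00"; the next character is now "0", so (?!a) passes.
greedy = re.search(r"0+(?!a)", "000a").group()

# Same mechanism as the earlier example: after "000" the end-of-input anchor
# matches, so (?!$) fails until 0+ gives back one zero.
anchored = re.search(r"^0+(?!$)", "000").group()

# Assumption: if the goal is "the zeros right before an a, without consuming
# the a", a positive lookahead does that.
positive = re.search(r"0+(?=a)", "000a").group()

# Assumption: if the goal is "reject a run of zeros that is followed by a",
# adding 0 to the lookahead stops backtracking from rescuing the match.
strict = re.search(r"0+(?![0a])", "000a")

print(greedy, anchored, positive, strict)  # 00 00 000 None
```

In every case the lookahead is evaluated at whatever position the quantifier currently ends, not only at the end of the longest possible run, which is why the reported match is 00 rather than 000.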

naftalmm commented 1 year ago

Ok, I've got it, thanks!