AdguardTeam / VscodeAdblockSyntax

Comprehensive extension to manage Adblock Syntax in VSCode: syntax highlighter, linter and much more. Available on the VSCode Marketplace.
https://marketplace.visualstudio.com/items?itemName=adguard.adblock
MIT License
74 stars 8 forks source link

Allow using unescaped `$` symbol in the value of some special modifiers #131

Open DandelionSprout opened 6 months ago

DandelionSprout commented 6 months ago

I noticed this when working on https://github.com/DandelionSprout/adfilt/commit/7f50937e2924f16129baa20810a276c9f44a6962 some minutes before I submitted this issue report, where the $ sign in ||www.amazon.$removeparam=/^[a-z_]{1,20}=[a-zA-Z0-9._-]{80,}$/ is shown by the VS syntax as invalid, despite such a setup being relatively important when handling RegEx in general.

image

So I can only presume that a fix of some sort has to be made, so that the $ sign would be shown as not-invalid.

Alex-302 commented 6 months ago

Shouldn't $ be escaped inside a regexp in the rule?

DandelionSprout commented 6 months ago

No, because the $ marks the end of the RegEx line.

Alex-302 commented 6 months ago

I mean when you copy regexp to a rule. https://github.com/gorhill/uBlock/wiki/Static-filter-syntax - I see \$ in some examples with regexps.

Btw it seems you forgot a directive here after removing the line 48 https://github.com/DandelionSprout/adfilt/commit/ee3faac9930f536dc6436286bf0f54066ac0b663#diff-11d0fe9cc978acebe3ce9040b1f319079c575ad786eeae23659c15715b360116L88

DandelionSprout commented 6 months ago

\$ is when there's a $ inside the matching text.

$ is for a textline end.

ameshkov commented 6 months ago

The @Alex-302's point is that $ is generally considered a special character according to the filtering rule syntax and thus it needs to be escaped whenever you are using it in this context.

Alex-302 commented 6 months ago

@ameshkov So there is an incompatibility with uBO here?

DandelionSprout commented 6 months ago

Meshkov does raise a very important question there.

/^parameter=value$/ is not only fully supported in uBO, but my current personal impression is that it's strongly recommended in uBO.

The way you at Team AdGuard above describe it, would require /^parameter=value\$/, which'd be a little odd compared to the semi-standardised RegEx code standards, but also indeed does result in some sort of AG-uBO syntax differences.

ameshkov commented 6 months ago

We should probably add support for this in AGTree/tsurlfilter just for compatibility reasons then.

scripthunter7 commented 6 months ago

@ameshkov If we add support, we need to modify:

scripthunter7 commented 5 months ago

@ameshkov @Alex-302 There are two other points here to consider:

  1. The regular expression may contain unescaped commas. Normally, we would use such a comma to separate modifiers, this is the reason why AGLint gives the error that /^[a-z_]{1 is not a valid modifier for input $removeparam=/^[a-z_]{1,20}=[a-zA-Z0-9._-]{80,}$/, because AGTree splits it at the first unescaped comma within the regexp.
  2. There are some modifiers, like $domain, where the value can be completely regexp or just a list with some elements regexp. In other words both are $domain=/regex/ and $domain=example.org|/regex/|example.net valid cases.

Tokenization should work in these cases as well.