AdguardTeam / FiltersCompiler

A tool that compiles & validates filters
GNU Lesser General Public License v3.0
53 stars 12 forks source link

Escape quotes (' and ") in arguments of :contains in lists for uBO #156

Open Yuki2718 opened 1 year ago

Yuki2718 commented 1 year ago

Due to new cosmetic filter parser, uBO requires quotes in arguments to be escapsed: https://github.com/gorhill/uBlock/commit/a71b71e4c8a2037fc68970bc8912a76732edaade

An example that needs the fix is thetimestribune.com#?##tncms-region-option-rail-top-one > .tncms-block:has(> .block-title:contains(This Week's Circulars)) in Base filter.

Yuki2718 commented 1 year ago

Probably only :contains will be affected in AG lists as I don't see :xpath usage, but if :xpath is used escape won't work and needs to be wrapped by another quote.

ameshkov commented 1 year ago

CC @slavaleleka

Alex-302 commented 1 year ago

https://github.com/AdguardTeam/AdguardFilters/pull/134630#issuecomment-1313741190

Yuki2718 commented 1 year ago

@Alex-302 It's totally different. The PR was about an uBO issue that will eventually be fixed but the problem here is permanent change of uBO. BTW AG has been compatible with most of uBO syntax, so I wonder if it's possible for AG to be compatible with has-text with escape in some way.

scripthunter7 commented 1 year ago

@Yuki2718 FYI: We have created the ECSSTree library, which is fully backwards compatible with basic CSSTree, but thanks to re-tokenization it can parse the edge case you mentioned and many other cases as well. You can find it at the following link: https://github.com/AdguardTeam/ecsstree

Related: https://github.com/uBlockOrigin/uBlock-issues/discussions/2328

krystian3w commented 1 year ago

Also a little outdated is escape method, now works only in uBo:

example.org#?##A > .B:has(> .C:contains("This Week's Circulars"))
!
! shorter fake selectors to avoid horizontal scroll
!
example.com#?##D:contains('French "fo paux" is abused')

to avoid regex syntax for that as long uBo consider switch to ECSSTree.

Same protection needed orphaned [ ] { } ( ) if was copied to :contains() (when it is unfavorable to copy a long string so that the parenthesis is always closed).

scripthunter7 commented 1 year ago

Same protection needed orphaned ( )

See https://github.com/AdguardTeam/ecsstree/issues/11

krystian3w commented 1 year ago

I'm curious if ECSSTree is used to compile filters in the "/ublock/" directory - if that is, the AdGuard filter problem fixed outside of FiltersCompiler / coreLibs (it is possible that here almost never) / tsurlfilter / FiltersRegistry.

Quickly I can no longer find any filters what would have ' " in contains (simple regex can fail for contains with orphaned bracket).

(I did the same myself with one detection of incompatible filter - for me it was enough to check less characters)

The one from the demo I think someone has already rewritten/removed by hand (But can be cumbersome for new cases).

scripthunter7 commented 1 year ago

AdGuard filter problem

I don't think this is a "problem" of AdGuard filters only, but a more general problem, since this syntax is used in all adblockers, as far as I know. It's simply that CSSTree's tokenizer doesn't support this syntax (for obvious reasons). CSSTree is an excellent library for various purposes, I have been experimenting with it since June 2022, that's why ECSSTree was created to solve these annoying edge cases. I tested ECSSTree on the entire AG Filters Registry and the entire uAssets, so I handled several edge cases. We will use ECSSTree in a static analyzer tool, which should parse all types of adblock rules, but I think it can be used for any other purpose as long as it helps working with Extended CSS language elements.