dhowe / AdNauseam

AdNauseam: Fight back against advertising surveillance
GNU General Public License v3.0

Too many internal links being collected #2252

Closed mneunomne closed 1 year ago

mneunomne commented 1 year ago

The latest build of AdNauseam is collecting a lot of internal content, apparently because the procedural filters were added to the collecting mechanism. We need to check for internal links before collecting.

We could also add an option (perhaps under advanced options) to enable or disable the collection of content with internal links.
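
A minimal sketch of what such a check plus option could look like. The setting name `collectInternalLinks` and the helper names are hypothetical and are not taken from the AdNauseam codebase; the hostname comparison is a simple heuristic, not the project's actual logic.

```javascript
// Hypothetical setting; AdNauseam may name or store this differently.
const defaultSettings = { collectInternalLinks: false };

// Strip a leading "www." so www.example.com and example.com compare equal.
function normalizeHost(hostname) {
  return hostname.toLowerCase().replace(/^www\./, '');
}

// True when the ad target points back at the site serving the page.
// Relative targets are resolved against the page URL first.
function isInternalLink(pageUrl, targetUrl) {
  const page = normalizeHost(new URL(pageUrl).hostname);
  const target = normalizeHost(new URL(targetUrl, pageUrl).hostname);
  return target === page || target.endsWith('.' + page);
}

// Collect external targets always; collect internal ones only if enabled.
function shouldCollect(pageUrl, targetUrl, settings = defaultSettings) {
  if (settings.collectInternalLinks) return true;
  return !isInternalLink(pageUrl, targetUrl);
}
```

With the default setting, a relative link such as `/watch?v=abc` on a YouTube page would be skipped, while a target on an unrelated host would still be collected.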

[Screenshot 2023-01-16 at 15 55 26]

mneunomne commented 1 year ago

This is happening because the following domains are currently on our internalLinkDomains list (see the discussion in https://github.com/dhowe/AdNauseam/issues/1536).

  // targets on these domains are never internal (may need to be regex)
  const internalLinkDomains = ['google.com', 'asiaxpat.com', 'nytimes.com',
    'columbiagreenemedia.com', '163.com', 'sohu.com', 'zol.com.cn', 'baidu.com',
    'yahoo.com', 'facebook.com', 'youtube.com', 'flashback.org'
  ];
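
To illustrate the effect of this list, here is a rough sketch of how an exemption like this can bypass an internal-link check. The function names and the suffix-matching rule are assumptions made for illustration, and the list is a shortened copy; the real check in AdNauseam may differ.

```javascript
// Shortened copy of the list above, for illustration only.
const exemptDomains = ['google.com', 'youtube.com'];

// Assumed matching rule: exact domain or any subdomain of it.
function isOnExemptDomain(hostname, domains = exemptDomains) {
  const host = hostname.toLowerCase();
  return domains.some(d => host === d || host.endsWith('.' + d));
}

// Hypothetical internal-link test that honors the exemption list.
function treatedAsInternal(pageUrl, targetUrl) {
  const pageHost = new URL(pageUrl).hostname;
  const targetHost = new URL(targetUrl, pageUrl).hostname;
  // Targets on exempt domains are never considered internal,
  // so same-site links on those domains pass through to collection.
  if (isOnExemptDomain(targetHost)) return false;
  return pageHost === targetHost;
}
```

Under these assumptions, a link from a YouTube page to another YouTube page is not treated as internal, which is consistent with ordinary thumbnails being collected once a too-generic selector matches them.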
mneunomne commented 1 year ago

I believe this is happening because AdNauseam's ad parsing is now applied to the procedural cosmetic filters, which catches some content that is not an ad.

mneunomne commented 1 year ago

youtube.com was added to the internalLinkDomains on this commit:

https://github.com/dhowe/AdNauseam/commit/75ec6e11287c609bca3295d62a77a56a84fe2744

mneunomne commented 1 year ago

And google.com on this commit:

https://github.com/dhowe/AdNauseam/commit/b72e5f03a70f579ddc4a006546538824e9bb9f02

mneunomne commented 1 year ago

The current issue: YouTube thumbnails are being treated as ads.

The following cosmetic procedural filter:

youtube.com#@#ytd-rich-item-renderer:-abp-has(ytd-display-ad-renderer)

is generating a selector that is simply ytd-rich-item-renderer, which is far too generic. This means AdNauseam is not handling the procedural filter properly while parsing the ads.
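
The difference can be sketched with mock elements instead of a real DOM. This is only an illustration: `splitAbpHas`, `containsTag`, and `matchProcedural` are hypothetical helpers, they handle only bare tag-name selectors, and they are not how uBlock Origin or AdNauseam actually evaluate procedural filters.

```javascript
// Split "base:-abp-has(arg)" into the plain selector and the operator argument.
function splitAbpHas(selector) {
  const marker = ':-abp-has(';
  const start = selector.indexOf(marker);
  if (start === -1 || !selector.endsWith(')')) {
    return { base: selector, has: null };
  }
  return {
    base: selector.slice(0, start),
    has: selector.slice(start + marker.length, -1),
  };
}

// Mock elements have the shape { tag, children }; real code would query the DOM.
function containsTag(node, tag) {
  return node.children.some(c => c.tag === tag || containsTag(c, tag));
}

// Keep nodes matching the base tag and, if present, the :-abp-has condition.
function matchProcedural(nodes, selector) {
  const { base, has } = splitAbpHas(selector);
  return nodes.filter(n => n.tag === base && (has === null || containsTag(n, has)));
}

const grid = [
  { tag: 'ytd-rich-item-renderer', children: [{ tag: 'ytd-display-ad-renderer', children: [] }] },
  { tag: 'ytd-rich-item-renderer', children: [{ tag: 'ytd-thumbnail', children: [] }] },
];

const withOperator = matchProcedural(grid, 'ytd-rich-item-renderer:-abp-has(ytd-display-ad-renderer)');
const baseOnly = matchProcedural(grid, 'ytd-rich-item-renderer');
```

Here `withOperator` holds only the item that wraps an ad renderer, while `baseOnly` holds every grid item, including ordinary video thumbnails.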

More updates to follow; still investigating.

mneunomne commented 1 year ago

The merge above instead parses the procedural filters in contentscript-extra.js, at the point where the rules are originally executed. This avoids running the selectors a second time and handles each procedural filter properly, together with its associated actions.

Tested on YouTube and Google, and it seems to be working.

mneunomne commented 1 year ago

fixed, closing