DandelionSprout / adfilt

The place where I, DandelionSprout, store my web filter lists for countless topics, including my Nordic adblock list. As simple as that, really.

Which blocklist is the best for blocking clickbait sites only? #272

Closed sanjayen closed 2 years ago

sanjayen commented 2 years ago

Hello, Thank you for creating and managing this repository. It is overwhelming and I do not know how you are able to update so many lists.

I read this article today https://www.morningbrew.com/marketing/stories/2021/09/08/brands-still-playing-ball-clickbait-ad-sites-advertisings-roach-will-survive-bomb and it mentions that the clickbait ecosystem has only grown. Morning Brew compiled a list of 129 clickbait sites, but I can't find it.

Searching FilterLists and the Internet turns up the following lists, but most of them are no longer maintained.

https://assets.windscribe.com/custom_blocklists/clickbait.txt [reference: https://windscribe.com/features/robert under the sub-section Fake News + Clickbait in Block List Overview]

https://raw.githubusercontent.com/cpeterso/clickbait-blocklist/master/clickbait-blocklist.txt [repository: https://github.com/cpeterso/clickbait-blocklist/]

https://raw.githubusercontent.com/endolith/clickbait/master/clickbait.txt [repository: https://github.com/endolith/clickbait]

However, none of them block the important clickbait sites such as taboola.com, outbrain.com, or sokrati.com. EasyPrivacy does block these, but it does not block newer ones such as itsthevibe.com, eliteherald.com, or magellantimes.com.

Any chance these are covered by an existing blocklist?

I will be more than happy to help you curate such a dedicated list for clickbait.
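The coverage question above can be checked mechanically. The following is a minimal Python sketch, not anyone's actual tooling in this thread: it assumes the blocklist is either hosts-style (`0.0.0.0 example.com`) or one plain domain per line, and it assumes a listed parent domain counts as covering its subdomains. The function names `parse_blocklist` and `missing_domains` are hypothetical.

```python
def parse_blocklist(text):
    """Extract domains from hosts-style or plain-domain blocklist text."""
    domains = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop "#" comments
        if not line:
            continue
        parts = line.split()
        # Hosts format is "0.0.0.0 example.com"; plain format is "example.com".
        if len(parts) > 1 and parts[0] in ("0.0.0.0", "127.0.0.1"):
            domain = parts[1]
        else:
            domain = parts[0]
        domains.add(domain.lower())
    return domains


def missing_domains(blocklist_text, wanted):
    """Return the wanted domains that the blocklist does not cover."""
    listed = parse_blocklist(blocklist_text)

    def covered(domain):
        # Assumption: a listed parent domain also covers its subdomains.
        labels = domain.lower().split(".")
        return any(".".join(labels[i:]) in listed for i in range(len(labels) - 1))

    return [d for d in wanted if not covered(d)]
```

Calling `missing_domains` with the text of one of the lists above and the domains named in this issue would return the ones that still need a home in a new list.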

iam-py-test commented 2 years ago

@sanjayen that sounds like a good idea. Did you see https://github.com/Piega/Clickbait-blacklist? (Also https://github.com/DariusIurca/RFNBL, although it is region-specific.) It has not been updated this year, but maybe you could take some domains from it. All the other ones I found were already in your list.

DandelionSprout commented 2 years ago

From my time working on FilterLists.com and its ~1,600 lists, I'm not aware of any list that currently blocks itsthevibe.com, eliteherald.com, or magellantimes.com. So there's definitely a market for such a list to be created and then added to FilterLists.com.

Despite looking deeply into those three domains, I was only able to find five more, all owned by the owners of eliteherald.com and magellantimes.com:

pawszilla.com
zenherald.com
historicalpost.com
affluenttimes.com
atlanticmirror.com

iam-py-test commented 2 years ago

Sorry! I added a commit to my repo & it closed the issue

sanjayen commented 2 years ago

Not a problem. Let me find some (sub)domains, get a list going at my end, and share it.

I have not built lists myself, so this will be a learning experience. I will share more as I read through some articles and track down the 129 sites as a starting point.

sanjayen commented 2 years ago

> @sanjayen that sounds like a good idea. Did you see https://github.com/Piega/Clickbait-blacklist? (Also https://github.com/DariusIurca/RFNBL, although it is region-specific.) It has not been updated this year, but maybe you could take some domains from it. All the other ones I found were already in your list.

Thank you for this list, will check.

If it was up to me, I would keep fake news and clickbait separate for now, because one is a rabbit hole and the other is just a pure waste of time.

iam-py-test commented 2 years ago

@sanjayen I just started a list here. It's pretty much just the domains you reported and what Imre found.

sanjayen commented 2 years ago

> @sanjayen I just started a list here. It's pretty much just the domains you reported and what Imre found.

Yes, I saw that.

Let me add some more to the list - though I will have to learn how to edit someone else's file.

iam-py-test commented 2 years ago

> > @sanjayen I just started a list here. It's pretty much just the domains you reported and what Imre found.
>
> Yes, I saw that.
>
> Let me add some more to the list - though I will have to learn how to edit someone else's file.

Just hit Fork at the top of the screen and then edit it in your fork. See https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/about-forks and https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests. The main repo is at https://github.com/iam-py-test/my_filters_001

sanjayen commented 2 years ago

Thank you for this. I am used to editing internal CVS systems.

sanjayen commented 2 years ago

@iam-py-test first commit, please check. This will need triaging as I may either have added the code wrongly or added a necessary site.

I have linked to the EasyPrivacy line to reuse the ad network blocks, so that the list does not break any website.

https://github.com/sanjayen/my_filters_001/commit/0c08609f3b269bee422725f666b09e6beaaab61b

I have also captured my references and how I tried to find similar websites, using the method provided in the article as well as reverse searching for similar sites.

Will update as and when possible because this is interesting.

iam-py-test commented 2 years ago

> @iam-py-test first commit, please check. This will need triaging as I may either have added the code wrongly or added a necessary site.
>
> Have linked to the EasyPrivacy line to reuse the ad network blocks so that it does not break any website.
>
> sanjayen/my_filters_001@0c08609
>
> I have also captured my references and how I tried to find similar websites, using the method provided in the article as well as reverse searching for similar sites.
>
> Will update as and when possible because this is interesting.

Your commit looks ok. The only thing I don’t know is whether EasyList’s license is compatible with my license.

I don’t really have the ability to verify your entries now, but I can later today.

sanjayen commented 2 years ago

Then is there an easier way to not block everything from the ad networks and only stop the request when it is third party?

My thought is: the 3 ad networks (and more in the future) should be blocked when loaded as a third party but allowed when accessed directly. The clickbait sites should be blocked altogether because they are either the same content or dummy domains.

I am new to this syntax and add uBlock Origin filters by right-clicking on an element.
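One way to express that split is with the `$third-party` filter option, which adblock-style lists (including uBlock Origin) support for restricting a rule to third-party requests. Below is a minimal Python sketch that generates such lines. The grouping of domains and the function name `build_filters` are assumptions made for illustration, not the contents of the actual list.

```python
# Domains taken from this thread; which group each belongs to is an assumption.
AD_NETWORKS = ["taboola.com", "outbrain.com", "sokrati.com"]
CLICKBAIT_SITES = ["itsthevibe.com", "eliteherald.com", "magellantimes.com"]


def build_filters(ad_networks, clickbait_sites):
    """Return adblock-style network filter lines for both groups."""
    lines = []
    for domain in ad_networks:
        # Block only when the request is third-party; direct visits still load.
        lines.append(f"||{domain}^$third-party")
    for domain in clickbait_sites:
        # No option: block requests to the domain in every context.
        lines.append(f"||{domain}^")
    return lines


print("\n".join(build_filters(AD_NETWORKS, CLICKBAIT_SITES)))
```

The output can be pasted into uBlock Origin's "My filters" pane to test how each rule behaves before it goes into a shared list.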

iam-py-test commented 2 years ago

> Then is there an easier way to not block everything from the ad networks and only stop the request when it is third party?
>
> My thought is - the 3 ad networks (and more in the future) should be blocked when in 3rd party but allowed when accessed directly. The clickbait blocked altogether because they are either the same content or dummy domains.

I think that your additions seem ok (I’m no expert on uBlock Origin)

> I am new to this syntax and add uBlock Origin filters by right clicking on an element.

That’s what I do too; it seems to work

sanjayen commented 2 years ago

Can you edit your file, so that the list is compatible with the license you use and also the syntax that you want? I can then take it further and build upon it.

iam-py-test commented 2 years ago

@sanjayen I can't edit your file; I don't have access. You will have to open a pull request in my repo and tick "allow edits by maintainers". I would prefer uBlock Origin syntax, but I am open to ideas. I do have an idea of how to get around the license: we could remove the EasyList entries and then say that users should use regular adblock lists in addition to it, as those domains aren't really clickbait and I do not really want to go through the work of handling an adblocking list. We could also just !#include easylist, which fixes both issues.
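The first option, removing entries that already exist in an upstream list, could be sketched in Python as below. This is only an illustration under stated assumptions: it matches filter lines by exact text, treats lines starting with `!` as comments to keep, and uses a hypothetical function name `strip_upstream_entries`. Whether doing this settles the license question is not something the code can answer.

```python
def strip_upstream_entries(own_text, upstream_text):
    """Drop filter lines from own_text that also appear in upstream_text."""

    def is_filter(line):
        stripped = line.strip()
        # Blank lines and "!" comment lines are not filters.
        return bool(stripped) and not stripped.startswith("!")

    upstream = {
        line.strip() for line in upstream_text.splitlines() if is_filter(line)
    }
    kept = [
        line
        for line in own_text.splitlines()
        if not (is_filter(line) and line.strip() in upstream)
    ]
    return "\n".join(kept)
```

Exact-text matching is deliberately conservative: a rule written differently upstream (for example with other options) is left in place for a human to review.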

I am very sorry for not responding earlier

sanjayen commented 2 years ago

@iam-py-test done. I cleaned up the ad network list and added a mention of EasyPrivacy.

Also, created a pull request for the first time. https://github.com/iam-py-test/my_filters_001/pull/85

I will go by what you are comfortable with, as I am not that experienced with maintaining lists. uBlock Origin syntax works for me as well, since I can test and correct it.

sanjayen commented 2 years ago

Any thoughts on how we get people to use this filter and keep contributing?

iam-py-test commented 2 years ago

> Any thoughts on how we get people to use this filter and keep contributing?

I can try to get it added to filterlists.com, but as my antimalware list was submitted a month or more ago and still has not been merged, it might not work. I will make a note in the README about it, but that won’t make much of a difference

iam-py-test commented 2 years ago

There seems to be a bit of an issue, as it refuses to load in uBO. I think the problem was the EasyList include, so I removed it until I can investigate further.

iam-py-test commented 2 years ago

@sanjayen any new domains? If you have any, you can report them at https://github.com/iam-py-test/my_filters_001/discussions/86, as it is easier to check there instead of this issue

sanjayen commented 2 years ago

Sorry @iam-py-test, I have been busy at work.

Will see if I can get some time over the weekend and update.

Will use https://github.com/iam-py-test/my_filters_001/discussions/86 to post updates.