Wanted to start a discussion about an idea I had regarding @WordsOfMe's comments on #350.
Quality is critical for a project like this, and part of that is continually coming up with ideas and methods to improve the quality of the lists. I have no idea if this is even a remotely good idea, but I think brainstorming new ways to increase quality benefits the project.
So. Do we think doing a full audit of every domain on every list would be beneficial to the project? Basically verifying that every one is legitimate and should exist on the list.
I'm not quite sure how this would work. Maybe it can be assisted by automation? Possibly a system could create audit issues when a domain has recently been transferred to a new owner or the IP addresses it points to have changed. Then, as those issues get closed, the system would keep creating more until every domain has been covered. Maybe there could even be tools that add information to those issues to help determine whether the domain should be on the blocklist or not (screenshots of the domain's content, maybe?).
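To make the automation idea a bit more concrete, here is a rough sketch of how the audit queue could be picked. It assumes we keep periodic snapshots mapping each domain to the set of IP addresses it resolved to; the snapshot format and the names `domains_needing_audit` and `next_audit_batch` are made up for illustration and are not part of any existing tooling. Ownership transfers are not covered here.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical snapshot format: domain -> set of IP addresses it resolved to.
Snapshot = Dict[str, FrozenSet[str]]


def domains_needing_audit(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return domains whose resolved addresses changed between two snapshots.

    A domain missing from the current snapshot counts as changed,
    since it no longer resolves to what it did before.
    """
    changed = []
    for domain, old_ips in previous.items():
        new_ips = current.get(domain, frozenset())
        if new_ips != old_ips:
            changed.append(domain)
    return sorted(changed)


def next_audit_batch(
    changed: Iterable[str],
    all_domains: Iterable[str],
    already_audited: Set[str],
    batch_size: int = 5,
) -> List[str]:
    """Pick the next domains to open audit issues for.

    Changed domains come first; the remaining unaudited domains follow in
    alphabetical order, so repeated batches eventually cover the whole list.
    """
    queue = [d for d in changed if d not in already_audited]
    for domain in sorted(all_domains):
        if domain not in already_audited and domain not in queue:
            queue.append(domain)
    return queue[:batch_size]
```

Each time a batch of issues is closed, those domains would be added to `already_audited` and the next batch requested, so the audit slowly expands to every domain without flooding the tracker.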
This would be a massive undertaking. And I'm not sure we are prepared for a project like this currently. I think it would also require some more concrete guidelines about what domains should be included vs not.
Just thought it'd be worth discussing to see if this is something we should work towards.
Unfortunately I cannot contribute with code, just some thoughts:
Manpower / brains are the most valuable resource, so if a domain had to be removed manually once, it should not generate the same work again (hence the whitelist proposal)
Domains that were taken over some time ago / historically would be a lot of work to review, and as long as no one complains it would probably save time to just keep them for now
What I would see as critical is the mechanism / workflow by which new domains are added to the list, which I must confess I do not know and which, as I understood, is meant to be kept confidential. But the crucial point here will be to review how a false positive made it onto the list and what can be learned from this. This question has been asked several times: "Is there a way to determine why it was added in the first place?"
Some thoughts on automatic checking before adding:
Check if the domain is actually registered
Consider whether domains that only resolve an MX record should be added; I guess a domain should at least resolve to an A / AAAA / CNAME.
One possibility to check the benevolence of a domain would be querying a number of DNSBLs used for blocking spam. When I look up the last false positives I had at https://multirbl.valli.org, a1.net is on 2 whitelists, and gmail-smtp-in.l.google.com is on 3 whitelists
Git is probably not ideal for managing add / remove requests. Most other blocklists (e.g. Spamhaus) feature a web interface that makes it easier for the maintainers to modify the lists in an automated way.
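As an illustration of the pre-add checks suggested above, here is a minimal sketch using only Python's standard library. It covers the "resolves to an A / AAAA / CNAME" check and the DNSBL / whitelist lookup; it does not check registration (that would need WHOIS or RDAP). The function names and the zone `dnswl.example` are placeholders, not a real list, and the resolver is passed in as a parameter so the logic can be tried without network access.

```python
import socket


def has_address_record(domain, resolve=socket.getaddrinfo):
    """True if the domain resolves to at least one A/AAAA address.

    getaddrinfo follows CNAMEs, so a CNAME pointing at a host with an
    address also passes. A domain with only an MX record does not.
    """
    try:
        return len(resolve(domain, None)) > 0
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def dnsbl_query_name(ipv4, zone):
    """Build the name to look up for an IPv4 address in a DNS-based list.

    The usual convention is reversed octets followed by the list's zone,
    e.g. 192.0.2.1 in zone "dnswl.example" -> 1.2.0.192.dnswl.example.
    """
    octets = ipv4.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted IPv4 address: %r" % ipv4)
    return ".".join(reversed(octets)) + "." + zone


def is_listed(query_name, resolve=socket.gethostbyname):
    """True if the list answers for this name, i.e. the entry is listed."""
    try:
        resolve(query_name)
        return True
    except OSError:
        return False
```

A candidate that fails `has_address_record`, or whose address is listed on several whitelists, could then be flagged for manual review instead of being added automatically. Domain-based lists are usually queried with the domain itself in front of the zone rather than a reversed IP.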