tripleee closed this issue 4 years ago
Can't assign to @bertiebaggio explicitly it seems, but he was volunteering to look into this. https://chat.stackexchange.com/transcript/message/50359451#50359451
Tangentially related perhaps: https://github.com/Charcoal-SE/SmokeDetector/pull/1630
Thanks for the ping :smile:
Discussion of a more general whitelist came up when considering the ASN whitelist, @makyen's thoughts seem relevant here:
I agree that a full implementation of whitelisting would be beneficial, but then we're talking about affecting lots of different detection reasons. There are also times when we want different whitelists for different detections, and to not share the list, or at least not share some entries between detections. A full implementation gets complex.
Do we have a few representative examples of things we'd like to exclude? I've been away from the Smokey coalface due to a job application recently so have missed some of the chat around this.
Mithrandir pointed out a few in chat last week; searching for where I mentioned "bertieb" should be a quick shortcut, or I can try to provide links tomorrow. Glorfindel mentioned one today, I think xda-develop.com or similar. A search in the FPs would probably be more methodologically sound, similar to what I did for reviewing ASNs today (I think #3007)
@bertiebaggio check your inbox for an org invite - that should make it possible to actually assign you here.
Related (possibly duplicate?): https://github.com/Charcoal-SE/SmokeDetector/issues/490
tripleee: Thanks, I'll have a look through chat history
Art: done, thanks!
Pling, any progress?
@machavity mentions pub.dev: https://chat.stackexchange.com/transcript/message/51136860#51136860
This issue has been closed because it has had no recent activity. If this is still important, please add another comment and find someone with write permissions to reopen the issue. Thank you for your contributions.
As of ce83f319abed51e6d93ce4405ce0be25164603ef, I've added an `is_website_whitelisted` helper method, and used it in a few checks in `findspam.py` (often through the `is_whitelisted_website` method that was already in there - though that only checked a small number of regexes).
The new helper method feeds from the metasmoke API: any domain that's tagged with `whitelisted` will be excluded from Smokey's domain checks.
We can also add the helper to more findspam checks if we think it's necessary.
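For anyone wiring this into another check, here is a rough sketch of the kind of lookup involved. It is not the code from the commit above: the function names `normalize_host` and `is_domain_whitelisted` are made up for illustration, and the whitelist is assumed to be a plain set of lowercase domains that has already been fetched from metasmoke by some other code. Matching subdomains of a whitelisted domain is also an assumption, not confirmed behaviour.

```python
from urllib.parse import urlparse


def normalize_host(url_or_host):
    """Return the lowercase hostname of a URL or bare hostname ("" if none)."""
    parsed = urlparse(url_or_host)
    if not parsed.netloc:
        # Bare hosts like "pub.dev/packages" have no netloc unless "//" is prepended.
        parsed = urlparse("//" + url_or_host)
    return (parsed.hostname or "").lower().rstrip(".")


def is_domain_whitelisted(url_or_host, whitelist):
    """True if the host is a whitelisted domain or a subdomain of one.

    `whitelist` is a hypothetical set of lowercase domain names; how it is
    populated (e.g. from metasmoke) is outside the scope of this sketch.
    """
    host = normalize_host(url_or_host)
    if not host:
        return False
    labels = host.split(".")
    # Try "a.b.example.com", then "b.example.com", then "example.com", and so on.
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))
```

A check would then skip any link for which `is_domain_whitelisted(link, whitelist)` is true before applying its broad patterns. Comparing whole labels rather than substrings means `notpub.dev` does not match a whitelist entry for `pub.dev`.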
Is your feature request related to a problem? Please describe.
There are a number of domains which routinely trigger FPs because some of the watches are very broad. We want to be able to exclude well-known good sites from these broad watches in order to improve precision and reduce noise.
Describe the solution you'd like
bertieb implemented whitelisting for ASN checks in https://github.com/Charcoal-SE/SmokeDetector/pull/2664 and I was thinking already at the time that this should be refactored to govern all domain name checks.
Describe alternatives you've considered
Perhaps this should be coupled with a broader review of FPs so we can disable entire reasons (e.g. individual ASNs which produce too many FPs?) but let's keep this focused on the technical implementation.
Additional context
This has been raised in chat repeatedly over the last couple of weeks. I don't think it should be hard to do.