enwikipedia-acc / waca

English Wikipedia Account Creation Interface
https://accounts.wmflabs.org/internal.php
The Unlicense

Create blacklist functionality for disposable email providers #473

Open stwalkerster opened 7 years ago

stwalkerster commented 7 years ago

Separate to banning so we can be more flexible in our approach, but with the same effect. Must tell the user that their email address provider is the issue, and to use a non-disposable email address.

Known providers

Actually OK (mail forwarding service only)

stwalkerster commented 6 years ago

https://dmoztools.net/Computers/Internet/E-mail/Spam/Preventing/Temporary_Addresses/

Also, check MX records

stwalkerster commented 4 years ago

I've used a composer library (wesbos/burner-email-providers) which keeps track of these in the reporting scripts. We should probably use that for this.

methecooldude commented 1 year ago

Can we make calls to https://www.disposable-email-detector.com/?shell#Email-Hippo-Wesbos-DEA-Identification-API-DisposableEmails, which uses that library's data?

stwalkerster commented 1 year ago

I'd rather we didn't make API calls where it's not necessary to do so.

wesbos/burner-email-providers provides a composer package we can use, so we can reference the raw data file directly without needing to call out to third-party sites, which may or may not be doing logging and data capture themselves.
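For illustration only, here is a minimal sketch of checking an address against a locally stored list, with no network call. It is written in Python rather than the project's own language, and it assumes the raw data is a plain-text file with one domain per line. The function names are hypothetical and not part of any existing package.

```python
from pathlib import Path


def load_disposable_domains(path):
    """Read one domain per line, skipping blank lines and '#' comments."""
    domains = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            domains.add(entry)
    return domains


def email_domain(address):
    """Return the lower-cased part after the last '@', or None if malformed."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain.strip().lower()


def is_disposable(address, domains):
    """True if the address's domain is an exact match in the loaded set."""
    domain = email_domain(address)
    return domain is not None and domain in domains
```

This only matches exact domains; whether subdomains of a listed provider should also match is left open here.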

stwalkerster commented 1 year ago

So thinking about this again several years later - I'm wondering if we should be treating this as bans, or just treating this as another datapoint in the interface to be checked while handling a request.

We've already got the yellow diamond flag for indicating uncommon email domains; perhaps we could accept the request and show this highlighted as a red waste bin flag?

I don't know what's best to do here.

In either case, we'll want this to be pretty responsive from a web UI point of view. I'm not convinced that scanning text files on every single page load is the right answer - a database lookup might be quicker. We should probably do some analysis to figure that out.
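One rough way to start that analysis, sketched in Python with an in-memory SQLite database standing in for the real one. Everything here is an assumption: the table name `disposable_domain`, the synthetic `burnerN.example` domains, and the list size. The linear scan runs over a list already in memory, so it understates the cost of re-reading a file on each page load.

```python
import sqlite3
import timeit


def build_table(conn, domains):
    """Create a keyed table (hypothetical name) and load the domains into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS disposable_domain (domain TEXT PRIMARY KEY)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO disposable_domain (domain) VALUES (?)",
        ((d,) for d in domains),
    )
    conn.commit()


def db_lookup(conn, domain):
    """Exact-match lookup served by the primary key index."""
    row = conn.execute(
        "SELECT 1 FROM disposable_domain WHERE domain = ?", (domain,)
    ).fetchone()
    return row is not None


def scan_lookup(lines, domain):
    """Linear scan, standing in for walking a text file line by line."""
    return any(line == domain for line in lines)


# Synthetic data: 50,000 made-up domains, looking up the last one (worst case
# for the scan).
domains = [f"burner{i}.example" for i in range(50_000)]
conn = sqlite3.connect(":memory:")
build_table(conn, domains)
target = domains[-1]

scan_time = timeit.timeit(lambda: scan_lookup(domains, target), number=20)
db_time = timeit.timeit(lambda: db_lookup(conn, target), number=20)
print(f"scan: {scan_time:.4f}s  db: {db_time:.4f}s")
```

Timings from a toy like this only hint at the shape of the result; a real measurement would need the production database, the real list, and file reads included.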

methecooldude commented 1 year ago

Personally I think it should be struck whilst the iron is hot and rejected at submission stage.

> a database lookup might be quicker

I agree, maybe have a job that runs every, I dunno, 30 days(?) to update the DB with the latest domains and then just check against that as part of the validation of email address?

stwalkerster commented 1 year ago

> Personally I think it should be struck whilst the iron is hot and rejected at submission stage.

Yeah, though I'm having a hard time reconciling that with ACC's supposed role of being a last resort with human review for account requests. Is there a chance of false positives from these lists that other people maintain?

> > a database lookup might be quicker
>
> I agree, maybe have a job that runs every, I dunno, 30 days(?) to update the DB with the latest domains and then just check against that as part of the validation of email address?

Exactly what I was thinking, though we might have to be careful with indexes to make sure it's faster than a text file lookup.
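A sketch of what such a refresh job and an index check could look like, again in Python with SQLite as a stand-in for the real database. The table name and function names are hypothetical, and scheduling (every 30 days or otherwise) is left to whatever job runner is in use. Declaring the domain column as the primary key gives the lookup an index; `EXPLAIN QUERY PLAN` is SQLite's way to confirm the query uses it, and other databases have their own equivalents.

```python
import sqlite3


def refresh_domains(conn, fresh_domains):
    """Replace the stored list so the delete and inserts commit together."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS disposable_domain (domain TEXT PRIMARY KEY)"
    )
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM disposable_domain")
        conn.executemany(
            "INSERT OR IGNORE INTO disposable_domain (domain) VALUES (?)",
            ((d.strip().lower(),) for d in fresh_domains if d.strip()),
        )


def lookup_plan(conn):
    """Return SQLite's query plan text for the exact-match lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT 1 FROM disposable_domain WHERE domain = ?",
        ("placeholder.example",),
    ).fetchall()
    return " ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
refresh_domains(conn, ["burner-one.example", "Burner-Two.example ", ""])
print(lookup_plan(conn))  # the plan should mention an index, not a full scan
```

If the plan shows a full table scan instead of an index search, the lookup is unlikely to beat a text file by much.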

methecooldude commented 1 year ago

> Yeah, though I'm having a hard time reconciling that with ACC's supposed role of being a last resort with human review for account requests. Is there a chance of false positives from these lists that other people maintain?

'If you are sure this is not a disposable address, email us at accounts-enwiki-l@lists.wikimedia.org' - If we get one, then we look at that domain more closely and potentially whitelist it in the update job
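A possible shape for that, sketched in Python: the update job subtracts a locally maintained whitelist from the upstream list, and validation returns a message naming the provider as the problem. The function names and message wording are assumptions for illustration; only the contact address comes from this thread.

```python
CONTACT = "accounts-enwiki-l@lists.wikimedia.org"


def normalise(items):
    """Lower-case and trim entries, dropping empty ones."""
    return {d.strip().lower() for d in items if d.strip()}


def effective_blocklist(upstream, whitelist):
    """Upstream disposable domains minus locally whitelisted ones."""
    return normalise(upstream) - normalise(whitelist)


def validate_email(address, blocklist):
    """Return None if acceptable, else a message explaining the rejection."""
    domain = address.rpartition("@")[2].strip().lower()
    if domain in blocklist:
        return (
            f"Your email provider ({domain}) appears to offer disposable "
            "addresses. Please use a non-disposable email address. "
            "If you are sure this is not a disposable address, "
            f"email us at {CONTACT}."
        )
    return None


# Hypothetical data: one upstream entry turned out to be a false positive.
blocklist = effective_blocklist(
    upstream=["burner.example", "forwarder.example"],
    whitelist=["forwarder.example"],
)
print(validate_email("someone@burner.example", blocklist))
```

Keeping the whitelist separate from the upstream data means a refresh cannot silently re-block a domain that was already reviewed.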