Obfuscation in email fields on CKAN pages to reduce spam

augusto-herrmann commented 7 years ago

We all know spam bots harvest email addresses from web pages. The contents of the author_email and maintainer_email fields are displayed as mailto: links on the dataset pages. Of course, these are also available through the CKAN API, but perhaps not all bots read the JSON, or they may simply never crawl to it, especially considering that links to API calls are rarely found on the web.

Should CKAN use some kind of obfuscation technique to thwart spam bots harvesting e-mails from the dataset pages?

Useful links:

Nine ways to obfuscate e-mail addresses compared (an experiment conducted in 2008)
Does Email Address Obfuscation Actually Prevent Spam?
Best Method for Email Obfuscation?

metaodi commented 7 years ago

I would discourage adding such a mechanism. First of all, all obfuscation techniques are bad for the usability and/or the accessibility of the page, and I personally weigh those more heavily than the possibility of reducing the number of spammers getting my email address. Spam must be filtered on the way from the sender to the recipient. If someone is able to send a sophisticated email that avoids all filter mechanisms, then they can build a bot to harvest obfuscated email addresses.

TkTech commented 7 years ago

@augusto-herrmann You're extremely encouraged to implement such functionality in an extension, and if something is missing in the API or interfaces to make this happen we'll fix it. However, in my experience, this is never worth the reduction in usability and doesn't ultimately stop anything.

As an aside, the blog posts you provided are way behind the curve - the first one starts in 2007. They aren't really valid tests either, since they used unique addresses only ever available on the test pages. In reality, most emails you're actively using are already on thousands of different free address dumps.
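For illustration, the core of the extension suggested above could be a small template helper. This is only a minimal sketch: the names `obfuscate_email` and `mailto_link` are hypothetical, and the technique shown (encoding every character as a numeric HTML entity) is just one of the simpler approaches, not something CKAN provides. In a real extension the helper would be exposed to templates through CKAN's `ITemplateHelpers` interface, which is not shown here.

```python
def obfuscate_email(address: str) -> str:
    """Encode each character as a decimal HTML entity (hypothetical helper)."""
    return "".join(f"&#{ord(ch)};" for ch in address)


def mailto_link(address: str) -> str:
    """Build a mailto: link whose scheme and address are both entity-encoded."""
    encoded = obfuscate_email(address)
    scheme = obfuscate_email("mailto:")
    # Browsers decode the entities, so the link still works for people;
    # a naive regex-based harvester no longer finds a literal address.
    return f'<a href="{scheme}{encoded}">{encoded}</a>'
```

Because the browser decodes the entities itself, this variant needs no JavaScript, but any harvester that decodes HTML entities before searching will still find the address.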

augusto-herrmann commented 7 years ago

It is a common misconception that screen readers do not execute JavaScript. They have been able to run JavaScript for quite a while now. So sending the e-mail addresses obfuscated in the HTML, to be de-obfuscated automatically by JavaScript, is definitely not an accessibility or usability problem.
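To make the idea concrete, here is a minimal sketch of one possible scheme, under stated assumptions: the server writes the address into the HTML as a hex string whose first byte is a random key and whose remaining bytes are the address XORed with that key, and a small client-side script reverses it. The function names are hypothetical, and `decode_email` is written in Python only to show what the JavaScript in the page would have to do.

```python
import secrets
from typing import Optional


def encode_email(address: str, key: Optional[int] = None) -> str:
    """Return hex text: one key byte, then each address byte XORed with it."""
    if key is None:
        key = secrets.randbelow(255) + 1  # random key in 1..255
    data = address.encode("utf-8")
    return f"{key:02x}" + "".join(f"{b ^ key:02x}" for b in data)


def decode_email(encoded: str) -> str:
    """Reverse encode_email; mirrors the logic a client-side script would run."""
    raw = bytes.fromhex(encoded)
    key, payload = raw[0], raw[1:]
    return bytes(b ^ key for b in payload).decode("utf-8")
```

The encoded string contains only hex digits, so the page source holds no `@` or recognizable address until the script has run.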

And if the references I gave before are old, here's one from 2017. It seems that Cloudflare does automatic e-mail obfuscation via JavaScript for anything they host - site owners do not even need to do anything to enable it, as it is completely transparent. If you install a browser extension designed to block JavaScript (such as NoScript) and access any site hosted by Cloudflare, you'll see that the e-mail addresses on any page are hidden until you allow JavaScript to run on that site. Most people never notice this, as browsers (and screen readers) have JavaScript enabled by default, and the e-mail addresses are decoded automatically. Cloudflare is one of the largest CDNs and hosts content around the world. If e-mail obfuscation were indeed an accessibility and usability problem, bad practice and out of fashion as you've suggested, they clearly would not be using it today.

Of course people should use spam filters on mail servers, on relays and in client software. But obviously filtering is not 100% effective, as people keep getting spam and phishing e-mails in their mailboxes quite often, while also having some legitimate messages blocked. So taking care not to make e-mail addresses readily available to harvester bots is a legitimate anti-spam measure that people may wish to enable.

Finally, while I could in theory implement this as a CKAN extension, it is too small a feature to justify all the overhead of installing and configuring an extension just to enable it. That could be considered a usability problem for people who want to enable the feature. Considering this is a repository for discussing future ideas for CKAN features, I think I have suggested it in the right place.

ckan / ideas

Obfuscation in email fields on CKAN pages to reduce spam #198