Do not render contact form if it is requested directly by Spammers

adminBTI commented 3 years ago

Despite the anti-spam honeypot, I keep getting spam emails bothering me to renew my domain name or buy their "contact us" form spam services.

Legitimate users would access my website first and then go to the /contact page to submit the form. How can I display an error page instead of the "contact form" for those spammy clients who are hitting my /contact page directly?

I do not want to insert any filtering rules on the frontend webserver, because the referrer can be spoofed and Codered sends the csrftoken cookie only for form submission. Some commercial proxy servers even remove the Referer header from HTTP requests.

Is it possible to use a different honeypot field for each client? For example, different radio and input fields?

thenewguy commented 3 years ago

How would you detect these spammers? It would not be unexpected for someone to come to your "Contact Us" page directly from Google.

Perhaps you should look at adding a captcha message to your form if you want to make it more difficult for spammers? This one is very effective: https://www.google.com/recaptcha/about/

adminBTI commented 3 years ago

I exclude /contact page from search engine crawling! Also, in my /contact page, I only display the actual form, no other useful content.

Google's products are meant for its own employees, not for others. If you don't agree with me, then you are still young. I don't know about you, but I often cannot solve these Google Captchas! Is a tiny corner of traffic light box, which is not the actual light, still considered a traffic light? Is the box with just the tip of bicycle handle bar not to be counted? I don't know.

A lot of my website visitors are not from California, so they would not agree to the jurisdiction of California (a requirement for using Google Captcha; read their T&C and privacy policies). A lot of websites are infected with dependent links to Google's fonts, scripts, captchas, etc.

I use the following in my nginx.conf to block contact form spam on a DjangoCMS site. (It will work for Coderedcms once I figure out how to add language middleware.)

# 1. Make sure that /en/contact/ is excluded in robots.txt
# 2. If LANGUAGE_COOKIE_NAME is not django_language (default), change accordingly
set $var "$uri$cookie_django_language";
if ($var = "/en/contact/") { return 404; }

thenewguy commented 3 years ago

It sounds like you've got it figured out. Best of luck 👍

onaralili commented 3 years ago

Some bots simply visit a website and start crawling instead of coming directly from a search engine. Also, this won't prevent manual spam. An alternative approach would be to integrate a spam filtering API like OOPSpam, which is GDPR compliant.

murty2 commented 3 years ago

Yes, I know some bots will access my website directly. Almost all such bots are up to no good anyway, because they are looking to spam.

OOPSpam is a commercial solution and you may be trying to promote a commercial solution on this open source page.

onaralili commented 3 years ago

I was replying to @adminBTI's comment.

It is true that OOPSpam is commercial, and that is how it can afford to be privacy-friendly, unlike the privacy nightmare that is reCAPTCHA. Other anti-spam services like Akismet are also commercial; they tend to charge in order to keep operations going.

If privacy is a non-issue for you and you are looking for a free alternative, reCAPTCHA or a simple heuristic spam-word check would work.
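
For illustration, a heuristic spam-word check could look roughly like the sketch below. The phrase list, the link threshold, and the function name `looks_like_spam` are all assumptions made up for this example, not part of coderedcms or any library; the list would need tuning against the spam actually received.

```python
import re

# Hypothetical phrase list; tune it to the spam you actually receive.
SPAM_PHRASES = ("renew your domain", "seo services", "contact form marketing")
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def looks_like_spam(message: str, max_links: int = 2) -> bool:
    """Return True if the message contains a listed phrase or too many links."""
    text = message.lower()
    if any(phrase in text for phrase in SPAM_PHRASES):
        return True
    # Many links in a short contact message is a common spam signal.
    return len(URL_PATTERN.findall(message)) > max_links
```

A check like this is cheap and keeps all data on your own server, but it will miss spam that avoids the listed phrases and may flag the occasional legitimate message.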

vsalvino commented 3 years ago

Great discussion; chiming in on the various suggestions in this thread:

I think the only possible "true" solution is to make the forms flexible enough to integrate with a commercial spam checker such as Google reCaptcha or some of the other products mentioned in this thread. We would probably be inclined to support Google out of the box, after entering an API key in the wagtail settings, and provide a hook for others to implement their own.

The honeypot method would remain the default as it is simple and free. I would like to improve it a bit, but without seeing spambot behavior it is difficult to know how they are getting through it. We could potentially implement a rate limiter to prevent a single IP from submitting the form X number of times per minute.
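
As a rough sketch of the rate-limiting idea (not how coderedcms implements anything), a sliding-window limiter keyed by IP address might look like this. The class name `SubmissionRateLimiter` and its defaults are assumptions for the example, and the state is held in memory per process, so a multi-worker deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SubmissionRateLimiter:
    """Allow at most `max_submissions` per IP within `window_seconds`."""

    def __init__(self, max_submissions: int = 3, window_seconds: float = 60.0):
        self.max_submissions = max_submissions
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # ip -> timestamps of accepted submissions

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record and allow the submission, or return False if the IP is over the limit."""
        now = time.monotonic() if now is None else now
        timestamps = self._history[ip]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_submissions:
            return False
        timestamps.append(now)
        return True
```

The form handler would call `allow(ip)` before processing a submission and reject the request when it returns False.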

As for the suggestion about blocking direct hits to the URL with no referer, that is something we would never directly support, as direct hits are a very common use case (e.g. sending a link to the form URL in an email). But if it works for your individual site, the nginx or django middleware methods referenced should be sufficient, without requiring any changes to coderedcms.
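
For a site that does want this behaviour, the decision made by the nginx rule earlier in the thread can be sketched in Python as below. This is only an illustration under assumptions: the path `/en/contact/` and the cookie name `django_language` are taken from that nginx example, the function name `is_direct_hit` is invented here, and it presumes the language cookie is set on an earlier page view.

```python
from typing import Mapping

# Assumptions taken from the nginx example earlier in the thread.
PROTECTED_PATHS = frozenset({"/en/contact/"})
LANGUAGE_COOKIE_NAME = "django_language"  # Django's default LANGUAGE_COOKIE_NAME


def is_direct_hit(path: str, cookies: Mapping[str, str]) -> bool:
    """True when a protected path is requested without the language cookie.

    A visitor who browsed another page first is assumed to already carry
    the cookie; a client hitting the form URL cold will not.
    """
    return path in PROTECTED_PATHS and not cookies.get(LANGUAGE_COOKIE_NAME)
```

In a Django middleware, the `__call__` method could pass `request.path` and `request.COOKIES` to this function and raise `django.http.Http404` when it returns True. Note that this also blocks legitimate visitors who follow a direct link to the form.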

murty2 commented 3 years ago

Please consider implementing:

1. Simple captcha: https://github.com/mbi/django-simple-captcha
2. Two honeypot fields that change for each request. For example, one radio and one short text field for one request, and then two multi-select fields for another request.
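
To make the second suggestion concrete, here is a minimal sketch of honeypot field names that change with every rendering of the form. It is a simplification: it only rotates the field names, not the field types, and `SECRET_KEY`, `new_nonce`, `honeypot_names`, and `is_suspicious` are hypothetical names for this example rather than anything in coderedcms.

```python
import hashlib
import hmac
import secrets
from typing import List, Mapping

SECRET_KEY = b"replace-me"  # assumption: a server-side secret, e.g. Django's SECRET_KEY


def new_nonce() -> str:
    """Random value generated per form rendering and sent along as a hidden field."""
    return secrets.token_hex(8)


def honeypot_names(nonce: str, secret: bytes = SECRET_KEY, count: int = 2) -> List[str]:
    """Derive `count` unpredictable field names from the nonce."""
    names = []
    for i in range(count):
        digest = hmac.new(secret, f"{nonce}:{i}".encode(), hashlib.sha256).hexdigest()
        names.append("f_" + digest[:12])
    return names


def is_suspicious(form_data: Mapping[str, str], nonce: str, secret: bytes = SECRET_KEY) -> bool:
    """Flag a submission if any honeypot field is missing or has been filled in."""
    return any(form_data.get(name) != "" for name in honeypot_names(nonce, secret))
```

Because the names are derived with an HMAC, the server does not need to store them: it recomputes them from the nonce submitted with the form. The same digest could also be used to pick the field type per request.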

I am not sure rate limiting in an application that caches pages is a good idea. Webservers and firewalls are better for that.

A lot of Ubuntu kids seem to use Fail2ban, but I am more inclined to use something like SSHGuard to block or rate limit IPs. Even with the ipset module, it does take 100MB or so of memory to block or rate limit IPs in the firewall, but I can accept this as a decent compromise when compared to a full-fledged WAF.

I wrote a firewall-level blacklisting script that stopped the spam (https://github.com/murty2/blacklist). But ideally, I would like to know how form data can be checked by a SpamAssassin or DSPAM filter process (similar to how email is checked before delivery).

vsalvino commented 3 years ago

Posting here for future reference: May have found a good open source captcha package we could integrate with: https://django-simple-captcha.readthedocs.io/en/latest/index.html

coderedcorp / coderedcms

Do not render contact form if it is requested directly by Spammers #420