Do without a DB table

GoogleCodeExporter commented 9 years ago

Would it be feasible and worthwhile to remove the need for a database table and
instead store the data as an encrypted string in the HTML form?

Original issue reported on code.google.com by eallik on 12 Dec 2009 at 9:32

GoogleCodeExporter commented 9 years ago

This seems to make perfect sense, I'll give it a try when I find a moment.

Original comment by mbonetti on 12 Dec 2009 at 9:56

Changed state: Accepted
Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

In hindsight this is harder than expected, but I'm not giving up :D

Original comment by mbonetti on 23 Jan 2010 at 11:33

GoogleCodeExporter commented 9 years ago

Interesting, what's making it harder? Just curious.

Original comment by eallik on 23 Jan 2010 at 11:43

GoogleCodeExporter commented 9 years ago

Well, one challenge is to find a crypto function that is bundled with Python
2.4+ (as we don't want to add a dependency on e.g. pycrypto) or come up with a
Python implementation of a two-way crypto function I could ship with the app.

I'm open to suggestions, though!

Original comment by mbonetti on 23 Jan 2010 at 11:51

GoogleCodeExporter commented 9 years ago

I'm personally incompetent when it comes to cryptography, but wouldn't an MD5
hash with salt suffice? Also, you could add a SIMPLE_CAPTCHA_CRYPTO setting that
would override whatever is the default. Doesn't a captcha usually protect from
generic bots anyway? And do generic bots generally have the ability to break
encrypted form fields? They'd have to know what to look for anyway...

Original comment by eallik on 23 Jan 2010 at 3:34

GoogleCodeExporter commented 9 years ago

I'm afraid MD5 won't do, because we need a way of persisting the challenge & 
response between the moment the captcha is instantiated, displayed (a different 
view!) and finally validated.

MD5 is a one-way hashing function, i.e. it computes a hash of a given string,
but there is no way of reversing the hash, so it'd be useless because we'd lose
the challenge and response.

What we need is a full-blown crypto function that computes an encrypted version
of challenge+response, that can be decrypted back by the image generating and
captcha validating views.

Original comment by mbonetti on 23 Jan 2010 at 4:34

GoogleCodeExporter commented 9 years ago

http://dpaste.com/hold/149454/

Would that suffice?

Original comment by eallik on 23 Jan 2010 at 5:06

GoogleCodeExporter commented 9 years ago

Actually you should consider my last comment void and null :P

Original comment by eallik on 23 Jan 2010 at 5:23

GoogleCodeExporter commented 9 years ago

That isn't far off; the only problems are that the scheme is a bit too easy to
break, and that the alphabet of the input doesn't seem to support Unicode.

But I could start from there and maybe convert to and from base64 before
encrypting / decrypting.

Original comment by mbonetti on 23 Jan 2010 at 7:01

GoogleCodeExporter commented 9 years ago

My (competent) friend recommended taking a look at XTEA/XXTEA, so I went and
found this for Python:
http://code.activestate.com/recipes/496737/
It's in the public domain so you can just grab and use it, and the code itself,
comments stripped, is as short as mine.

There's also a vanilla RSA implementation that I found. It seems to have no
license so I guess it's basically in the public domain?
http://code.activestate.com/recipes/572196/

Original comment by eallik on 24 Jan 2010 at 3:32

GoogleCodeExporter commented 9 years ago

XTEA is actually pretty perfect, nice find!

I have a hacked django-simple-captcha implementation running without any models
and it seems pretty solid. I'm gonna run a couple performance tests and will
have to adapt the unit tests before committing, but it seems pretty good so far.

Original comment by mbonetti on 24 Jan 2010 at 11:33

GoogleCodeExporter commented 9 years ago

I just realized something: if we drop the database persistence we lose the 
statefulness of the verification process, which exposes us to a repetition 
attack:

* Spammer resolves one captcha, remembers the encrypted key
* Spammer re-posts the same response + key at will
* Captcha verification process always validates because a given
challenge/response doesn't get "consumed".

Thoughts?

Original comment by mbonetti on 24 Jan 2010 at 12:44

GoogleCodeExporter commented 9 years ago

OK, is it important that you cannot EVER use the same captcha verification
twice? Otherwise you could add an encrypted timestamp and only allow a certain
time window, say, 2 minutes.

And about scrapping models -- you could keep them and just implement 2 backends
which the developer can choose from :)

Original comment by eallik on 25 Jan 2010 at 4:31

GoogleCodeExporter commented 9 years ago

re: 2-minute time window. This is already implemented and enforced in the model
backend, but there we also ensure the uniqueness of the hash to prevent
repeat attacks. The problem is, imagine a blog where the only protection against
spam is a captcha. An attacker could submit 100,000 spam comments in two
minutes...

re: two backends. I'd rather not go this way. Going DB-free is tempting: no need
to syncdb when the developer adds/removes the app, but offering them the choice
of a less-secure solution in exchange for this small improvement is not an
option. And having to maintain two backends is a bit silly IMO.

I'd rather think of a way of implementing a nonce and add that to the encrypted
part to make sure the data is only posted once. I'm not sure you can have nonces
without a stateful backend, though.

Original comment by mbonetti on 25 Jan 2010 at 6:33

GoogleCodeExporter commented 9 years ago

I have a feeling you are right that a stateful backend is still needed. I will
ask one of my more clever friends, though. Maybe he has an idea.

Original comment by eallik on 25 Jan 2010 at 7:17

GoogleCodeExporter commented 9 years ago

I think you can get pretty far by using a hash function keyed with the secret 
key from settings: (h is an arbitrary hash function, || means concatenation)

* Store h(secret_key || captcha_text) in a hidden field (the user has no way to
get to the captcha_text because he cannot invert the hash function.)
* On submission you can check if the submitted value is correct by calculating 
the hash again.

Using the secret key prevents the user from just submitting a matching hash of 
whatever he entered into the captcha field.

There is no way to prevent repetition attacks but the method above can be 
extended to enforce the timeout:

* Store h(secret_key || absolute_time_the_captcha_expires || captcha_text) in a 
hidden field and absolute_time_the_captcha_expires in plain in another hidden 
field.
* On submission, check if the submitted expiration time and captcha value match 
the hash and the time is still in the future.

As above: The user has no way to mess with the hashed data (e.g. change the 
expiration time) because he doesn't know the secret key.

Note: If you use a weak hash function, it's possible to obtain the secret key. 
I'd recommend some SHA* instead of MD5.

Original comment by da...@dfoerster.de on 29 Jul 2010 at 1:55
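
The expiring-hash scheme from the comment above can be sketched roughly as
follows. One deliberate swap: instead of a plain h(secret_key || ...), this
uses HMAC-SHA256 from the Python standard library, the usual construction for
keying a hash with a secret. `SECRET_KEY`, `make_token` and `check_token` are
hypothetical names, and the optional `now` argument exists only to make the
example testable.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical stand-in for the secret key from settings


def _sign(expires, captcha_text):
    # Bind the expiry time and the expected answer together under the secret key.
    message = ("%d:%s" % (expires, captcha_text)).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(captcha_text, lifetime=120, now=None):
    """Return (expires, signature); both would go into hidden form fields."""
    now = time.time() if now is None else now
    expires = int(now) + lifetime
    return expires, _sign(expires, captcha_text)


def check_token(expires, signature, user_response, now=None):
    """True if the response matches the signature and the token is not expired."""
    now = time.time() if now is None else now
    if int(expires) < now:
        return False
    expected = _sign(int(expires), user_response)
    # Constant-time comparison; encode so arbitrary submitted text is accepted.
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature.encode("utf-8"))
```

As the comment says, nothing here stops the same (expires, signature, response)
triple from being accepted again and again until it expires.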

GoogleCodeExporter commented 9 years ago

@ #c16 

The problem is, what should we use as a timeout? During the validity of the 
captcha, say one minute, an attacker can easily submit thousands of valid 
forms. Anything under one minute wouldn't be practical (it also depends on the 
complexity and length of the form)

I think we do need a nonce to avoid repetition attacks, and the only way to
achieve that seems to be through persistence.

Original comment by mbonetti on 29 Jul 2010 at 9:13

GoogleCodeExporter commented 9 years ago

Use a memory-based store, e.g. memcache. Or come up with your own shared-memory store.

Original comment by yuri.pim...@gmail.com on 25 Dec 2010 at 6:18
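
A rough sketch of how a memory-based store could "consume" a token so that it
validates only once. The `UsedTokenStore` class is a hypothetical in-process
stand-in: a plain dict is not shared between worker processes, so a real
deployment would presumably rely on a shared cache with an atomic
add-if-absent operation (memcached's `add`, for example).

```python
import time


class UsedTokenStore:
    """In-process stand-in for a shared cache such as memcached (assumption)."""

    def __init__(self):
        self._seen = {}  # token -> expiry timestamp

    def consume(self, token, expires, now=None):
        """Return True the first time a still-valid token is seen, else False."""
        now = time.time() if now is None else now
        # Forget tokens whose validity window has passed; they can no longer
        # be replayed, so there is no need to remember them.
        self._seen = {t: e for t, e in self._seen.items() if e > now}
        if expires <= now or token in self._seen:
            return False
        self._seen[token] = expires
        return True
```

Entries only need to live as long as the captcha's validity window, which is
what keeps this much lighter than a database table.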

google-code-export / django-simple-captcha

Do without a DB table #18