Add option to mark paste as private

TorPaste could benefit from an option to mark pastes as Private. The meaning of Private can vary, but the most popular definition seems to be that the paste is password protected: even if someone has the link, they still can't see the paste unless they know the password.

Here are some issues we have to discuss before moving on with an implementation:

1. What will the Paste ID be? sha256(paste)? sha256(paste + password_hash)?
2. How will the password hash be computed?
3. Will the pastes be encrypted?
4. How can we minimize the effect on past private pastes assuming the server is compromised?

For the second question I think we should store each paste password as a properly hashed password, just like a website would. This can of course be stored in the Paste Metadata.
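As a rough illustration of the "properly hashed password" idea, here is a minimal Python sketch using PBKDF2 from the standard library. The function names, the metadata field names, and the iteration count are assumptions made for this example; they are not anything TorPaste currently defines.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; tune it for the actual server.
ITERATIONS = 200_000


def hash_password(password: str) -> dict:
    """Return the (hypothetical) metadata fields to store next to a private paste."""
    salt = os.urandom(16)  # fresh random salt per paste
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return {"salt": salt.hex(), "iterations": ITERATIONS, "password_hash": digest.hex()}


def verify_password(password: str, metadata: dict) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(metadata["salt"]),
        metadata["iterations"],
    )
    return hmac.compare_digest(digest.hex(), metadata["password_hash"])
```

Only the salt, the iteration count, and the resulting hash would end up in the Paste Metadata; the password itself is never written anywhere.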

For the third question, it's kind of tricky: we cannot use JavaScript, so if we decide to encrypt the pastes, both encryption and decryption have to be done server side. This is the reason this feature has not been implemented and the About page recommends the use of PGP/GPG.

For the fourth question, we need to make sure that if the pastes are indeed encrypted, the cryptographic keys never appear in server logs or anywhere else an attacker could gain access to. Therefore the paste password should be sent with a POST request. In addition to that, we need to be able to decrypt the pastes only with the correct key, so the decryption key must not be stored anywhere and instead must be derived from the Paste Password, most likely using a PBKDF. If we're using encryption, then we most likely need an HMAC to verify the integrity of the encrypted data.

Of course, since our solution is 100% server side, the administrator can modify the code to store the encryption keys. We should assume that we trust the sysadmin, but accept the fact that the server will be compromised at some point.

Some thoughts on these questions:

If the paste id is H(paste), then we allow a private paste's visibility to be overwritten if someone posts the same content as public. H(paste + H(password)), H(paste + password), and H(paste + H(password + salt)) all have a similar problem: if another user posts a paste whose content is "paste + H(password)" (respectively, "paste + password" or "paste + H(password + salt)") and marks that paste as public, then the original private paste is overwritten. I think allowing an unlisted paste to be made public is not a big deal, but we should probably try to keep private ones safe. Therefore, I propose to use a longer id: id = H(paste) + H(password + salt).
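To make the longer id concrete, here is a small Python sketch. It assumes H is SHA-256 with hex encoding (as in the sha256(paste) option from the first question) and that "+" means concatenation; the function names are made up for this example.

```python
import hashlib


def H(data: bytes) -> str:
    """H from the discussion, assumed here to be hex-encoded SHA-256."""
    return hashlib.sha256(data).hexdigest()


def public_paste_id(paste: bytes) -> str:
    # A public or unlisted paste keeps the plain content hash: 64 hex characters.
    return H(paste)


def private_paste_id(paste: bytes, password: bytes, salt: bytes) -> str:
    # id = H(paste) + H(password + salt): two digests concatenated, 128 hex characters.
    return H(paste) + H(password + salt)
```

Because a public id is always a single 64-character digest and a private id is always two digests back to back, no public paste content can produce an id that lands on a private paste. The plain SHA-256 over password + salt only mirrors the notation used here; a slower password hash could be substituted for that second half.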

Remark: the logic to view a private paste should then probably be as follows:

1. request a paste id
2. ask for a password without first checking paste validity
3. check if the paste exists, and if yes, check the password
4. if the paste does not exist, or if the password is invalid, reply with a generic error "invalid password or unknown paste."

This flow prevents private paste enumeration, which prevents people from finding collisions for the H(password+salt) (this would allow access to random private pastes), and also prevents people from checking if a certain text has been "TorPasted" in a private paste (by checking if that text's sha256 exists).
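The viewing flow above could look roughly like the following Python sketch. The in-memory dict store, the function names, and the dummy hashing on unknown ids are all assumptions for illustration; the sketch only covers the password check, not encryption of the content.

```python
import hashlib
import hmac
import os
from typing import Tuple

GENERIC_ERROR = "invalid password or unknown paste."
ITERATIONS = 100_000  # assumed work factor
DUMMY_SALT = os.urandom(16)  # only used to spend comparable time on unknown ids


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def add_private_paste(store: dict, paste_id: str, password: str, content: str) -> None:
    salt = os.urandom(16)
    store[paste_id] = {"salt": salt, "password_hash": _hash(password, salt), "content": content}


def view_private_paste(store: dict, paste_id: str, password: str) -> Tuple[bool, str]:
    """Return (ok, content_or_error); both failure cases look identical to the caller."""
    entry = store.get(paste_id)
    if entry is None:
        _hash(password, DUMMY_SALT)  # do the same hashing work as for an existing paste
        return False, GENERIC_ERROR
    if not hmac.compare_digest(_hash(password, entry["salt"]), entry["password_hash"]):
        return False, GENERIC_ERROR
    return True, entry["content"]
```

The password is always requested before anything is looked up, and a wrong password and an unknown id produce exactly the same reply, so the response alone does not reveal whether a given id exists.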

Agreed.
Ok, I can see a PBKDF being used on the password upon reception of a new paste, using the obtained key to encrypt the paste, and storing only a hash of the password. When a user tries to read the paste, they send a password, we check the hash, and if there is a match, we use the same PBKDF to obtain the same key and decrypt. This relies on the sysadmin being benevolent, so it's obviously not perfect, but at least the data is encrypted at rest.
Now if the server gets compromised, the whole encryption scheme is broken. We can make the Python files read-only for non-root users, so if someone simply gets non-root access, TorPaste's output remains safe. But if the intruder gets root access, there is strictly nothing we can do...

Another scheme that we can use for Paste IDs is H(ciphertext)e. By adding that extra e at the end of the Paste ID, we can ensure that a Public / Unlisted Paste will never collide with a Private Paste. In addition to that, for someone to overwrite the previous paste, they have to create a new Private Paste with the same plaintext and password, which is pointless from an attacker's point of view.
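A tiny Python sketch of the H(ciphertext)e idea, again assuming H is hex-encoded SHA-256 and using made-up function names:

```python
import hashlib


def public_paste_id(content: bytes) -> str:
    # Public / Unlisted pastes: plain content hash, always 64 hex characters.
    return hashlib.sha256(content).hexdigest()


def private_paste_id(ciphertext: bytes) -> str:
    # Private pastes: hash of the ciphertext plus a trailing "e", always 65 characters.
    return hashlib.sha256(ciphertext).hexdigest() + "e"
```

Since the two kinds of id always differ in length, a public id can never equal a private one, whatever the content is.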

I am thinking we should not store a password at all in the backend, only the ciphertext and an HMAC to make sure the decryption key is correct and the data has not been tampered with. The encryption key should still be derived from the password with a PBKDF, so the flow looks like this:

New Paste:

encryption_key = pbkdf(password, bits=256)
iv = random_iv()
ciphertext = encrypt(paste_content, encryption_key, iv)
hash = hmac(ciphertext, encryption_key)
paste_content = ciphertext
paste_id = H(paste_content) + "e"
store(paste_id, (paste_content, iv, hash))

Request Paste:

if ( not paste_exists(paste_id) ):
    sleep(random(0ms, 50ms))
    return GenericError()
# load() is the counterpart of store() in New Paste
ciphertext, stored_iv, stored_hash = load(paste_id)
provided_key = pbkdf(password, bits=256)
if ( hmac(ciphertext, provided_key) != stored_hash ):
    sleep(random(0ms, 40ms))
    return GenericError()
paste_content = decrypt(ciphertext, provided_key, stored_iv)
sleep(random(0ms, 30ms))
return paste_content

NOTE: The above is just an example, and much more thought should be given to how we will code this. The sleep() calls have been added to prevent timing attacks that infer whether a paste exists or the password is incorrect. Of course, this method can be bypassed, and we will need to use a more advanced one. In addition to that, PBKDFs usually need a salt as well. This must be a random salt, again stored as paste metadata; it was not included above for the sake of simplicity.
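As a sketch of how the salt and the HMAC check might fit together, here is a Python version using only the standard library. The function names and the iteration count are assumptions, the ciphertext is treated as opaque bytes (the cipher itself is left out), and the constant-time comparison only removes the timing difference of the comparison step; it does not hide whether the paste exists.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from the paste password and a per-paste random salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32)


def make_tag(ciphertext: bytes, key: bytes) -> bytes:
    """HMAC-SHA256 over the ciphertext, stored as paste metadata."""
    return hmac.new(key, ciphertext, hashlib.sha256).digest()


def password_matches(password: str, salt: bytes, ciphertext: bytes, stored_tag: bytes) -> bool:
    """Check a submitted password against the stored HMAC without storing the password."""
    candidate = make_tag(ciphertext, derive_key(password, salt))
    return hmac.compare_digest(candidate, stored_tag)  # constant-time comparison
```

On a new paste, the salt would come from os.urandom(16) and be stored with the IV and the tag. On a request, the same salt is read back so the submitted password derives the same key.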

Finally, about the server being compromised, I realize that this problem cannot of course be solved by TorPaste. What I meant was that if at some point an attacker gets access to all the data in the backend (the pastes folder in the filesystem backend), then they should not be able to determine the original content of the private pastes. If we, for example, sent the password via GET, it would show up in the server access logs, so we're using POST. If we stored the encryption key in the paste metadata, then again, the decryption would be trivial. The threat model here is an attacker that somehow got, let's say, read-only access to the entire filesystem and wants to decrypt private pastes. So we can only encrypt at rest. About in transit data, for non-Tor users we trust there will be HTTPS (famous last words) that's properly set up and has Perfect Forward Secrecy, and as for Tor, we trust that there will be use of HTTP and the data will be encrypted with the Hidden Service Key. For that last part, I am not 100% sure there's PFS (my bet is on no), but it's the best we can do. I also don't know whether or not the new Onion Services with the new keys will have PFS, but this is beyond the scope of Private Pastes in TorPaste. :-)

Ok, this sounds good. Or at least, good enough for the scope of TorPaste :D

Let's leave this issue open for now until we get more feedback and we also decide on the algorithms we will be using. This is just an initial design.

daknob / TorPaste

Add option to mark paste as private #47
