AnemoneLabs / unmessage

Privacy enhanced instant messenger
GNU General Public License v3.0

Support offline messages #32

Open adrelanos opened 7 years ago

adrelanos commented 7 years ago

Do you think there is a way to leave messages while the recipient is offline?

felipedau commented 7 years ago

TLDR: There certainly is, we just need to find a good one.

The easiest solution to implement is described in #8, where unMessage would buffer the messages you sent to an offline peer and deliver them the next time both of you are online. That does not really deliver "offline messages", but at least it makes it easier to resend them.
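
To make the buffering idea from #8 concrete, here is a minimal sketch of a per-peer outbound queue. The class name, the peer identifiers and the `send` callable are all hypothetical; nothing here is taken from unMessage's code base.

```python
from collections import defaultdict, deque


class OutboundBuffer:
    """Hold messages for offline peers and release them in order."""

    def __init__(self):
        # One FIFO queue per peer (the peer identifier is hypothetical).
        self._queues = defaultdict(deque)

    def queue(self, peer, message):
        """Remember a message that could not be delivered right now."""
        self._queues[peer].append(message)

    def pending(self, peer):
        """Number of messages still waiting for this peer."""
        return len(self._queues[peer])

    def flush(self, peer, send):
        """Call once `peer` is seen online.

        `send` is a hypothetical callable taking (peer, message). A message
        is only forgotten after `send` returned, so a failure midway keeps
        the remaining messages buffered.
        """
        sent = 0
        queue = self._queues[peer]
        while queue:
            send(peer, queue[0])
            queue.popleft()
            sent += 1
        return sent
```

Both peers still have to be online at the same moment for `flush` to do anything, which is exactly the limitation discussed next.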

Unfortunately this still requires the conversation to be synchronous, i.e. both peers must be online at the same time. To solve this I think we inevitably need a server to store and deliver these messages, making the conversation completely asynchronous so that the peers can chat without both being online at the same time. Having another actor in between probably poses a risk to the users, but having them switch to another app which is not as anonymous/private/secure might be even worse.

As described in unMessage's protocol, its packets do not leak any identifying information. Everything it transmits is either encrypted, or hashed, or is an ephemeral public key, making it look like random information. Once we fix its length so that every packet has the same size, packets should not be distinguishable. This data is even wrapped by Tor's encryption for transmission, but having it available in a server (without Tor's encryption) should not be a problem. We believe that an attacker who intercepts an unMessage packet cannot learn anything that harms users' anonymity/privacy.
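
As an illustration of the "every packet has the same size" point, the following sketch pads a payload to a fixed length. The packet size, the 2-byte length header and the function names are assumptions made for this example, not unMessage's packet format. The padding would have to be applied before encryption, otherwise the length header itself would be visible.

```python
import os
import struct

PACKET_SIZE = 1024             # hypothetical fixed packet size in bytes
_HEADER = struct.Struct(">H")  # 2-byte big-endian payload length


def pad(payload: bytes) -> bytes:
    """Return exactly PACKET_SIZE bytes: length header, payload, random filler."""
    room = PACKET_SIZE - _HEADER.size
    if len(payload) > room:
        raise ValueError("payload does not fit in one packet")
    filler = os.urandom(room - len(payload))
    return _HEADER.pack(len(payload)) + payload + filler


def unpad(packet: bytes) -> bytes:
    """Recover the payload from a packet produced by pad()."""
    if len(packet) != PACKET_SIZE:
        raise ValueError("unexpected packet size")
    (length,) = _HEADER.unpack_from(packet)
    if length > PACKET_SIZE - _HEADER.size:
        raise ValueError("corrupt length header")
    return packet[_HEADER.size:_HEADER.size + length]
```

With this, an empty message and a long one produce packets of identical length.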

I think that requiring users to be online at the same time is not a good UX and we need a solution because it will probably be decisive for unMessage's popularity. I would not like to see workarounds like:

The solution to this is probably a good Private Information Retrieval system, but we have not looked into that yet. So far, buffering messages locally is the easiest solution we have. @rxcomm and I have discussed it before and we really do not want to have to involve servers. However, I certainly see how good of a UX improvement it is and this can be decisive for users considering adopting unMessage.

Let me know what you think! (And sorry for such a long post :O )

Thanks!

felipedau commented 7 years ago
  • Without any changes to unMessage, you can run your instance in a server which is always available and then you ssh to it when you are offline, but then you are trading ephemeral encryption for "regular" encryption

Update: I actually meant "when you are online".

adrelanos commented 7 years ago

Hasn't bitmessage solved that in some way? But that may not be possible without degrading your security properties?

Classic servers should imho be avoided at all cost. That would destroy the beauty of the otherwise serverless nature of unmessage.

There might be an existing distributed storage you could use? ZeroNet, Freenet, Tahoe-LAFS, gnunet storage or otherwise? (Rhino.) DHT? There are lots of distributed storage systems nowadays. Must be usable over Tor. Some specifically describe themselves as anonymous distributed storage systems. Lots of things to research.

Encrypted messages could be stored in these anonymous distributed storage systems. Would that work or degrade some security properties? (forward secrecy, deniability, whatnot)

//cc @HulaHoopWhonix

felipedau commented 7 years ago

Hasn't bitmessage solved that in some way? But that may not be possible without degrading your security properties?

I am not too familiar, but I believe that messages are sent to all users in the network, right? That means we would have to register users somewhere and we will also make all onion services public. So that might not be the best approach, I think.

Classic servers should imho be avoided at all cost. That would destroy the beauty of the otherwise serverless nature of unmessage.

Indeed!

There might be an existing distributed storage you could use? ZeroNet, Freenet, Tahoe-LAFS, gnunet storage or otherwise? (Rhino.) DHT? There are lots of distributed storage systems nowadays. Must be usable over Tor. Some specifically describe themselves as anonymous distributed storage systems. Lots of things to research.

Thanks for mentioning them! I have heard of a few, but have not looked into them to see if we could use.

Encrypted messages could be stored in these anonymous distributed storage systems. Would that work or degrade some security properties? (forward secrecy, deniability, whatnot)

I do not think it would degrade such properties. Here is an example:

Do you remember how nymphemeral received messages for the users? The nym server posts the (encrypted) messages to a public newsgroup, which anyone can access but not decrypt. Users are able to identify their messages by checking the subjects (hSubs), which are a hash of a secret only the server and the user know (they could also try and decrypt every message).

unMessage's packets have this same "subject" so that I am able to check with all my conversations' IDs if that packet belongs to any of them, as well as check if this is a request, using my identity key [0]. That way, it is even possible to have all offline messages posted to a newsgroup and when a user becomes online, retrieve the messages and check which ones belong to them. The problem with using this solution is that it does not scale nicely and having your messages publicly available in a very convenient way might still not be a good idea after all, even though they are encrypted and cannot be linked to you or grouped somehow.

[0]: The request's shared secret uses the handshake key generated by the sender so that no one that knows my public key can find the requests I received.
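
The "subject" check described above can be sketched roughly as follows. This is a generic keyed-hash tag (random nonce plus HMAC-SHA256), chosen only for illustration; it is not the hSub construction used by nym servers nor unMessage's actual packet field, and all names are hypothetical.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16  # hypothetical nonce size in bytes


def make_tag(secret: bytes) -> bytes:
    """Sender side: a fresh, random-looking tag only holders of `secret` can recognise."""
    nonce = os.urandom(NONCE_LEN)
    return nonce + hmac.new(secret, nonce, hashlib.sha256).digest()


def tag_matches(secret: bytes, tag: bytes) -> bool:
    """Recipient side: recompute the keyed hash and compare in constant time."""
    nonce, mac = tag[:NONCE_LEN], tag[NONCE_LEN:]
    expected = hmac.new(secret, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(mac, expected)


def find_conversation(tag: bytes, secrets: dict):
    """Try every conversation secret; return the matching name or None."""
    for name, secret in secrets.items():
        if tag_matches(secret, tag):
            return name
    return None
```

A client that downloads every posted packet would run `find_conversation` on each one, which also shows why the approach scales poorly: the work grows with the number of packets times the number of conversations.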

Another thing I am thinking is that we should probably use some kind of proof-of-work approach too.

I think it is a good idea to take a look into these systems because I am sure one of them might solve our problem. I will do that once I can.

Thanks @adrelanos!

felipedau commented 7 years ago

Encrypted messages could be stored in these anonymous distributed storage systems. Would that work or degrade some security properties? (forward secrecy, deniability, whatnot)

I do not think it would. Here is an example:

Clarifying: I do not think it would [degrade such properties].

adrelanos commented 7 years ago

Did you consider somehow combining unmessage with nymphemeral? (Specifically in case you consider news groups as the storage place for offline messages.) (Purely theoretical consideration since I am not familiar with either code base.)

These offline storage solutions have to be considered with care indeed. It would change from a less complex implementation with onions only to a more complex implementation where messages are left in the distributed storage.

Welcomed non-leeching usage of distributed storage systems may require agreeing to store and upload encrypted blobs for other users, i.e. participating in the network, which therefore involves computer security, moral and legal issues. So perhaps this should be explained and optional, which then would be a usability challenge.

Thank you for looking into this!

felipedau commented 7 years ago

Did you consider somehow combining unmessage with nymphemeral? (Specifically in case you consider news groups as the storage place for offline messages.) (Purely theoretical consideration since I am not familiar with either code base.)

At first we did not have plans to do it (and we still do not), but @rxcomm and I discussed the possibility a while ago and concluded it would require a complete change to both the nym client and server.

These offline storage solutions have to be considered with care indeed. It would change from a less complex implementation with onions only to a more complex implementation where messages are left in the distributed storage.

Welcomed non-leeching usage of distributed storage systems may require agreeing to store and upload encrypted blobs for other users, i.e. participating in the network, which therefore involves computer security, moral and legal issues. So perhaps this should be explained and optional, which then would be a usability challenge.

Good idea, I agree! There could be even three levels available:

  1. [Default] Do not use offline messages
  2. Buffer messages to send once the peer is available #8
  3. Store offline messages in a distributed storage system #32 (this issue)

The users would choose to change the level, which increases proportionally to the usability it provides as well as the risks it represents.
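
The three levels could be represented as an ordered setting, along these lines. This is only a sketch: the enum, the function and its return values are hypothetical, and `store` stands in for whatever distributed storage ends up being used.

```python
from enum import IntEnum


class OfflineLevel(IntEnum):
    """Higher values mean more usability and more risk."""
    NONE = 1         # default: do not use offline messages
    BUFFER = 2       # buffer locally and send once the peer is available (#8)
    DISTRIBUTED = 3  # store in a distributed storage system (#32)


def handle_undeliverable(level, message, buffer, store):
    """Decide what to do with a message whose recipient is offline.

    `buffer` is a list-like local queue and `store` a callable that uploads
    the message; both are placeholders for this example.
    """
    if level >= OfflineLevel.DISTRIBUTED:
        store(message)
        return "stored"
    if level >= OfflineLevel.BUFFER:
        buffer.append(message)
        return "buffered"
    return "dropped"
```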

Thank you for looking into this!

I would say the same. Thanks!

meejah commented 7 years ago

I'm pretty familiar with Tahoe-LAFS; it basically takes your "big ciphertext problem" (e.g. a message, picture, video, etc) and turns it into a "small ciphertext problem" (e.g. the read-capability for the actual-thing).

So, you still have to be able to transmit the read-capability (readcap) to the intended target. You'd also have to deploy an "unmessage Tahoe-LAFS grid" since there isn't a big, public Tahoe-LAFS grid at the moment.

That said, it's possibly easier to store read-caps (< 200 bytes) in a DHT or similar than "the actual messages". For this, it might be good to look at e.g. http://magic-wormhole.io which is a SPAKE2-based way to get two endpoints to talk. Currently, it doesn't support "only one side is online at a time" but there are plans to do that. This is fairly similar to PANDA, the thing Pond used to connect two clients.

A little more "further away", possibly, but there's also mixnets, like @david415 and others are working on here: https://github.com/applied-mixnetworks/txmix which kind of solves your problem "for you" directly: you send "the actual message" as a mixnet-message.

felipedau commented 7 years ago

On Mon, Mar 27, 2017 at 11:28:54AM -0700, meejah wrote:

I'm pretty familiar with Tahoe-LAFS; it basically takes your "big ciphertext problem" (e.g. a message, picture, video, etc) and turns it into a "small ciphertext problem" (e.g. the read-capability for the actual-thing).

So, you still have to be able to transmit the read-capability (readcap) to the intended target.

Hmm I should read more to understand how magic folders work. I thought that the easiest solution would be: instead of using a public shared directory (a newsgroup), use a private shared directory (a Tahoe-LAFS magic folder). Once the users have established a conversation and agreed on using this kind of offline messages system, they could asynchronously push/pop messages to that directory (when they are not online at the same time) and they would be the only ones aware of the existence of that directory as well as read it.

They would basically synchronize that directory whenever they went online and pop the messages they received from the other party. (Which means that if you have many contacts, you would synchronize each user's respective directory)

You'd also have to deploy an "unmessage Tahoe-LAFS grid" since there isn't a big, public Tahoe-LAFS grid at the moment.

Right. And do you think there will be one at some point?

That said, it's possibly easier to store read-caps (< 200 bytes) in a DHT or similar than "the actual messages". For this, it might be good to look at e.g. http://magic-wormhole.io which is a SPAKE2-based way to get two endpoints to talk. Currently, it doesn't support "only one side is online at a time" but there are plans to do that. This is fairly similar to PANDA, the thing Pond used to connect two clients.

Thanks for mentioning that! It has been a while since I last read its docs. I am going to do that again.

Do you think this would also solve the problem of delivering conversation requests to offline users? Because once there is an existing conversation, we can basically have any kind of agreement to retrieve our offline messages. But when the users have not made a handshake yet, then that is a problem. A user that would want to receive requests while offline has to tell people how to deliver messages to them. At the same time, it cannot be just a public directory because anyone would be able to read all of the requests that user received. (They would know how many requests were delivered, when they were delivered and the (ephemeral) handshake keys that were used by the senders - I do not think that is serious, but if we can prevent that...)

A little more "further away", possibly, but there's also mixnets, like @david415 and others are working on here: https://github.com/applied-mixnetworks/txmix which kind of solves your problem "for you" directly: you send "the actual message" as a mixnet-message.

Very interesting! But then would the mix wait until a user's onion service comes online?

Thanks for looking into this @meejah!

meejah commented 7 years ago

The mix-network thing would replace your whole transport: instead of connecting TCP streams to onion services you'd just "send a message" or "receive a message" from the mixnet. How it does that, you don't care. But: nothing is deployed there yet, so that's a ways off.

As far as Tahoe-LAFS: I expect that yes, at some point there will be a public grid. But I also expect that's not imminent. You can only "share stuff" within the same Tahoe grid. So, until then you'd need "an unmessage Tahoe grid" and each participant would need to be told how to access it (could be burned into the code, at the simplest).

I think the best use-case for magic-wormhole in your thing right now would be the bootstrapping. Instead of telling your friend "here's my unmessage address" you would say "start unmessage, and accept an invite code of 3-wallaby-foobar" or similar. That one-time code bootstraps a connection (optionally via Tor) to your recipient -- at which point you can send them the "real" unmessage address, and any additional information (e.g. "I accept offline messages via Tahoe with config ....").

felipedau commented 7 years ago

On Mon, Mar 27, 2017 at 12:40:41PM -0700, meejah wrote:

The mix-network thing would replace your whole transport: instead of connecting TCP streams to onion services you'd just "send a message" or "receive a message" from the mixnet. How it does that, you don't care. But: nothing is deployed there yet, so that's a ways off.

Ohh got it. But using a mixnet would make it a "not so instant" messenger, right? I do think that allowing users to choose how long they can wait for their messages to be delivered and possibly adding nym capabilities (maybe integrating nymphemeral) to unMessage would be great, but that would be part of future plans (and definitely something I would like to work on).

How it does that, you don't care.

What is not clear to me is what the last mix in the chain would have to know about the recipient. That information should point to ~something~ that is always available, right?

As far as Tahoe-LAFS: I expect that yes, at some point there will be a public grid. But I also expect that's not imminent. You can only "share stuff" within the same Tahoe grid. So, until then you'd need "an unmessage Tahoe grid" and each participant would need to be told how to access it (could be burned into the code, at the simplest).

Hmm got it!

I think the best use-case for magic-wormhole in your thing right now would be the bootstrapping. Instead of telling your friend "here's my unmessage address" you would say "start unmessage, and accept an invite code of 3-wallaby-foobar" or similar. That one-time code bootstraps a connection (optionally via Tor) to your recipient -- at which point you can send them the "real" unmessage address, and any additional information (e.g. "I accept offline messages via Tahoe with config ....").

I see - That would be great if in the future we could also do that asynchronously (only one user is online). I am going to create another issue specifically about this handshake part with some aspects I think are worth discussing but not in the middle of this discussion.

So with that Tahoe config would it be possible to set up a magic folder whose files both peers would always have access to? That is, we would not have to worry about keeping track of readcaps because that is "part of the magic". Once we are able to read a magic folder, we have access to all of its files for as long as it exists. Is that right?

Thanks!

meejah commented 7 years ago

As for the folder-access: yes, there is a filesystem as part of Tahoe such that you can read all the children of a directory you have a read-cap for.

There is also a thing called "magic-folders" built on top of that, which synchronizes a local folder on your computer with other computers. This still has an "invite" phase and part of this message is a read-cap to a "shared" group folder and a writecap to "your" tahoe grid-side magic-folder. See more in the docs: https://github.com/tahoe-lafs/tahoe-lafs/blob/master/docs/frontends/magic-folder.rst

felipedau commented 7 years ago

On Mon, Mar 27, 2017 at 01:52:50PM -0700, meejah wrote:

As for the folder-access: yes, there is a filesystem as part of Tahoe such that you can read all the children of a directory you have a read-cap for.

There is also a thing called "magic-folders" built on top of that, which synchronizes a local folder on your computer with other computers. This still has an "invite" phase and part of this message is a read-cap to a "shared" group folder and a writecap to "your" tahoe grid-side magic-folder. See more in the docs: https://github.com/tahoe-lafs/tahoe-lafs/blob/master/docs/frontends/magic-folder.rst

Great! Thanks @meejah, I'll take a look into that.

rxcomm commented 7 years ago

I've been thinking about how to do asynchronous communication with unMessage. The basic idea is that servers are bad. However, we need one to store the asynchronous messages until they can be downloaded and read.

Thus, we need an (untrusted) server that can store messages where the server operator(s) can't find them. This is possible with tahoe-lafs, as mentioned above.

Consider the following asynchronous communication between unmessage client A and unmessage client B using the tahoe-lafs client at http://localhost:3456:

I'll use localhost as the tahoe-lafs client, but this could be easily changed to a tor onion service. I'll also use bash commands for this implementation, but the commands can be easily added to unmessage using the requests or urllib2 modules.

Initialization

As part of the conversation creation process in unmessage, if asynchronous communication is desired, a tahoe-lafs directory that is not attached to anything else in the filesystem is created and exchanged between the parties:

curl -X POST "http://localhost:3456/uri?t=mkdir"

The returned $DIRCAP is stored for future use.

Message Storage

1) A generates a message $MES1

2) A uploads the message to an unattached tahoe-lafs file and receives $FILECAP1 in response:

echo "$MES1" | curl -T - http://localhost:3456/uri

3) A encrypts $FILECAP1 using an ephemeral key from the double-ratchet stack $FCENC1=e($FILECAP1, k_i) and stores it in a filename linked in the $DIRCAP directory:

echo "" | curl -T - "http://localhost:3456/uri/$DIRCAP/$FCENC1"

Message Retrieval

1) B downloads the json information for $DIRCAP:

curl "http://localhost:3456/uri/$DIRCAP?t=json"

2) B parses the json to obtain any new filecaps ($FCENC1, $FCENC2, ..) and decrypts those $FILECAP1=d($FCENC1, k_i), $FILECAP2=d($FCENC2, k_i+1), ...

3) B downloads the new messages:

curl "http://localhost:3456/uri/$FILECAP1"
curl "http://localhost:3456/uri/$FILECAP2"
...

and decrypts those messages.

4) B unlinks the encrypted filecaps from $DIRCAP:

curl -X DELETE "http://localhost:3456/uri/$DIRCAP/$FCENC1"
curl -X DELETE "http://localhost:3456/uri/$DIRCAP/$FCENC2"
...

thereby deleting all references to the messages.
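
The curl steps above could be moved into Python roughly as follows, using only the standard library. This is a sketch under several assumptions: the endpoints are the ones used in the curl commands, the `t=json` directory listing is taken to have the shape `["dirnode", {"children": {...}}]`, and `encrypt`/`decrypt` are hypothetical callables standing in for the double-ratchet step (they must produce and accept URL-safe text). Picking the right key k_i for each entry is left to those callables.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

NODE = "http://localhost:3456"  # local tahoe-lafs client, as in the curl examples


def _call(method, path, body=None):
    """Send one request to the tahoe-lafs web API and return the raw response body."""
    with urlopen(Request(NODE + path, data=body, method=method)) as resp:
        return resp.read()


def make_dir():
    """Initialization: create an unattached directory and return its $DIRCAP."""
    return _call("POST", "/uri?t=mkdir", b"").decode().strip()


def store(dircap, message, encrypt):
    """Message storage: upload the message, then link its encrypted filecap."""
    filecap = _call("PUT", "/uri", message).decode().strip()
    name = encrypt(filecap)  # hypothetical; result is used as a file name
    _call("PUT", "/uri/%s/%s" % (quote(dircap), quote(name, safe="")), b"")
    return name


def child_names(listing):
    """Extract the child names from a ?t=json directory listing (assumed shape)."""
    node_type, info = json.loads(listing)
    if node_type != "dirnode":
        raise ValueError("not a directory listing")
    return sorted(info.get("children", {}))


def retrieve(dircap, decrypt):
    """Message retrieval: list, decrypt filecaps, download, then unlink."""
    listing = _call("GET", "/uri/%s?t=json" % quote(dircap))
    messages = []
    for name in child_names(listing):
        filecap = decrypt(name)  # hypothetical; inverse of encrypt()
        messages.append(_call("GET", "/uri/" + quote(filecap)))
        _call("DELETE", "/uri/%s/%s" % (quote(dircap), quote(name, safe="")))
    return messages
```

The returned messages are still unMessage ciphertext and would be decrypted by the client as usual.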

It may be useful to make the messages mutable files, and zero out their content as well. I'm not familiar enough with the tahoe-lafs details to know if this overwrites the old data corresponding to the filecap or if it simply creates a new entry. If it truly overwrites all of the old data, that would add additional security.

Comments

Since the initial $DIRCAP is not linked into any current directory in the tahoe-lafs filesystem, only the parties to the conversation are able to locate the message directory. In combination with tor-only access, this functions as a sort of private information retrieval scheme.

The reason to store the encrypted filecaps in $DIRCAP rather than the complete message files is that unlinking is quicker. This may not be an advantage, in which case the messages can be linked directly to $DIRCAP.

The big disadvantage of this method is of course that somebody has to install and maintain an unMessage tahoe-lafs instance. Also, if the instance is public, it can be spammed. I believe the tahoe-lafs devs are working on some kind of accounting to prevent this, but for now...

rxcomm commented 7 years ago

A couple more thoughts on my discussion above...

1) @felipedau's suggestion above to require proof of work to store messages in the tahoe-lafs instance in order to avoid spam is a good one. That might work. It would require either modifying the tahoe-lafs code or writing a front end to test the POW and then pass a link to the tahoe-lafs client api. Which brings up my second thought.

2) If you can't trust the server, you can't trust the server. What happens if the tahoe-lafs client fakes the api and reads all the messages? They are encrypted, and with tor-only access no one knows where they came from, but the message ciphertext can still be read. I think most tahoe-lafs installations are for a use case where the users trust the server insofar as they know it's running the correct code. They just don't want the disk provider to be able to read the stored data. AFAICT, this is a generic problem with remote PIR systems. Has this issue been discussed anywhere?

meejah commented 7 years ago

@rxcomm Servers in Tahoe-LAFS are not trusted for integrity or privacy. You do, however, have to trust the Tahoe client machine.

I haven't grokked the rest of the "messages via LAFS" discussion above yet though.

rxcomm commented 7 years ago

@meejah - I haven't been careful enough with my language ;). Thanks for your note - it helps clarify my thinking.

In what I described above, I was thinking of someone running a tahoe-lafs client that would accept messages from multiple unmessage clients and store them to a backend server stack. Based on item 2) of my latest comment, this clearly won't work.

Each unmessage client will need to run its own tahoe-lafs client as well in order to assure message security. My earlier discussion on how to store the messages will still work, but the tahoe-lafs client needs to be co-located with the unmessage client. Additional complexity, but it would be secure.

meejah commented 7 years ago

@rxcomm yes, that's correct: there'd have to be a Tahoe client on any unmessage client-machine to assure message integrity.

As far as overwrites: there are "mutable" and "immutable" (the default) files in Tahoe. You can't overwrite immutable files. Directories are implemented as mutable files, with a list of links to children -- so anyone who can add directory entries could also remove them. The mutable file itself wouldn't be deleted, but this might not matter if you can't reach it ;)

We have been discussing "add only" capabilities in Tahoe as an additional capability type.

meejah commented 7 years ago

Perhaps a model similar to Tahoe's "magic folders" would work:

Thus, any message Alice adds to the "to Bob" folder can be read by Bob (and Alice) but neither Bob nor anyone else may add messages to this folder. If Bob also wanted to give offline messages to Alice, he would do the inverse of the above. Alice can expire messages out of her folder after a while if she likes.

Note that if Bob remembered the read-cap to an individual message, he'd retain read access (even if Alice removed it from her folder). Tahoe follows a "preservationist" principle in this regard.

rxcomm commented 7 years ago

A question about mutable files in tahoe-lafs. If I change a mutable file, is the data corresponding to the old version of the file deleted on the servers? Or is the new data simply added?

meejah commented 7 years ago

@rxcomm The old data is unreachable. I don't remember if the actual data is definitely deleted immediately or if it takes a little while, but that's just implementation detail -- you still can't get it.

But, a caveat: if the old data was a list of read-capabilities (e.g. if it was a mutable directory) then you can retain those read-caps elsewhere and still have access to them.