agl / pond

Pond
BSD 3-Clause "New" or "Revised" License
911 stars 109 forks source link

DELIVERY_SIGNATURE_INVALID #135

Open leif opened 9 years ago

leif commented 9 years ago
Error from server pondserver://[redacted]@jb644zapje5dvgk3.onion: error from server: DELIVERY_SIGNATURE_INVALID

This was happening a long time ago in #8 but it was apparently fixed then.

As discussed in #134, it has been happening again to a lot of people lately. It has happened to at least four of my contacts now, three of whom are on the default server. It has also happened to some of @burdges contacts.

When it happens, it affect's all of the person's contacts at once: the server rejects all messages to them. In three of the cases, they can still send messages, but in another case messages from them are delivered but yield a "Failed to decrypt" message.

Looking at https://github.com/agl/pond/blob/84192035288a40deeed55e798c4a2d4236029570/server/server.go#L1184 I see that the group is being read from the account object before account.Lock() is called at https://github.com/agl/pond/blob/84192035288a40deeed55e798c4a2d4236029570/server/server.go#L1191 which seems wrong. But I don't see how that could cause this problem, unless two revocations are being processed at the same time (which I don't think should be able to happen).

leif commented 9 years ago

To be clear, in the server code it seems entirely possible that two revocations could be happening at once since each connection spawns another goroutine, but I don't think that the client could be sending two in parallel.

burdges commented 9 years ago

As Leif mentioned, you can usually still send messages, so here is a temporary work around, assuming your not using TPM.

Create a new state file with : mv .pond .pond-old pond & pond --state-file=.pond-old &

For every contact in your old pond, initiate a corresponding contact in your new pond with a generated shared secret, and use your old pond to send that shared secret to your old pond contact. Include an explanation that they should not revoke the old you until this issue gets fixed.

agl commented 9 years ago

"When it happens, it affect's all of the person's contacts at once: the server rejects all messages to them." -- it's unclear but I think you're saying that all inbound messages to a user can suddenly get rejected after a revocation. That's crappy because it suggests that the error is in the group signature update at the server, as opposed to the signing at the client.

Someone to whom this has happened, and who uses the default server, please:

1) Quit Pond. 2) Build editstate from the Pond source code. 3) Copy the statefile to a temporary location 4) From the editstate directory, run EDITOR=vim ./editstate --state-file=/tmp/state-file 5) Copy out the lines from near the top that start with "Public", "Group", "GroupPrivate", "Generation" into a different file. (Note that editstate hasn't been updated with the latest statefile changes so it won't be able to save the file, but that's fine -- you're not making any edits and it's a temp copy anyway.) 6) gpg encrypt that file with my key from https://www.imperialviolet.org/key.asc and email to agl at imperialviolet dot org.

For one, I might be able to manually reset the group public key on the server for your account. Secondly, I'd like to know what went wrong. I suspect that it'll be clear that something's wrong but with little information as to what so I'll also try running the bbssig code in a loop to test revocation updates.

agl commented 9 years ago

Oh man, I had not noticed how bad the error fixed in 1433bc3ec3707da256eb8689f7e827cc046d4ec5 was. I have bbssig running random tests in a loop on all the cores of my machine and I'll leave that overnight, but that fix could certainly have done it if it duplicated or dropped revocation messages.

burdges commented 9 years ago

Yes! I definitely got duplicate queued messaged when sending to an account that already had the DELIVERY_SIGNATURE_INVALID error, but obviously the same thing should happen when sending to an account in a GENERATION_REVOKED state. We'd trouble replicating the DELIVERY_SIGNATURE_INVALID error because it required two people to be revoking a generation simultaneously.

leif commented 9 years ago

it's unclear but I think you're saying that all inbound messages to a user can suddenly get rejected after a revocation

yes, that is what i was trying to say. sorry for the lack of clarity.

This issue seems to have bee occuring more lately, but it definitely happened before September 9 too so the bug fixed in https://github.com/agl/pond/commit/1433bc3ec3707da256eb8689f7e827cc046d4ec5 is not the only thing causing it.

agl commented 9 years ago

I've had bbssig running on random test cases and there were no failures yet.

One or more people who emailed the information that I requested above reported forking their state file., i.e. using an old version of the state file that happened to be on a different machine and doing a revocation from there. That could certainly cause this issue.

(The HMAC replacement for group signatures would fix that particular way of breaking when forking the state file, but of course there are others. Forking will remain deadly for the foreseeable future which suggests that perhaps the home server should keep a revision number for the client and refuse to deal with reverts.)

burdges commented 9 years ago

I'm nervous about completely wrecking users ability to use state file backups because many pond users have restored a state file from a backups losing only a couple contacts due to the ratchet. In my case, I found that entomb trashes the state file on Macs, and fails to entomb properly, a bug I haven't yet looked into fixing. And there is already a version number in the generation, which could report more information from the server.

We suspect we understand this problem after @agl pointed out that the client bug fixed in 1433bc3 could've caused it. We should however ask ourselves : Is there a server bug visible in the fact that this client bug could mess up the user's account on the server. On the one hand, I guess a malicious client with your state file could always trash your account. On the other hand, maybe there is something we can change to make it less vulnerable to accidents.

burdges commented 9 years ago

Would this bug be prevented by locking the account earlier than here? : https://github.com/agl/pond/blob/master/server/server.go#L1191

agl commented 9 years ago

The locking in the server is supposed to protect the server against malicious clients – not to protect clients from themselves.

Although locking earlier would stop two clients from submitting revocations at the exact same time, it wouldn't stop the problem in general: "split brain" clients would still break themselves.

burdges commented 9 years ago

Just encountered a similar issue in which my contacts all broke with DELIVERY_SIGNATURE_INVALID. I'll try to debug it, but looks nasty..

We'd many strange pond occurrences during the Tor developer meeting, which might've done weird shit with a generation update, but the revoke that actually broke pond was afaik yesterday.

burdges commented 9 years ago

I've found a pointer towards the bug :

A new pond test account paired with an old pond test account via manual pairing just attempted to send a message, but got :

Mar 17 11:09:02: Starting message transmission to pondserver://ICYUHSAYGIXTKYKXSAHIBWEAQCTEF26WUWEPOVC764WYELCJMUPA@jb644zapje5dvgk3.onion
Mar 17 11:10:20: Failed to read from pondserver://ICYUHSAYGIXTKYKXSAHIBWEAQCTEF26WUWEPOVC764WYELCJMUPA@jb644zapje5dvgk3.onion: read tcp 127.0.0.1:9150: i/o timeout
Mar 17 11:10:20: Next network transaction in 59.74s seconds
Mar 17 11:11:20: Starting fetch because of timer
Mar 17 11:11:20: Starting fetch from home server
Mar 17 11:12:01: Next network transaction in 14m14.614s seconds

Except the message did arrive in tact at the old pond test account. These state files are not sensitive and I can transfer them if I have some way to change the passwords on them.

burdges commented 9 years ago

We've occasionally had this error message too :

Failed to decrypt message: ratchet: duplicate message or message delayed longer than tolerance

There is seemingly a race condition in which duplicate messages can be sent, maybe by spamming Transact Now alone, probably not just quitting and restarting Pond, but maybe also by Tor dropping the circuit.

burdges commented 9 years ago

In the short term, I'd suggest that people try to be on a stable network connection, and avoid quitting pond, when they revoke contacts. I suspect this is more due to Tor dropping the circuit than anything the user does with transact now.