TokTok / c-toxcore

The future of online communications.
https://tox.chat
GNU General Public License v3.0
2.25k stars 284 forks source link

Group titles in persistent groups #1328

Open sudden6 opened 5 years ago

sudden6 commented 5 years ago

Current state

Problem

The latest set group title doesn't stay persistently, old clients can override it

Solution?

  1. Save the unix timestamp of the last set name and ignore all changes with a newer time
    • Do we have a common time base in groups so that clock differences don't matter?
  2. When a new user connects, do a majority voting on the group title (hash)
    • How to handle the case where only two users with different titles are online?
  3. Combine both of the above
    • Use timestamp * users with that title and let the highest win?
zugz commented 5 years ago

I'm not sure there's good solution to this. There is no one true reality with these groups. It is entirely possible for the group to split into multiple equally valid components, then later merge. Suppose we have the following friend relations: (A,B), (D,E), (B,C), (C,D). All 5 join a group together, then all quit. Later, A,B,D,E all come online. Then A and B will see each other in the group, as, separately, will D and E. These two subgroups might each change the title. Then C turns up, and the group merges into one. What title should be seen by whom?

Voting won't help here, as it's a tie. If we stored timestamps, we could say the last to change wins, or the first to change wins. But either is arbitrary, and someone is going to get annoyed when their title gets changed on them.

The way it currently works: when you join/rejoin the group, you take the title from whoever is bringing you into the group; after that, you only consider the title to have changed if you actually see someone changing it.

The seemingly random behaviour in your example illustrates a (common) edge case: when C is left alone in the group and then A arrives, both consider themselves to be rejoining the group, and so it is indeed basically random which title gets chosen (depends which packet wins the race). Actually, I wonder if they end up just swapping titles... that would sound like a bug.

In general, with the current system, group members might well disagree on the title.

Maybe storing timestamps and using the most recent would be a sensible solution. It would mean adding to the protocol though, and some ugliness to keep backwards compatibility (sending the title both with and without a timestamp).

If we just wanted to enforce a consensus and didn't care about it making sense, we could just strcmp the titles and have the earliest always win. No need for a protocol change that way!

This isn't the only problem like this with these groups, by the way. It's also possible for a name change not to get properly propagated. See the discussion in docs/minpgc.md.

zoff99 commented 5 years ago

so current group chat is mostly useless in general, without a groupbot. so there is really almost no improvement to the "old" group chat.

sudden6 commented 5 years ago

@zugz I agree that there's most probably no ideal solution, but we can always take the least problematic solution.

After reading your arguments that is timestamps, for the following reasons:

The drawbacks:

zoff99 commented 5 years ago

@sudden6 we don't rely on wall clock time anywhere in toxcore AFAIK at the moment. we only have monotonic time, which is by chance unixtimestamp on linux, but on windows it can be any random counter number. is it wise to add wall clock into toxcore now just for this?

zugz commented 5 years ago

Yes, I think I agree in the end that timestamping titles and using the most receent in case of conflict is the best solution.

Here's the simplest implementation I see:

First, add a version byte to PEER_QUERY_ID direct packet. Set it to 1. If the byte doesn't appear, the packet is considered as version 0. Note the current code doesn't check the length of the packet, and will just ignore this byte.

When responding to version >=1 such packets, in place of the PEER_TITLE_ID direct packet, send a new timestamped title direct packet containing the current title and the timestamp we have stored for the title.

On receiving such a packet, update the title and its timestamp to those received if the timestamp is greater than the currently stored timestamp. In case of ties, use strcmp on the titles.

If we receive an old-style PEER_TITLE_ID packet... I guess continue to use the current "title_fresh" criterion for accepting it, but set the timestamp to 0 if we do.

Save and load timestamps along with titles. The initial timestamp on the (null) title of a newly created group is 0.

When we receive a GROUP_MESSAGE_TITLE_ID message, set the title and set the timestamp to the current unix time according to our clock (even if this is earlier than the current timestamp).

Using the receiver's clock rather than the sender's may seem odd, but as well as meaning the protocol doesn't need to change as much, I think it handles better the case that some participant's clock is erroneously late; this way the winning title will basically be the last one that participant saw, whereas if we used sender timestamps it would tend to be the last one that participant set.

One nasty possibility: if a peer with a late clock disconnects and the title is changed while they're gone, they might persist with their last seen title when they rejoin, because they reckon it to be the most recent. I don't see a cheap way to handle this. I hope it's acceptable?

Things won't work particularly nicely when only some of the peers speak this new protocol. But it shouldn't be too bad.

BTW, accessing the system clock is something that's also needed for

1260.

sudden6 commented 5 years ago

Most of the part you wrote makes sense to me. I have a few questions about the current implementation though and depending on their answer I think we can improve the protocol a bit

When responding to version >=1 such packets, in place of the PEER_TITLE_ID direct packet, send a new timestamped title direct packet containing the current title and the timestamp we have stored for the title.

I think it would make sense to respond with the title timestamp plus the current time on the side which sends the new PEER_TITLE_ID packet. This way the receiver can sync their clock to "group time" on connection.

I think this also means we don't need the absolute wall clock at all.

zoff99 commented 5 years ago

where will you get all these new timestamps from?

nurupo commented 5 years ago

What about just using a number that is incremented every time a topic changes, instead of using a timestamp?

sudden6 commented 5 years ago

@nurupo a number has the disadvantage that if two different sets of group members are active at different times and each change the title once, they will arrive at the same number and we have a problem again.

By using some sort of absolute time we can ensure the latest title is set, as it's highly unlikely for a group to split and people setting the title at the exact same time.

zugz commented 5 years ago
  • When does the current implementation send the GROUP_MESSAGE_TITLE_ID message? Only when changing the title, or also when connecting to the group?

Only when changing the title.

  • if it's only when changing the title, we could ignore the PEER_TITLE_ID packet from old clients, thus in a mixed group the title would stay sane when others connect

But then if we connect via an old client, we don't update to the current title.

What potential for insanity do you see?

When responding to version >=1 such packets, [...] I think it would make sense to respond with the title timestamp plus the current time on the side which sends the new PEER_TITLE_ID packet. This way the receiver can sync their clock to "group time" on connection.

How would you keep this "group time" synchronised through splits and rejoins? Sounds like a nightmare!

sudden6 commented 5 years ago

But then if we connect via an old client, we don't update to the current title.

Would it be possible to ask different group members for the title, until one has the new protocol or we tried all? Without too much changing the protocol of course.

What potential for insanity do you see?

I want to keep old clients from being able to pass around bogus group titles. E.g. how would the protocol react if Alice (old) introduces Bob (new) to the group. Then Bob would accept the title from Alice, but would set the time to 0. Now Carol (new) connects via Bob, but has a title with a more recent time stamp. Should Carol:

  1. ignore the title from Bob
  2. send a title change notification with her title to the group

1) would result in Carol having a newer title, but don't displaying it. 2) would result in that Bob now thinks Carols title is the most current, even though maybe Dave has an even newer title, but is offline atm.

Writing this I'm not even sure if it's possible to solve this problem.

How would you keep this "group time" synchronized through splits and rejoins? Sounds like a nightmare!

Thought some more about it and I propose to send the difference to wall clock time in the PEER_TITLE_ID v2 packet. E.g. the party sending the packet computes: title_diff = current_time() - title_timestamp and sends the result. The receiver computes title_timestamp = current_time() - title_diff and stores the title with that as latest timestamp.

This way the peer connecting to the group can calculate backwards when this title was set and store it as his local time. No need to sync anything, because on the next title change we store the timestamp also in local time.

A possible drawback might be, that there is an accumulating error due to network delay and different clocks, but this should not matter as it's synced once a group title change occurs.

zugz commented 5 years ago

Would it be possible to ask different group members for the title, until one has the new protocol or we tried all?

The only way to do this which really fits the protocol is to send a group message to the whole group, and have every member reply. That means a lot of traffic, so I think we should try to avoid it.

Thought some more about it and I propose to send the difference to wall clock time in the PEER_TITLE_ID v2 packet.

That's a nice idea. But without an absolute time, we don't know when the current title+time is equal to one we are sent, which causes problems in the protocol.

So I think we should stick with unix time timestamps. We can deal reasonably well with bad clocks as follows: when setting a new title, we set the stamp to be the current time according to our clock or, if it's greater, the old stamp plus 1. So if someone with a broken clock sets a stamp that's in the future, or if we have many title changes in quick succession, we degenerate to incrementing version numbers for a while.

OK, so here's a reworked version of the detailed protocol proposed earlier (which was horribly incomplete).

I'll first ignore the problem of backwards compatibility, and assume all peers implement the new protocol, then later discuss changes to handle old clients.

First, a terminological matter. For futureproofing purposes, I suggest we replace everywhere "title" with "shared group state" (or "state" for short). For now this state will in fact just consist of a title, but we could imagine eventually extending it to copy some of the irc-like functionality of NGC to these groups. I don't know if we'll ever want to, but it doesn't hurt to prepare the ground a little. (Note that moderation would actually work with these groups, since we can ban/devoice by Tox ID, and joining is by invitation only.)

A "timestamped state" is a state together with a timestamp. To determine which of two timestamped states is "newer": first compare the timestamps, then use strcmp on the titles to break ties.

As previously, we adapt the Peer Query packet to include a version, and peers implementing the new protocol will reply with a new packet I'll now call the "State Response" packet, containing a timestamped state.

Also add a new kind of group message, the "State message" containing a timestamped state.

On receiving a State Response packet with a timestamped state different from our current timestamped state: If the received timestamped state is newer, update our copy of the state and send a State message with the new timestamped state. Otherwise (note in this case the received timestamped state is older), send back a State Response packet with our current timestamped state.

On receiving a State message, update our timestamped state if the received one is newer.

Analysis

Assume all peers implement this protocol.

The "connected component" of a peer is the closure under transitivity of direct connections, i.e. the peers to whom a message sent by the peer would be received.

Claim: A peer will always have their timestamped state set to the newest timestamped state amongst those seen by the peers in their connected component. (BUT: see Proviso below)

Indeed, assume inductively that this is true of all components, and suppose two peers A and B become connected. At least one of the two will send a Peer Query packet to the other; say A sends one to B. So B sends a State Response back to A. If B's component has newer timestamped state than A's, then A will send a State message with the newer one, and so A's component will update. If A's component has newer timestamped state, A will send a State Response to B, and then as above B's component will update.

Proviso: the above is severely oversimplified, because updates are not in fact atomic operations, so one has to consider the dynamics of propagation of the messages and merges and splits and further updates during the propagation. I believe it does work in general, but it's not so easy to reason about precisely.

Backwards compatibility

It's true that if A or B in the above are using an old client, the synchronisation will fail. There are things we could add to try to deal with this, but it would be messy. But actually: since the only problematic versions of toxcore would be those since the (fairly recent) introduction of persistent groups, perhaps it isn't too bad to just ignore the problem?

To handle the more general problem of peers using the legacy unstamped mechanism to set titles, we could do the following: if we receive a Title message with a title which isn't that in our current state, update our state with the title and increment the timestamp. So the peers using the new protocol will agree on a timestamped state containing the new title. When we set a new title, send a Title message after the State message (some peers may receive them in the other order, but that's ok).

When we're first introduced to a group, if the introducer is using the old protocol so sends us just a title but no timestamped state, put the title in the state but leave the timestamp at 0.

Meta

Although I'm perverse enough to have found writing the above a fairly amusing way to spend an evening, we should really ask ourselves whether it makes sense to keep working on this implementation of groups. We may end up replacing them with entirely with NGC.

Are these synchronisation problems already solved in the private groups in NGC? The description in the spec doesn't seem to discuss it at all.

Same question for the public groups, actually -- according to the description, it seems to just be assuming that periodically checking on the DHT will be enough to prevent netsplits, which sounds optimistic?