cachapa / crdt

Dart implementation of Conflict-free Replicated Data Types (CRDTs)
https://pub.dev/packages/crdt
Apache License 2.0
159 stars 20 forks source link

Question about HLC Implementation #6

Closed misha closed 2 years ago

misha commented 2 years ago

Hi! Thanks for this library. I've got a question about this library's Hlc class.

From what I understand about HLCs, they can deal with the pathological case where one node sets the system time to point in the future. I was reading this article where the author states the following about the use of HLC in a CRDT:

[A HLC] does however make the following guarantees:

  1. All events created on a single machine will be correctly ordered with respect to each other (even if the wall clock jumps back a second or is otherwise squirrely).

  2. Once machine A sends events to machine B, all events subsequently created on machine B will be ordered as after those events from machine A. So if A sets their clock one day ahead, makes some changes, and sends them to B, B will still be able to make changes even though its own wall clock is 'behind'.

Using this library, I wrote the following code:

import 'package:crdt/crdt.dart';

void main(List<String> arguments) {
  final local = MapCrdt('local');
  local.put('name', 'Misha');

  final tomorrow = DateTime.now().add(Duration(days: 1));
  final hlc = Hlc.fromDate(tomorrow, 'remote');
  final futureRemoteJson = '{"a":{"hlc":"$hlc","name": "Daniel"}}';

  local.mergeJson(futureRemoteJson);
}

Which yields:

Unhandled exception:
Clock drift of 86399977 ms exceeds maximum (60000)
#0      new Hlc.recv (package:crdt/src/hlc.dart:95:7)

Upon closer inspection, the code in Hlc.recv does not appear to match the one in the blog post mentioned above, and thus cannot deal with the pathological case of a future clock. Is this an implementation error?


Please let me know if I misunderstood CRDTs, HLCs, or this library in any way. This is brand new territory for me.

cachapa commented 2 years ago

The issue with allowing nodes with unbounded future clock drifts is that one bad node with a date that's years in the future can effectively disable the "wall clock" part of the HLC, turning it into a simple logical clock.

Consider node A, B and C. Node A and B have correct times and are within a couple of seconds drift between each other. Node C, however comes in 1 year in the future, performs one transaction and is never seen again.

If you allow Node C to participate even once, you now have set the HLC to December 2022 causing A and B to start incrementing the logical counter for each change for an entire year. This can become a problem since the HLC only reserves 4 bytes for the logical counter which can easily overflow depending on your system's size and level of activity.

You bring up a interesting point though: I arbitrarily picked a max drift of 1 minute under the assumption that in the modern world every node will be synched to NTP. I'm now considering whether that value shouldn't be revised, or made configurable.

misha commented 2 years ago

Unfortunately, the use case I'm considering would use a CRDT as an offline-available database, so synchronizing to NTP would not be feasible.

You are absolutely correct that an excessive, future clock drift defeats the purpose of the HLC. This made me question how the application data would react, especially if it stores dates. The more realistic approach is likely as you suggestion, preventing manipulation of the data in the presence of clock drift to begin with.

Thanks again for your comments - since this library's implementation was intended I'll go ahead and close my issue.

By the way, for my application I just needed the clock on its own so I published an implementation to Pub here.

cachapa commented 2 years ago

I thought about publishing the HLC implementation as a separate packaged as well but decided against it since the implementation details will be particular to the CRDT it's used in.

Still, there are some existing implementations that I tried to follow in case my system had to interoperate with others in the future (e.g. https://github.com/jlongster/crdt-example-app/blob/master/shared/timestamp.js). I noticed you're using : as a separator between the wall clock, logical counter, and node id. The implementations I saw while implementing my solution all used - which makes sense since the wall clock already uses :. Using - makes it easier to break the HLC-string into parts.

An aside: did you base your code on my implementation? If so it would be nice to get a mention :-)