automerge / automerge-classic

A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
http://automerge.org/
MIT License

Lossless date/time representation #357

Open lightsprint09 opened 3 years ago

lightsprint09 commented 3 years ago

At the moment automerge stores dates as milliseconds (int64), which can lead to loss of sub-millisecond precision on platforms that support it. We should find another representation that works on all platforms and does not come with a loss in resolution.
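
A minimal Swift sketch of the loss being described, assuming the current int64-milliseconds encoding (the conversion code is illustrative, not Automerge's actual implementation):

```swift
import Foundation

// A Date carrying sub-millisecond precision.
let original = Date(timeIntervalSince1970: 1_600_000_000.000123)

// Encode as Int64 milliseconds, as automerge currently stores dates.
let millis = Int64((original.timeIntervalSince1970 * 1000).rounded())

// Decode back to a Date.
let restored = Date(timeIntervalSince1970: Double(millis) / 1000)

print(original == restored)  // false: the sub-millisecond part is gone
```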

ept commented 3 years ago

Swift represents datetime values as a double-precision floating point number containing the number of seconds and fractional seconds. I'm not sure which epoch Swift uses — the API seems to use both the Unix epoch of 1 January 1970, and also 1 January 2001. @lightsprint09 do you know which epoch is used internally?
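
For reference, Foundation's `Date` keeps its value as a `Double` of seconds relative to 1 January 2001 00:00:00 UTC (the "reference date"); the Unix-epoch view is derived from that by a fixed offset:

```swift
import Foundation

let now = Date()

// The stored representation: seconds since 1 January 2001 00:00:00 UTC.
print(now.timeIntervalSinceReferenceDate)

// The Unix-epoch view, derived by adding the offset between the epochs.
print(now.timeIntervalSince1970)
print(Date.timeIntervalBetween1970AndReferenceDate)  // 978307200.0
```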

Note that floating-point being what it is, I don't think you can losslessly convert between the two epoch dates. That is, if you take a floating-point date, add the number of seconds between the two epoch dates, and then subtract the number of seconds between the two epoch dates again, you don't necessarily get back the number that you started with.
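
A small sketch of that rounding behaviour, using Foundation's constant for the offset between the two epochs:

```swift
import Foundation

let offset = Date.timeIntervalBetween1970AndReferenceDate  // 978307200.0

let t = 0.1                       // 100 ms after the 2001 reference date
let shifted = t + offset          // re-express relative to the Unix epoch
let roundTripped = shifted - offset

print(t == roundTripped)  // false: the addition rounded away low-order bits
```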

I would be more inclined to stick with an integer representation, since it has fewer such rounding issues, and it can be encoded more compactly. But I would be happy to consider bumping the resolution up to microseconds if there is a desire for higher resolution.
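
For context on the compact-encoding point: variable-length integers in the LEB128 style (which Automerge's columnar binary format uses) store small magnitudes in only a few bytes, whereas a double is always 8 bytes. A minimal signed LEB128 encoder as a sketch, not Automerge's actual serialiser:

```swift
// Encode an Int64 as a signed LEB128 varint: 7 bits per byte,
// high bit set on every byte except the last.
func encodeSLEB128(_ value: Int64) -> [UInt8] {
    var v = value
    var bytes: [UInt8] = []
    while true {
        let byte = UInt8(truncatingIfNeeded: v) & 0x7F
        v >>= 7  // arithmetic shift preserves the sign
        // Stop once the remaining bits are all copies of the sign bit.
        let done = (v == 0 && byte & 0x40 == 0) || (v == -1 && byte & 0x40 != 0)
        bytes.append(done ? byte : byte | 0x80)
        if done { return bytes }
    }
}
```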

ept commented 3 years ago

I would also love to hear from users of other programming languages about how their language represents datetimes. It would be ideal if we could find a single representation that can losslessly represent datetime values in all common languages in which we are likely to have frontends now or in the future, but I am not sure that is achievable.

alexjg commented 3 years ago

I would be upset if I had to deal with floating-point dates. Given that we're using an int64, is there any reason not to bump up to nanosecond precision? There are quite a few platforms and programming languages which produce times with 1 ns precision, and losing that precision on the round trip through automerge would lead to irritating "this date which I just stored in automerge is not equal to the same date before I put it in automerge" style bugs.

ept commented 3 years ago

int64 with nanosecond resolution would limit the range to the epoch date +/- 292 years, which seems a bit low, whereas microsecond resolution gives us +/- 292 millennia. We could expand beyond 64 bits, but that's a bit of a can of worms…
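
The arithmetic behind those figures, as a quick check (using a 365.25-day year):

```swift
// Range of Int64 at each resolution, in years either side of the epoch.
let secondsPerYear = 365.25 * 86_400  // 31_557_600

let yearsAtNanos  = Double(Int64.max) / 1e9 / secondsPerYear
let yearsAtMicros = Double(Int64.max) / 1e6 / secondsPerYear

print(yearsAtNanos)   // ≈ 292 years
print(yearsAtMicros)  // ≈ 292_000 years, i.e. roughly 292 millennia
```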

jeffa5 commented 3 years ago

As a middle ground, could we store microseconds?

This should give us enough range and quite a bit of precision. Datacenter network latencies can be in the microsecond range, and I think some operations in the Rust backend take microseconds (if we wanted to measure automerge with automerge).

josephg commented 3 years ago

I'm going to suggest automerge goes the other way: I think automerge should only ever support times at one-second resolution (or maybe ten-second resolution). My reasons are:

  1. High-precision per-character timestamps would add a lot of overhead in the case of text editing.
  2. Typing timestamps can be used to uniquely fingerprint a user based on their characteristic typing behaviour. I don't want this information to be collected and available by default.
  3. I can't think of any valid use case for collecting higher-precision editing events, and nobody has suggested any.

jeffa5 commented 3 years ago

Ah, I was considering the timestamp type here. Since this is a user-assigned value, I was expecting users to choose the precision they want and truncate accordingly. If this is for internal tracking of events then I suppose microsecond precision is unnecessary.
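
For example, an application that only cares about seconds could truncate before storing (the helper below is hypothetical, not an Automerge API):

```swift
import Foundation

extension Date {
    // Hypothetical helper: drop sub-second precision before storing.
    func truncatedToSeconds() -> Date {
        Date(timeIntervalSince1970: timeIntervalSince1970.rounded(.down))
    }
}

let stored = Date().truncatedToSeconds()
```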

alexjg commented 3 years ago

I also understood this as being to do with the Date type in the automerge data model. For change timestamps I have the same concerns as @josephg but I think that might be for another issue?

ept commented 3 years ago

Yes, this is for Date objects that are an application-assigned part of an Automerge document. The automatically added timestamps on changes already have second-level granularity, for all the reasons @josephg mentioned.

For Date objects that are part of a document I am also not sure which applications need finer-than-millisecond timestamps; I think @lightsprint09's proposal was mainly motivated by wanting to have lossless round-trips between application code and Automerge's serialisation.

jeffa5 commented 3 years ago

As for use cases, one example I'd potentially have is storing time measurements in automerge documents. These could be network latencies or internal computation timings, likely taking less than 1 ms, and I think microsecond resolution would be good here.

Nanosecond resolution might be useful in extreme cases, but I think that pushes the balance toward not having enough years of range.