Closed hvdsomp closed 1 month ago
Yeah, that would be a good idea, thanks!
I've looked into Memento a bit, and the first big blocker is that it represents time using a computer's local time. Unfortunately, that doesn't work for synchronization over a distributed network, because time becomes relative, and you can't rely on clocks. Instead, you need to construct a partial order of events.
In other words, Memento models time like:
o - 4:38pm June 4th 2023
|
o - 11:30am July 2nd 2023
|
o - 2:02pm August 1 2023
|
v
But for distributed sync, we need something like:
        o - x35
       / \
h3u - o   o - 7bx
       \ /
        o - 73h
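To make the diamond concrete, here is a minimal Python sketch. It is not taken from the Braid spec: the `parents` map, the `ancestors` and `compare` helpers, and the left-to-right reading of the diagram (h3u first, 7bx as the merge) are all assumptions made for illustration.

```python
# Hypothetical parent map for the diamond, assuming it reads left to right.
parents = {
    "h3u": [],
    "x35": ["h3u"],
    "73h": ["h3u"],
    "7bx": ["x35", "73h"],  # merge of the two concurrent versions
}

def ancestors(version, parents):
    """Return every version reachable from `version` through parent links."""
    seen = set()
    stack = list(parents[version])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen

def compare(a, b, parents):
    """Partial-order comparison: 'before', 'after', 'equal', or 'concurrent'."""
    if a == b:
        return "equal"
    if a in ancestors(b, parents):
        return "before"
    if b in ancestors(a, parents):
        return "after"
    return "concurrent"

print(compare("h3u", "7bx", parents))
print(compare("x35", "73h", parents))
```

The second comparison is the case a single clock cannot express: neither x35 nor 73h happened before the other.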
In the long run, though, we'd like to be able to support all the features we need for versioning and time in HTTP. For instance, we want to support local times as time identifiers too, so this works in Braid:
        o - 11:30am July 1 2023
       / \
h3u - o   o - 8:32pm July 9 2023
       \ /
        o - 10:55am August 4 2023
There might be useful features in Memento for history representations that we don't have in Braid, and should consider merging in. What do you think?
^ Updated my comment above
@toomim I do understand the difference in perspective. Before I share some ideas re how Memento could still be used in relation to braid, I have a question: While version identifiers are crucial in braid, is it safe to assume that the datetimes of these versions are available too?
I had a look at the HTTP Resource Versioning I-D and was wondering whether the combination of two existing headers - memento-datetime and eTag - could be re-purposed to identify versions. The I-D introduces the Version header (in responses) for this purpose.
To request a version, the combination of the existing accept-datetime and a to-be-defined accept-eTag might be an alternative to the Version header (in requests).
An interesting result of this approach would be the ability to issue requests with only accept-datetime or only accept-eTag to systems that support both. Such requests could result in HTTP 300 Multiple Choices responses with, respectively:
I see no alternative in existing approaches to the proposed Parent header.
As a side note regarding the proposed Version header: it feels like there should be an Accept-Version for requests and a Version header for responses.
To your old question:
> While version identifiers are crucial in braid, is it safe to assume that the datetimes of these versions are available too?
I don't think this is safe to assume. I'm aware of at least a few collaborative text editors that don't store datetime timestamps for the edits.
As for your idea of combining etag with memento-datetime, I don't see how you can combine those to get a version.
The issue with etag is that it identifies unique contents, not time. The issue with memento-datetime is that it supports only linear (clock) time. Are you suggesting that we could write a function version(etag, datetime) that gives us the same functionality as a version, supporting a DAG of time? I'm not sure how to write that function.
> As a side note regarding the proposed Version header: it feels like there should be an Accept-Version for requests and a Version header for responses.
Yeah, this is a common feeling, but I haven't seen a good reason for it, and it sure seems a lot simpler and more sensible to use the same header. I'd love to hear a good reason for it.
One issue with Accept-* is that it typically specifies a ranked preference list of things that it accepts. But in this case, we are requesting a specific version. It makes sense to say "GET Version X" rather than "GET, and I accept versions X, Y, Z" when we really just want to GET X.
> I don't think this is safe to assume. I'm aware of at least a few collaborative text editors that don't store datetime timestamps for the edits.
But they would generate version-specific identifiers?
> As for your idea of combining etag with memento-datetime, I don't see how you can combine those to get a version.
> The issue with etag is that it identifies unique contents, not time. The issue with memento-datetime is that it supports only linear (clock) time. Are you suggesting that we could write a function version(etag, datetime) that gives us the same functionality as a version, supporting a DAG of time? I'm not sure how to write that function.
I am probably missing something about the protocol. But it seemed to me that the "and" of etag and memento-datetime would uniquely identify a version as you intend?
> > As a side note regarding the proposed Version header: it feels like there should be an Accept-Version for requests and a Version header for responses.
> Yeah, this is a common feeling, but I haven't seen a good reason for it, and it sure seems a lot simpler and more sensible to use the same header. I'd love to hear a good reason for it.
Well, if you indicate it is a common feeling then I don't think an extra reason needs to be given ;-)
> > I don't think this is safe to assume. I'm aware of at least a few collaborative text editors that don't store datetime timestamps for the edits.
> But they would generate version-specific identifiers?
Yes, often IDs like "hvdsomp-113". And then if a single user (like hvdsomp) types 30 characters in a row, all of the version IDs can be compressed down to a single run, like "hvdsomp-[113-213]". Storing wallclock timestamps adds a lot of data that cannot be compressed so easily.
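As a rough illustration of that compression, here is a small Python sketch. The function name `compress_runs` and the exact run syntax are assumptions based only on the "hvdsomp-[113-213]" example in this comment, not on any particular editor's format.

```python
def compress_runs(version_ids):
    """Collapse consecutive per-user IDs like 'hvdsomp-113', 'hvdsomp-114' into runs."""
    runs = []  # each entry is [user, first_seq, last_seq]
    for vid in version_ids:
        user, seq_text = vid.rsplit("-", 1)
        seq = int(seq_text)
        if runs and runs[-1][0] == user and runs[-1][2] + 1 == seq:
            runs[-1][2] = seq               # extend the current run
        else:
            runs.append([user, seq, seq])   # start a new run
    return [
        f"{user}-{first}" if first == last else f"{user}-[{first}-{last}]"
        for user, first, last in runs
    ]

ids = [f"hvdsomp-{n}" for n in range(113, 214)]
print(compress_runs(ids))
```

A wallclock timestamp attached to each ID would break this: the sequence numbers are predictable from one another, but the timestamps are not.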
> I am probably missing something about the protocol. But it seemed to me that the "and" of etag and memento-datetime would uniquely identify a version as you intend?
Ah, I see what you're thinking. But another issue with datetimes is they have limited resolution.
If you flip a version back and forth within the resolution of a second, then Thu, 31 May 2007 20:35:00 GMT won't be able to distinguish them.
> > I am probably missing something about the protocol. But it seemed to me that the "and" of etag and memento-datetime would uniquely identify a version as you intend?
> Ah, I see what you're thinking. But another issue with datetimes is they have limited resolution.
> If you flip a version back and forth within the resolution of a second, then Thu, 31 May 2007 20:35:00 GMT won't be able to distinguish them.
Indeed, that has been a theoretical concern with the Memento protocol too. In practice it has not been an issue. And, if one would run into a conflict, there's always the 300 Multiple Choices fallback.
Collaborative editors typically create a version per keystroke. A fast typist can generate 8 characters per second. We also want to allow growth into future use-cases that might want faster-than-human data-updates, like for distributed computation.
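Here is a quick Python sketch of the keystroke case. The even 125 ms spacing and the starting instant are assumptions chosen for illustration; the formatting uses the standard library's `email.utils.format_datetime`, which produces the whole-second HTTP date format.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

start = datetime(2007, 5, 31, 20, 35, 0, tzinfo=timezone.utc)

# Eight keystrokes spread over one second (125 ms apart), one version each.
keystrokes = [start + timedelta(milliseconds=125 * i) for i in range(8)]

# HTTP dates carry whole seconds only, so sub-second detail is dropped.
http_dates = [format_datetime(t, usegmt=True) for t in keystrokes]
distinct = set(http_dates)

print(len(keystrokes), "versions ->", len(distinct), "distinct datetime value(s)")
```

All eight versions end up with the same datetime string, so the datetime alone cannot name any one of them.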
There are also use-cases where you want history but don't know when things happened, like when importing history from a database that doesn't store full timestamps. Then you have to lie in order to use Memento, and pretend that there was a datetime accurate to the second.
I'm also not sure how it would work to synchronize an earth computer with a mars computer. They have different relative spacetimes, and if one is moving faster than the other, then time should slow down for it, meaning that their clocks will go at different rates, and if one is behind ... it might end up in front after a while.
> I'm also not sure how it would work to synchronize an earth computer with a mars computer. They have different relative spacetimes, and if one is moving faster than the other, then time should slow down for it, meaning that their clocks will go at different rates, and if one is behind ... it might end up in front after a while.
I see what you did there! You pulled an Interstellar on me. I’m afraid I have to rest my case now. But, given there are so many arguments/motivations for not using datetime, maybe the Internet Draft could have a bit more material in that regard?
Sure, the interstellar issue is pretty far-out. Perhaps this response will be more satisfying.
The deep issue is that time in a distributed system is a partial order. You need to represent the partial order.
You can't get that from datetimes + etags. Datetimes represent only a linear order. Etags don't represent any order. If we combine Datetimes with Etags, we still have no way to represent a partial order.
Let's say we have 3 versions: a, b, and c, with the following datetimes and etags:
a: "5:00:00", "x2had8"
b: "5:00:01", "773j83"
c: "5:00:01", "x2had8"
How do you know which version came first? Which version was inherited from another version?
You can't tell with the datetimes, because it's possible that these three versions were all generated on different computers, with different clock skews, and datetimes can't distinguish two events that occurred within the same second anyway.
You can't tell with etags, because etags don't have any ordering information.
All you know is that versions a and c have the same contents, because they have the same etag. But you don't know what sequence of edits happened between those versions.
What you want to know is the partial order. The straightforward way to represent that is with a DAG:
  a
 / \
b   c
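One way to see the gap is a short Python sketch with two hypothetical histories that carry exactly the same datetimes and etags. The `fork` and `chain` parent maps and the `is_ancestor` helper are assumptions for illustration; `fork` is the DAG drawn above, and `chain` is an edit followed by a revert.

```python
# Datetime and etag for each version, as in the example above.
meta = {
    "a": ("5:00:00", "x2had8"),
    "b": ("5:00:01", "773j83"),
    "c": ("5:00:01", "x2had8"),
}

# Two hypothetical histories, both consistent with `meta`.
fork = {"a": [], "b": ["a"], "c": ["a"]}    # b and c both derive from a
chain = {"a": [], "b": ["a"], "c": ["b"]}   # a -> b -> c (c reverts to a's contents)

def is_ancestor(x, y, parents):
    """True if x is reachable from y by following parent links."""
    stack = list(parents[y])
    seen = set()
    while stack:
        v = stack.pop()
        if v == x:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

# Same metadata, different answers to "did b come before c?"
print(is_ancestor("b", "c", fork))
print(is_ancestor("b", "c", chain))
```

Nothing in `meta` tells the two histories apart, so the ordering has to come from the parent links.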
I think you're totally right that we could improve the draft's discussion of related work. My main thought is to elaborate more on Memento. What do you think?
I think it’s very nice to mention Memento. But, based on everything you’ve shared in this issue, you are able to make a much more general point about why datetime is not an appropriate aspect of version indication for the cases you want to support. Thanks for a very interesting interaction!
When it comes to resource versioning, it seems appropriate that braid would reference RFC7089: HTTP Framework for Time-Based Access to Resource States -- Memento.