Memento protocol - GitHub issues

hvdsomp commented 11 months ago

When it comes to resource versioning, it seems appropriate that braid would reference RFC7089: HTTP Framework for Time-Based Access to Resource States -- Memento.

toomim commented 11 months ago

Yeah, that would be a good idea, thanks!

I've looked into Memento a bit, and the first big blocker is that it represents time using a computer's local time. Unfortunately, that doesn't work for synchronization over a distributed network, because time becomes relative, and you can't rely on clocks. Instead, you need to construct a partial order of events.

In other words, Memento models time like:

  o - 4:38pm June 4th 2023
  |
  o - 11:30am July 2nd 2023
  |
  o - 2:02pm August 1 2023
  |
  v

But for distributed sync, we need something like:

        o - x35
       / \
h3u - o   o - 7bx
       \ /
        o - 73h
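To make the partial order concrete, here is a minimal Python sketch of the diagram above. It assumes each version simply records the IDs of its parents; the `parents` mapping and the `ancestors` and `compare` helpers are hypothetical names for illustration, not anything defined by Braid or Memento.

```python
# Hypothetical version graph matching the diagram: version ID -> parent IDs.
parents = {
    "h3u": set(),
    "x35": {"h3u"},
    "73h": {"h3u"},
    "7bx": {"x35", "73h"},
}

def ancestors(version, parents):
    """Return every version reachable by following parent links."""
    seen = set()
    stack = list(parents[version])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen

def compare(a, b, parents):
    """Classify two versions as 'equal', 'before', 'after', or 'concurrent'."""
    if a == b:
        return "equal"
    if a in ancestors(b, parents):
        return "before"
    if b in ancestors(a, parents):
        return "after"
    # Neither is an ancestor of the other: a linear clock cannot express this.
    return "concurrent"

print(compare("h3u", "7bx", parents))  # before
print(compare("x35", "73h", parents))  # concurrent
```

The "concurrent" result is the case a single linear timeline has no way to express.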

In the long run, though, we'd like to be able to support all the features we need for versioning and time in HTTP. For instance, we want to support local times as time identifiers too, so this works in Braid:

        o - 11:30am July 1 2023
       / \
h3u - o   o - 8:32pm July 9 2023
       \ /
        o - 10:55am August 4 2023

There might be useful features in Memento for history representations that we don't have in Braid, and should consider merging in. What do you think?

toomim commented 11 months ago

^ Updated my comment above

hvdsomp commented 11 months ago

@toomim I do understand the difference in perspective. Before I share some ideas re how Memento could still be used in relation to braid, I have a question: While version identifiers are crucial in braid, is it safe to assume that the datetimes of these versions are available too?

hvdsomp commented 2 months ago

I had a look at the HTTP Resource Versioning I-D and was wondering whether the combination of two existing headers - memento-datetime and eTag - could be re-purposed to identify versions. The I-D introduces the Version header (in responses) for this purpose.

To request a version, the combination of the existing accept-datetime and a to-be-defined accept-eTag might be an alternative to the Version header (in requests).

An interesting result of this approach would be the ability to issue requests with only accept-datetime or only accept-eTag to systems that support both. Such requests could result in HTTP 300 Multiple Choices responses with, respectively:

a choice of versions with the same memento-datetime yet different eTags
a choice of versions with the same eTag yet different memento-datetime
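A rough Python sketch of how a server might resolve such requests. This is only an illustration of the idea, under several assumptions: `accept_etag` stands in for the to-be-defined accept-eTag header, the stored versions are made up, matching is exact (real Memento datetime negotiation selects the temporally closest memento), and the 404 for no match is an arbitrary choice.

```python
from collections import namedtuple

Version = namedtuple("Version", ["memento_datetime", "etag"])

# Hypothetical stored versions of one resource.
STORE = [
    Version("Thu, 31 May 2007 20:35:00 GMT", '"x2had8"'),
    Version("Thu, 31 May 2007 20:35:00 GMT", '"773j83"'),
    Version("Thu, 31 May 2007 20:35:01 GMT", '"x2had8"'),
]

def resolve(store, accept_datetime=None, accept_etag=None):
    """Return (status, matches) for a request carrying either or both values."""
    matches = [
        v for v in store
        if (accept_datetime is None or v.memento_datetime == accept_datetime)
        and (accept_etag is None or v.etag == accept_etag)
    ]
    if not matches:
        return 404, []       # assumption: no exact match means not found
    if len(matches) == 1:
        return 200, matches  # the pair (or single value) was unambiguous
    return 300, matches      # Multiple Choices: let the client pick

status, _ = resolve(STORE, accept_datetime="Thu, 31 May 2007 20:35:00 GMT")
print(status)  # 300
```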

I see no alternative in existing approaches to the proposed Parent header.

As a side note regarding the proposed Version header: it feels like there should be an Accept-Version for requests and a Version header for responses.

toomim commented 2 months ago

To your old question:

> While version identifiers are crucial in braid, is it safe to assume that the datetimes of these versions are available too?

I don't think this is safe to assume. I'm aware of at least a few collaborative text editors that don't store datetime timestamps for the edits.

As for your idea of combining etag with memento-datetime, I don't see how you can combine those to get a version.

The issue with etag is that it identifies unique contents, not time. The issue with memento-datetime is that it supports only linear (clock) time. Are you suggesting that we could write a function version(etag, datetime) that gives us the same functionality as a version, supporting a DAG of time? I'm not sure how to write that function.

> As a side note regarding the proposed Version header: it feels like there should be an Accept-Version for requests and a Version header for responses.

Yeah, this is a common feeling, but I haven't seen a good reason for it, and it sure seems a lot simpler and more sensible to use the same header. I'd love to hear a good reason for it.

One issue with Accept-* is that it typically specifies a ranked preference list of things that it accepts. But in this case, we are requesting a specific version. It makes sense to say "GET Version X" rather than "GET, and I accept versions X, Y, Z" when we really just want to GET X.

hvdsomp commented 1 month ago

> I don't think this is safe to assume. I'm aware of at least a few collaborative text editors that don't store datetime timestamps for the edits.

But they would generate version-specific identifiers?

> As for your idea of combining etag with memento-datetime, I don't see how you can combine those to get a version.

> The issue with etag is that it identifies unique contents, not time. The issue with memento-datetime is that it supports only linear (clock) time. Are you suggesting that we could write a function version(etag, datetime) that gives us the same functionality as a version, supporting a DAG of time? I'm not sure how to write that function.

I am probably missing something about the protocol. But it seemed to me that the "and" of etag and memento-datetime would uniquely identify a version as you intend?

> As a side note regarding the proposed Version header: it feels like there should be an Accept-Version for requests and a Version header for responses.

> Yeah, this is a common feeling, but I haven't seen a good reason for it, and it sure seems a lot simpler and more sensible to use the same header. I'd love to hear a good reason for it.

Well, if you indicate it is a common feeling then I don't think an extra reason needs to be given ;-)

toomim commented 1 month ago

> I don't think this is safe to assume. I'm aware of at least a few collaborative text editors that don't store datetime timestamps for the edits.

> But they would generate version-specific identifiers?

Yes, often IDs like "hvdsomp-113". And then if a single user (like hvdsomp) types 30 characters in a row, all of the version IDs can be compressed down to a single run, like "hvdsomp-[113-213]". Storing wallclock timestamps adds a lot of data that cannot be compressed so easily.
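As a rough illustration of that run compression, here is a small Python sketch. It assumes version IDs have the form `agent-counter` as in the example above; `compress_runs` is a hypothetical helper, not a function from any particular editor.

```python
def compress_runs(version_ids):
    """Collapse consecutive per-agent counters into 'agent-[start-end]' runs."""
    runs = []  # each entry is [agent, first_counter, last_counter]
    for vid in version_ids:
        agent, _, seq = vid.rpartition("-")
        seq = int(seq)
        if runs and runs[-1][0] == agent and runs[-1][2] + 1 == seq:
            runs[-1][2] = seq  # extend the current run
        else:
            runs.append([agent, seq, seq])  # start a new run
    return [
        f"{agent}-{start}" if start == end else f"{agent}-[{start}-{end}]"
        for agent, start, end in runs
    ]

ids = [f"hvdsomp-{n}" for n in range(113, 143)]  # 30 consecutive keystrokes
print(compress_runs(ids))  # ['hvdsomp-[113-142]']
```

A wallclock timestamp per keystroke has no such regular structure, so it would have to be stored alongside every entry in the run.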

> I am probably missing something about the protocol. But it seemed to me that the "and" of etag and memento-datetime would uniquely identify a version as you intend?

Ah, I see what you're thinking. But another issue with datetimes is they have limited resolution.

If you flip a version back and forth within the resolution of a second, then Thu, 31 May 2007 20:35:00 GMT won't be able to distinguish them.
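A quick way to see the resolution limit, sketched in Python with the standard library's HTTP-date formatter. The two timestamps are made up for illustration; they are half a second apart but fall within the same second.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def http_date(dt):
    """Format an aware UTC datetime as an HTTP-date (one-second resolution)."""
    return format_datetime(dt, usegmt=True)

# Two hypothetical edits, 500 ms apart, inside the same second.
first = datetime(2007, 5, 31, 20, 35, 0, 100_000, tzinfo=timezone.utc)
second = first + timedelta(milliseconds=500)

print(http_date(first))                       # Thu, 31 May 2007 20:35:00 GMT
print(first == second)                        # False
print(http_date(first) == http_date(second))  # True
```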

hvdsomp commented 1 month ago

> I am probably missing something about the protocol. But it seemed to me that the "and" of etag and memento-datetime would uniquely identify a version as you intend?

> Ah, I see what you're thinking. But another issue with datetimes is they have limited resolution.

> If you flip a version back and forth within the resolution of a second, then Thu, 31 May 2007 20:35:00 GMT won't be able to distinguish them.

Indeed, that has been a theoretical concern with the Memento protocol too. In practice it has not been an issue. And, if one would run into a conflict, there's always the 300 Multiple Choices fallback.

toomim commented 1 month ago

Collaborative editors typically create a version per keystroke. A fast typist can generate 8 characters per second. We also want to allow growth into future use-cases that might want faster-than-human data-updates, like for distributed computation.

There are also use-cases where you want history but don't know when things happened, like when importing history from a database that doesn't store full timestamps. Then, in order to use Memento, you have to lie and pretend that there was a datetime accurate to the second.

I'm also not sure how it would work to synchronize an earth computer with a mars computer. They have different relative spacetimes, and if one is moving faster than the other, then time should slow down for it, meaning that their clocks will go at different rates, and if one is behind ... it might end up in front after a while.

hvdsomp commented 1 month ago

> I'm also not sure how it would work to synchronize an earth computer with a mars computer. They have different relative spacetimes, and if one is moving faster than the other, then time should slow down for it, meaning that their clocks will go at different rates, and if one is behind ... it might end up in front after a while.

I see what you did there! You pulled an Interstellar on me. I’m afraid I have to rest my case now. But, given there are so many arguments/motivations for not using datetime, maybe the Internet Draft could have a bit more material in that regard?

toomim commented 1 month ago

Sure, the interstellar issue is pretty far-out. Perhaps this response will be more satisfying.

The deep issue is that time in a distributed system is a partial order. You need to represent the partial order.

You can't get that from datetimes + etags. Datetimes represent only a linear order. Etags don't represent any order. If we combine Datetimes with Etags, we still have no way to represent a partial order.

Let's say we have 3 versions: a, b, and c, with the following datetimes and etags:

a "5:00:00", "x2had8"
b "5:00:01", "773j83"
c "5:00:01", "x2had8"

How do you know which version came first? Which version was inherited from another version?

You can't tell with the datetimes, because it's possible that these three versions were all generated on different computers, with different clock skews, and they can't distinguish the two events that occurred within a second anyway.

You can't tell with etags, because etags don't have any ordering information.

All you know is that version a and c have the same contents, because they have the same etag. But you don't know what sequence of edits happened between those versions.

What you want to know is the partial order. The straightforward way to represent that is with a DAG:

  a
 / \
b   c
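The point can be checked mechanically. The Python sketch below takes the three versions from the example and builds two hypothetical histories, a fork and a chain, that both fit the same datetimes and etags. The `fork`, `chain`, and `observable` names are made up for illustration.

```python
# (datetime, etag) metadata from the example above.
metadata = {
    "a": ("5:00:00", "x2had8"),
    "b": ("5:00:01", "773j83"),
    "c": ("5:00:01", "x2had8"),
}

# Two hypothetical histories, written as version -> list of parents.
fork = {"a": [], "b": ["a"], "c": ["a"]}   # b and c both derive from a
chain = {"a": [], "b": ["a"], "c": ["b"]}  # c undoes b's edit

def observable(history):
    """What a client sees when given only datetimes and etags."""
    return sorted(metadata[v] for v in history)

print(observable(fork) == observable(chain))  # True: indistinguishable
print(fork == chain)                          # False: different histories
```

Only the parent links tell the two histories apart, which is the information the DAG carries and the (datetime, etag) pair does not.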

I think you're totally right that we could improve the draft's discussion of related work. My main thought is to elaborate more on Memento. What do you think?

hvdsomp commented 1 month ago

I think it’s very nice to mention Memento. But, based on everything you’ve shared in this issue, you are able to make a much more general point about why datetime is not an appropriate aspect of version indication for the cases you want to support. Thanks for a very interesting interaction!

braid-org / braid-spec

Memento protocol #133