LionWeb-io / specification

Specifications of the LionWeb initiative
http://lionweb.io/specification/
Can Repositories have stricter requirements on node IDs than LIonWeb (e.g. only longs)? #70

Closed markusvoelter closed 5 months ago

markusvoelter commented 1 year ago

As part of Lionweb, we have not said anything about how node IDs are structured, beyond the Base64 decision. However, specific repositories might make assumptions about this ID. For example, modelix assumes longs. This means:

One consequence of this approach is that nodes and their IDs might not be portable across repos. However, when we import a bunch of nodes into a different repo, we will likely have to touch the IDs anyway because they might overlap with existing ones. So also changing the format (arbitrary string to, say, long) is not a huge additional burden.

enikao commented 1 year ago

I think that's exactly the point of specifying what is a valid id: every LionWeb-compatible repository must accept such ids, and supply only such ids. The repository is free to use anything else internally (or via other interfaces).

joswarmer commented 1 year ago

I tend to agree with Niko.

ftomassetti commented 1 year ago

Avoiding the translation of IDs would permit moving nodes from repo A to repo B and then back to repo A without breaking references.

enikao commented 1 year ago

I think Modelix already handles this for MPS nodes by storing the MPS id internally explicitly (but not delivering that node property back to MPS).

enikao commented 1 year ago

Decisions on 2023-03-03:

Rationale: The main issue with limited node ids is exchange/import of external node ids. It would be convenient for clients to rely on the full LIonWeb spec range for valid ids, as the client could come up with its own mapping, and easily re-use external ids. However, the repository must guarantee unique ids; then potentially every external id could form a duplicate and thus would need to change upon import. This means that either the client or the repository must keep some mapping around in any case.

A client only needs to handle mapping if the client is concerned with external ids. A repository might choose internal structures or optimizations based on a limited id range. All these optimizations would need to be compatible with the full id range, even if no client were interested.

Thus, it seems easier for clients to deal with the additional effort in case they need it, than for repositories in any case.

enikao commented 1 year ago

@ftomassetti Please check your use case is covered.

ftomassetti commented 1 year ago

I understood this differently: A client MUST request free ids from the repository if the client wants to create new nodes, and only use these ids.

I understand that a client can send any valid node id it creates and then the repository may decide to remap those node ids, communicating back the mapping done (e.g., "I translated foo1 into 123 and foo2 into 456")
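
The remapping flow described in this comment could look roughly like the sketch below. It is only an illustration: `Node`, `Repository`, `store`, and the numeric counter are hypothetical names, not part of any LionWeb API, and the repository is assumed to use integer-like ids internally (as in the modelix example).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    references: list = field(default_factory=list)  # ids of referenced nodes


class Repository:
    """Hypothetical repository that only keeps numeric ids."""

    def __init__(self, first_free_id=123):
        self._next_id = first_free_id
        self.nodes = {}

    def store(self, incoming):
        """Store nodes under fresh ids and return the mapping old id -> new id."""
        mapping = {}
        for node in incoming:
            mapping[node.id] = str(self._next_id)
            self._next_id += 1
        for node in incoming:
            new_id = mapping[node.id]
            # References to nodes outside this batch are left untouched.
            new_refs = [mapping.get(ref, ref) for ref in node.references]
            self.nodes[new_id] = Node(new_id, new_refs)
        return mapping


repo = Repository()
mapping = repo.store([Node("foo1", ["foo2"]), Node("foo2")])
print(mapping)
```

The client would then apply the returned mapping to its own copy of the nodes, so that later requests use the repository's ids.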

dslmeinte commented 1 year ago

I also prefer the client being able to come up with (syntactically valid) IDs itself, and “correcting” those IDs from a server response once it's been able to send ∂s (or a whole model). You can't rely on a client (such as an editor) being online and with such low latency. So, you don't want to have to roundtrip when a user initiates creating a new instance. (You could get around this by pre-fetching a set of valid IDs, but we have to do the remapping dance anyway.)

enikao commented 1 year ago

The idea about "getting node ids" is mentioned in #25 (and we're looking into details in context of C project at https://docs.google.com/document/d/1SwfOkt_UGLNL-3tFQMi0WnzlhpxNc7eJsBOKTfwcAno/edit?skip_itp2_check=true# ).

A client can request a range of free ids and use them. If a client would be offline for a long time, it would request a sufficient range once it is back online and do the remapping. I think it's simpler if remapping happens exclusively with one participant, and never has to be communicated.

dslmeinte commented 1 year ago

The problem is that a client can't know it's going to be offline for some extended period, and I really wouldn't want to push an "always online" deal on users. The remapping has to happen in any case at the client, and only there: the server just communicates back (once) to the client "this ID is now ID'".

dslmeinte commented 1 year ago

An additional problem with requesting free IDs beforehand is that it locks up IDs, and the server has to track which are given out. Not a huge deal, but another disadvantage of this approach, IMHO.

enikao commented 1 year ago

The client does not need to request enough ids in advance. It can use temporary internal ids, and request enough for remapping once it reconnects.
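
A minimal sketch of this approach, under stated assumptions: the client marks its temporary ids with a `tmp-` prefix that the repository never hands out, and `request_free_ids` is a stand-in for whatever call a repository would offer to obtain free ids. Neither is defined by LionWeb.

```python
import itertools


def request_free_ids(count, start=1000):
    """Stand-in for a repository call that hands out `count` free ids."""
    return [str(n) for n in range(start, start + count)]


class OfflineClient:
    """Hypothetical client that creates nodes without contacting the repository."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.nodes = {}  # node id -> list of referenced node ids

    def create_node(self, references=()):
        # Assumption: ids starting with "tmp-" are never issued by the repository.
        temp_id = f"tmp-{next(self._counter)}"
        self.nodes[temp_id] = list(references)
        return temp_id

    def pending_ids(self):
        return [node_id for node_id in self.nodes if node_id.startswith("tmp-")]

    def remap(self, free_ids):
        """Replace every temporary id, including inside references."""
        pending = self.pending_ids()
        if len(free_ids) < len(pending):
            raise ValueError("not enough free ids")
        mapping = dict(zip(pending, free_ids))
        self.nodes = {
            mapping.get(node_id, node_id): [mapping.get(ref, ref) for ref in refs]
            for node_id, refs in self.nodes.items()
        }
        return mapping


client = OfflineClient()
first = client.create_node()
client.create_node(references=[first])
# Back online: request exactly as many ids as are pending, then remap locally.
client.remap(request_free_ids(len(client.pending_ids())))
print(client.nodes)
```

In this variant the remapping happens entirely on the client, so the mapping never has to be communicated over the protocol.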

ftomassetti commented 1 year ago

To me it seems that the repository is exposing its own limitation/internal optimization to the client, causing a not insignificant complication for clients. In general, that feels wrong to me. That said, I understand the argument that, if we need to support mapping anyway, this limitation has less practical impact and remains mostly a conceptual issue.

I was thinking that without this limitation we may find a way to avoid the remapping system. For example, if the repository was accepting strings of arbitrary length (with some limitations on the set of characters used, as we discussed), the client could perhaps just reserve a certain prefix and in that way it could be able to assign all IDs it needs (provided they start with that prefix), without the risk of any conflict and without the need of translating the IDs for every client.

In this scenario, when a client tries to get IDs that are not available, its request is rejected, and it is the responsibility of the client to come up with some prefix, reserve it, and put that prefix in front of all the node IDs it tried before, this time with the certainty that its request will be accepted.
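
The prefix idea could be sketched as follows. Everything here is hypothetical (the class, `reserve_prefix`, `add_ids`, and the all-or-nothing rejection rule); the thread does not define such an API.

```python
class PrefixRepository:
    """Hypothetical repository that lets clients reserve an id prefix."""

    def __init__(self):
        self._reserved = {}  # prefix -> owning client
        self._ids = set()

    def reserve_prefix(self, client, prefix):
        # Reject prefixes that overlap an existing reservation or existing ids.
        for taken in self._reserved:
            if taken.startswith(prefix) or prefix.startswith(taken):
                return False
        if any(node_id.startswith(prefix) for node_id in self._ids):
            return False
        self._reserved[prefix] = client
        return True

    def _owner_of(self, node_id):
        for prefix, client in self._reserved.items():
            if node_id.startswith(prefix):
                return client
        return None

    def add_ids(self, client, ids):
        """Accept all ids or none: reject on any duplicate or foreign prefix."""
        for node_id in ids:
            if node_id in self._ids:
                return False
            owner = self._owner_of(node_id)
            if owner is not None and owner != client:
                return False
        self._ids.update(ids)
        return True


repo = PrefixRepository()
repo.add_ids("clientA", ["n1", "n2"])
wanted = ["n1", "n3"]
if not repo.add_ids("clientB", wanted):          # "n1" is already taken
    assert repo.reserve_prefix("clientB", "b_")  # reserve a prefix instead
    assert repo.add_ids("clientB", ["b_" + node_id for node_id in wanted])
```

Note that the client still has to rewrite its own ids once, after the rejection; what goes away is any mapping the repository would have to communicate back.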

The repository could still have some internal mapping to more limited IDs (like longs), in case that is necessary to support the implementation choices made in the repository, but the mapping complexity would be eliminated and the system would be simpler, in my opinion.
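
Such an internal mapping could be as small as the table sketched below. `IdTable` is a hypothetical name, and a real repository would need to persist the table; the point is only that the integer ids never leave the repository.

```python
class IdTable:
    """Bidirectional table between external string ids and internal integers."""

    def __init__(self):
        self._to_internal = {}  # external id -> internal integer
        self._to_external = []  # internal integer (list index) -> external id

    def internal(self, external_id):
        # Assign the next free integer the first time an external id is seen.
        if external_id not in self._to_internal:
            self._to_internal[external_id] = len(self._to_external)
            self._to_external.append(external_id)
        return self._to_internal[external_id]

    def external(self, internal_id):
        return self._to_external[internal_id]


table = IdTable()
key = table.internal("clientA_node-1")  # used for internal storage only
print(key, table.external(key))
```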

enikao commented 1 year ago

Considerations:

enikao commented 1 year ago

Decision for now for the sake of progress: Every LIonWeb participant has to accept every valid id

Rationale: We don't have enough experience. This is the easiest way forward for now, and doesn't limit future options too much.

→ No mapping on protocol level required