Can Repositories have stricter requirements on node IDs than LIonWeb (e.g. only longs)?

markusvoelter commented 1 year ago

As part of LIonWeb, we have not said anything about how node IDs are structured, beyond the Base64 decision. However, specific repositories might make assumptions about this ID. For example, Modelix assumes longs. This means:

- if we receive data from the server that has nodes in it, their IDs are assumed by the client to be valid for the repo
- if the client creates new nodes, they have to use IDs that are valid for the repo. The best way of doing this is for the client to request a pool of new IDs from the server for each editing session (we need an API for that)
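
A minimal sketch of the ID-pool idea above, assuming a repository that only hands out longs. `Repository`, `reserve_ids`, `EditingSession` and `new_node_id` are hypothetical names for illustration; no such API has been specified yet.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical repository that only hands out long ids."""
    next_free: int = 1

    def reserve_ids(self, count: int) -> range:
        # Reserve a contiguous block so no two clients receive the same id.
        start = self.next_free
        self.next_free += count
        return range(start, start + count)


@dataclass
class EditingSession:
    """Client-side session that draws new node ids from a reserved pool."""
    repo: Repository
    pool_size: int = 100
    _pool: list = field(default_factory=list)

    def new_node_id(self) -> str:
        if not self._pool:
            # Pool exhausted: ask the repository for another block.
            self._pool = list(self.repo.reserve_ids(self.pool_size))
        return str(self._pool.pop(0))
```

Two sessions against the same repository never receive overlapping IDs, at the cost of the repository tracking which blocks it has given out.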

One consequence of this approach is that nodes and their IDs might not be portable across repos. However, when we import a bunch of nodes into a different repo, we will likely have to touch the IDs anyway because they might overlap with existing ones. So changing the format as well (arbitrary string to, say, long) is not a huge additional burden.

enikao commented 1 year ago

I think that's exactly the point of specifying what is a valid id: every LionWeb-compatible repository must accept such ids, and supply only such ids. The repository is free to use anything else internally (or via other interfaces).
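
To make the distinction concrete, here is a rough sketch of the two checks. It assumes that a valid LIonWeb id is a non-empty string over the Base64url alphabet (this thread only refers to "the Base64 decision", so treat the exact rule as an assumption), and that a stricter repository accepts only decimal signed 64-bit integers. Both function names are hypothetical.

```python
import re

# Assumption: valid LIonWeb ids use only the Base64url alphabet (A-Z, a-z, 0-9, '-', '_').
LIONWEB_ID = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_lionweb_id(node_id: str) -> bool:
    """True if the id matches the assumed LIonWeb character set."""
    return LIONWEB_ID.fullmatch(node_id) is not None


def is_long_id(node_id: str) -> bool:
    """Hypothetical stricter rule: decimal text of a signed 64-bit integer."""
    if re.fullmatch(r"-?[0-9]+", node_id) is None:
        return False
    return -(2**63) <= int(node_id) <= 2**63 - 1
```

Under these assumptions every long id is also a valid LIonWeb id, but not the other way round, which is exactly the "stricter requirement" under discussion.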

joswarmer commented 1 year ago

I tend to agree with Niko.

ftomassetti commented 1 year ago

Avoiding the translation of IDs would permit moving nodes from repo A to repo B and then back to repo A without breaking references.

enikao commented 1 year ago

I think Modelix already handles this for MPS nodes by storing the MPS id internally explicitly (but not delivering that node property back to MPS).

enikao commented 1 year ago

Decisions on 2023-03-03:

- A client MUST be able to handle any node id adhering to LIonWeb spec (#31).
- A client MUST request free ids from the repository if the client wants to create new nodes, and only use these ids.
- A server CAN limit the ids to a subset of valid LIonWeb ids.
- If a client wants to exchange/import/export nodes with another source, the client MUST ensure all node ids adhere to the rules above. One way to achieve this would be to store an additional property in the repository's node that contains the external id. In a future version, there might be a mapping API (#94).
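
The "additional property" approach could look roughly like the sketch below. The node layout (dicts with `id`, `properties`, `references`) and the property name `externalId` are assumptions made for illustration, not part of any LIonWeb format.

```python
def import_nodes(nodes, fresh_ids):
    """Rewrite ids of imported nodes, keeping the original id as a property.

    nodes: list of dicts with hypothetical keys "id", "properties", "references".
    fresh_ids: iterator of ids obtained from the target repository.
    """
    # Assign one fresh repository id per imported node.
    mapping = {node["id"]: next(fresh_ids) for node in nodes}
    imported = []
    for node in nodes:
        imported.append({
            "id": mapping[node["id"]],
            # Keep the external id so the node can be exported again later.
            "properties": {**node["properties"], "externalId": node["id"]},
            # References to nodes outside the import are left untouched.
            "references": [mapping.get(ref, ref) for ref in node["references"]],
        })
    return imported, mapping
```

References between imported nodes are rewritten together with the ids, so the imported subtree stays internally consistent.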

Rationale: The main issue with limited node ids is exchange/import of external node ids. It would be convenient for clients to rely on the full LIonWeb spec range for valid ids, as the client could come up with its own mapping, and easily re-use external ids. However, the repository must guarantee unique ids, so potentially every external id could clash with an existing id and would need to change upon import. This means that either the client or the repository must keep some mapping around in any case.

A client only needs to handle mapping if the client is concerned with external ids. A repository might choose internal structures or optimizations based on a limited id range. If repositories had to support the full id range, all these optimizations would need to be compatible with it, even if no client were interested.

Thus, it seems easier for clients to deal with the additional effort in case they need it, than for repositories in any case.

enikao commented 1 year ago

@ftomassetti Please check whether your use case is covered.

ftomassetti commented 1 year ago

I understood this differently: "A client MUST request free ids from the repository if the client wants to create new nodes, and only use these ids."

My understanding was that a client can send any valid node id it creates, and the repository may then decide to remap those node ids, communicating back the mapping done (e.g., "I translated foo1 into 123 and foo2 into 456").
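
A rough sketch of that alternative, assuming a repository that stores everything under long ids and answers each store request with the mapping it applied. `RemappingRepository` and `store` are hypothetical names; nothing like this is part of the protocol today.

```python
class RemappingRepository:
    """Hypothetical repository that stores nodes under long ids only."""

    def __init__(self, first_free: int = 1):
        self.nodes = {}          # repository id (int) -> payload
        self._next = first_free  # next unused long id

    def store(self, incoming: dict) -> dict:
        """Store nodes given as {client id: payload}.

        Returns {client id: repository id} so the client can update its copy.
        """
        mapping = {}
        for client_id, payload in incoming.items():
            repo_id = self._next
            self._next += 1
            self.nodes[repo_id] = payload
            mapping[client_id] = str(repo_id)
        return mapping
```

The returned dictionary plays the role of the "I translated foo1 into 123" message; the protocol would have to carry it back to the client.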

dslmeinte commented 1 year ago

I also prefer the client being able to come up with (syntactically-valid) IDs itself, and “correcting” those IDs from a server response once it's been able to send deltas (or a whole model). You can't rely on a client (such as an editor) being online and with such low latency. So, you don't want to have to roundtrip when a user initiates creating a new instance. (You could get around this by pre-fetching a set of valid IDs, but we have to do the remapping dance anyway.)

enikao commented 1 year ago

The idea about "getting node ids" is mentioned in #25 (and we're looking into the details in the context of the C project at https://docs.google.com/document/d/1SwfOkt_UGLNL-3tFQMi0WnzlhpxNc7eJsBOKTfwcAno/edit?skip_itp2_check=true# ).

A client can request a range of free ids and use them. If a client were offline for a long time, it would request a sufficient range once it is back online and do the remapping. I think it's simpler if remapping happens exclusively with one participant, and never has to be communicated.

dslmeinte commented 1 year ago

The problem is that a client can't know it's going to be offline for some extended period, and I really wouldn't want to push an "always online" deal on users. The remapping has to happen in any case at the client, and only there: the server just communicates back (once) to the client "this ID is now ID'".

dslmeinte commented 1 year ago

An additional problem with requesting free IDs beforehand is that it locks up IDs, and the server has to track which are given out. Not a huge deal, but another disadvantage of this approach, IMHO.

enikao commented 1 year ago

The client does not need to request enough ids in advance. It can use temporary internal ids, and request enough for remapping once it reconnects.
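
Sketched out, assuming temporary ids carry a `tmp-` prefix that the repository never issues, and that `request_ids` stands in for whatever call eventually returns free ids (both are assumptions for illustration):

```python
import uuid


class OfflineClient:
    """Hypothetical client that works with temporary ids while offline."""

    def __init__(self):
        self.nodes = {}  # node id -> list of referenced node ids

    def create_node(self, references=()):
        # No round trip: invent a temporary id locally.
        temp_id = "tmp-" + uuid.uuid4().hex
        self.nodes[temp_id] = list(references)
        return temp_id

    def reconnect(self, request_ids):
        """Swap temporary ids for repository ids; request_ids(n) returns n free ids."""
        temp_ids = [node_id for node_id in self.nodes if node_id.startswith("tmp-")]
        mapping = dict(zip(temp_ids, request_ids(len(temp_ids))))
        # Rewrite both the node ids and every reference pointing at them.
        self.nodes = {
            mapping.get(node_id, node_id): [mapping.get(ref, ref) for ref in refs]
            for node_id, refs in self.nodes.items()
        }
        return mapping
```

Here the remapping happens entirely on the client, so the mapping never has to be communicated over the protocol.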

ftomassetti commented 1 year ago

To me it seems that the repository is exposing its own limitation/internal optimization to the client, causing a not insignificant complication for clients. In general it feels wrong to me. That said, I understand the argument that, if we need to support mapping anyway, this limitation has less practical impact and remains mostly a conceptual issue.

I was thinking that without this limitation we may find a way to avoid the remapping system. For example, if the repository accepted strings of arbitrary length (with some limitations on the set of characters used, as we discussed), the client could perhaps just reserve a certain prefix and in that way assign all the IDs it needs (provided they start with that prefix), without the risk of any conflict and without the need to translate the IDs for every client.

In this scenario, when a client tries to use IDs that are not available, its request is rejected. It is then the responsibility of the client to come up with some prefix, reserve it, and put that prefix in front of all the node IDs it tried before, this time with the certainty that its request will be accepted.

The repository could still have some internal mapping to more limited IDs (like longs), in case that is necessary to support the implementation choices made in the repository, but the mapping complexity would be eliminated and the system would be simpler, in my opinion.
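
A minimal sketch of the prefix idea, under the assumption that the repository accepts arbitrary string ids and tracks reserved prefixes per client. `PrefixRepository`, `reserve_prefix`, `store` and `store_with_fallback` are made-up names for illustration only.

```python
class PrefixRepository:
    """Hypothetical repository that lets clients reserve an id prefix."""

    def __init__(self):
        self.nodes = {}     # node id -> payload
        self.reserved = {}  # prefix -> owning client

    def reserve_prefix(self, prefix, client):
        if self.reserved.get(prefix) == client:
            return True
        # Reject prefixes overlapping a reserved one or matching existing ids.
        for taken in self.reserved:
            if taken.startswith(prefix) or prefix.startswith(taken):
                return False
        if any(node_id.startswith(prefix) for node_id in self.nodes):
            return False
        self.reserved[prefix] = client
        return True

    def store(self, client, node_id, payload):
        if node_id in self.nodes:
            return False
        # Ids under a reserved prefix may only be used by the prefix owner.
        for prefix, owner in self.reserved.items():
            if node_id.startswith(prefix) and owner != client:
                return False
        self.nodes[node_id] = payload
        return True


def store_with_fallback(repo, client, node_id, payload, prefix):
    """Try the plain id first; on rejection, reserve a prefix and retry."""
    if repo.store(client, node_id, payload):
        return node_id
    if not repo.reserve_prefix(prefix, client):
        raise ValueError("prefix unavailable: " + prefix)
    prefixed = prefix + node_id
    if not repo.store(client, prefixed, payload):
        raise ValueError("id still unavailable: " + prefixed)
    return prefixed
```

The client still has to rewrite its rejected ids once, but after the prefix is reserved no further translation is needed.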

enikao commented 1 year ago

Considerations:

- Importing models always needs id mapping: if the repo already contains a (semantically different) node with the imported id, the repo must map the imported id
- What are performance / complexity implications for implementers?
  - Repo might need two internal lookup indices: One for internal ids, one for LIonWeb ids.
- If we establish limited supported ids (aka stricter requirements), it always influences the API:
  - We either need a repository API like getNewId(), to retrieve valid ids from the repo.
  - Or a client can send arbitrary ids, and the repo communicates back if the ids were mapped in some way. Then the protocol needs to support this communication.
- There is always one party that has to do the extra work: Either the client has to take care of using only valid ids, or the repo has to map external LIonWeb ids to its internal representation.
- Repository is a persistent storage: Should be easier to store mapping between ids
- Probable (most important?) first use case: Compatible systems communicate → id-compatible “by default” (e.g. Modelix with Modelix or MPS with MPS)
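
For the "two lookup indices" point, a rough sketch of what a repository might keep internally if it stores nodes under longs but accepts any LIonWeb id from outside. `DualIndexStore` and its methods are hypothetical.

```python
class DualIndexStore:
    """Hypothetical store keyed internally by longs, externally by LIonWeb ids."""

    def __init__(self):
        self._by_internal = {}  # internal long id -> payload
        self._internal_of = {}  # LIonWeb id -> internal long id
        self._next = 1

    def put(self, lionweb_id, payload):
        internal = self._internal_of.get(lionweb_id)
        if internal is None:
            # First time this LIonWeb id is seen: allocate an internal id.
            internal = self._next
            self._next += 1
            self._internal_of[lionweb_id] = internal
        self._by_internal[internal] = payload
        return internal

    def get(self, lionweb_id):
        # Two lookups per access: LIonWeb id -> internal id -> payload.
        return self._by_internal[self._internal_of[lionweb_id]]
```

The extra index costs memory and one additional lookup per access, but the mapping never leaves the repository.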

enikao commented 1 year ago

Decision for now for the sake of progress: Every LIonWeb participant has to accept every valid id

Rationale: We don't have enough experience. This is the easiest way forward for now, and doesn't limit future options too much.

→ No mapping on protocol level required

LionWeb-io / specification

Can Repositories have stricter requirements on node IDs than LIonWeb (e.g. only longs)? #70