Consistency and completeness guarantees of derived models

enikao commented 7 months ago

To what degree do we guarantee consistency and completeness of derived models?

Option A: Global consistency and completeness for all derivations

If a client asks for some derived models, the repository must guarantee all derivations are complete and consistent for the whole repository at this point in time. This means that all known derivations have been calculated for the whole repository, and no outdated derivations are present anywhere in the repository.

Pro:

Very thorough and reliable

Con:

Extremely expensive
Unpredictable: Might take forever
Needs API to ask processors: "Are you done yet?"

Option B: Global consistency and completeness for requested derivations

If a client asks for some derived models, the repository must guarantee the requested derivations are complete and consistent for the whole repository at this point in time. This means that all requested derivations have been calculated for the whole repository, and no outdated requested derivations are present anywhere in the repository.

Pro:

Thorough and reliable

Con:

Extremely expensive
Unpredictable: Might take forever
Needs API to ask processors: "Are you done yet?"

Option C: Consistency and completeness for requested derivations for requested base nodes

A client asks for some derived models in the context of some base nodes. The repository must guarantee the requested derivations are complete and consistent for the requested base nodes at this point in time. This means that all requested derivations have been calculated for the requested base nodes, and no outdated requested derivations are present for any of the requested base nodes.

Pro:

Compromise between reliability and cost

Con:

Unpredictable: Might take forever
Needs API to ask processors: "Are you done yet?"

Option D: Completeness for requested derivations for requested base nodes

Same as C, but might return outdated derivations (either a derivation for an already deleted base node, or a derivation with wrong content).

Pro:

Doesn't need to wait for derivation updates

Con:

Needs API to ask processors: "Are you done with all new derivations?"
Seems not much cheaper than C
How useful to a client?

Option E: Consistency for requested derivations for requested base nodes

Same as C, but might miss some not-yet-processed derivations.

Pro:

Doesn't need to wait for new derivations

Con:

Needs API to ask processors: "Are you done with all derivation updates?"
Seems a bit cheaper than C, but complicates processor implementation (they'd need to separate update and creation)
How useful to a client?
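To make the difference between Options C, D and E concrete, here is a minimal sketch. It assumes (this is not part of the proposal) that every base node carries a version number and that each derived entry remembers the base version it was computed from. `DerivedEntry` and `guarantee_holds` are hypothetical names. It also assumes every requested base node is expected to get a derivation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedEntry:
    base_version: int  # hypothetical: version of the base node this entry was computed from
    payload: str


def guarantee_holds(base_versions, derived, derivations, base_nodes,
                    require_complete=True, require_consistent=True):
    """Check the requested (derivation, base node) pairs against the chosen guarantee."""
    for derivation in derivations:
        for node_id in base_nodes:
            entry = derived.get((derivation, node_id))
            if entry is None:
                if require_complete:
                    return False  # derivation not calculated yet
                continue
            if require_consistent and entry.base_version != base_versions[node_id]:
                return False  # derivation is outdated
    return True


base_versions = {"n1": 3, "n2": 1, "n3": 5}
derived = {
    ("types", "n1"): DerivedEntry(3, "int"),     # up to date
    ("types", "n2"): DerivedEntry(0, "string"),  # outdated
    # ("types", "n3") has not been calculated yet
}

# Option C: both checks. Option D: completeness only. Option E: consistency only.
option_c = guarantee_holds(base_versions, derived, ["types"], ["n1"])
option_d = guarantee_holds(base_versions, derived, ["types"], ["n1", "n2"],
                           require_consistent=False)
option_e = guarantee_holds(base_versions, derived, ["types"], ["n1", "n3"],
                           require_complete=False)
```

In this toy model D and E each drop one of the two checks, which is why neither looks much cheaper than C: the repository still has to ask the processors about the remaining one.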

Option F: Processors need to provide updating / unavailable derivations

A processor needs to quickly return an updating derivation for all base nodes it is still processing, and an unavailable derivation for all nodes it does not want to provide a derivation for. Example: for a Java AST, the typesystem processor might return updating for the InferredVarDeclaration node, and unavailable for the ForLoopStatement. Then the client knows it needs to ask again a bit later to get the type of the InferredVarDeclaration, and will never get a type for the ForLoopStatement.

updating might include an estimate of when the derivation will be available.

Pro:

Enables quick replies
Keeps the client informed

Con:

Huge derivations -- at least one per base model node
Probably complicates implementation of processors
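A rough sketch of what an Option F reply could look like. The `Status` enum, the `Derivation` record and the `eta_seconds` field are assumptions made for illustration, as is the `IntLiteral` node; none of them are defined by the specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    AVAILABLE = "available"
    UPDATING = "updating"
    UNAVAILABLE = "unavailable"


@dataclass(frozen=True)
class Derivation:
    base_node: str
    status: Status
    value: Optional[str] = None
    eta_seconds: Optional[float] = None  # hypothetical estimate, only meaningful for UPDATING


def typesystem_reply(base_nodes, typable, computed):
    """Return exactly one derivation per requested base node, without waiting."""
    reply = []
    for node in base_nodes:
        if node not in typable:
            # The processor will never provide a type for this node.
            reply.append(Derivation(node, Status.UNAVAILABLE))
        elif node in computed:
            reply.append(Derivation(node, Status.AVAILABLE, value=computed[node]))
        else:
            # Still being processed: the client should ask again later.
            reply.append(Derivation(node, Status.UPDATING, eta_seconds=0.5))
    return reply


reply = typesystem_reply(
    ["InferredVarDeclaration", "ForLoopStatement", "IntLiteral"],
    typable={"InferredVarDeclaration", "IntLiteral"},
    computed={"IntLiteral": "int"},
)
statuses = {d.base_node: d.status for d in reply}
```

The reply always has one entry per base node, which is exactly the "huge derivations" cost listed above.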

Option G: Processors need to provide unavailable derivations

Same as F, but no updating derivations.

Pro:

Enables quick replies
Client knows where it does not have to wait for a derivation
Probably simpler to implement for the processor than F

Con:

Huge derivations -- at least one per base model node
How useful to the client? Still doesn't know if the provided derivations are up-to-date

Option H: Internal consistency for requested derivations for requested base nodes

Repository returns both the base nodes and the requested derivations it knows about, no matter how current. Does not contain derivations for deleted base nodes.

Pro:

One-stop-shop for both base nodes and relevant derivations
Seems little effort to implement on repository side

Con:

Bigger replies
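A minimal sketch of an Option H reply, assuming the repository keeps base nodes and derived entries in two plain dictionaries (a hypothetical storage layout). The only guarantee it enforces is that no derivation refers to a base node that no longer exists.

```python
def option_h_reply(base_nodes, derived, requested_nodes, requested_derivations):
    """Return known base nodes plus whatever derivations exist for them, however current."""
    nodes = {n: base_nodes[n] for n in requested_nodes if n in base_nodes}
    derivations = {
        (name, node_id): value
        for (name, node_id), value in derived.items()
        if name in requested_derivations and node_id in nodes
    }
    return {"nodes": nodes, "derivations": derivations}


base_nodes = {"n1": "class A", "n2": "class B"}  # n9 has been deleted
derived = {
    ("validation", "n1"): "ok",
    ("validation", "n9"): "error",  # stale entry for a deleted base node
    ("types", "n1"): "A",           # not requested
}

reply = option_h_reply(base_nodes, derived, ["n1", "n2", "n9"], {"validation"})
```

Here `n2` is returned without a derivation, and the entry for the deleted `n9` is dropped.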

Option I: No consistency or completeness guarantee

The client might get outdated derivations, derivations for deleted base nodes, or might miss derivations not yet available.

Pro:

Very simple

Con:

Invites polling
How can a client ever know it gets up-to-date, complete derivations?

dslmeinte commented 7 months ago

My thoughts (as promised, jotted down semi-randomly):

Consistency and completeness are separate notions/concepts, even though they're conceptually quite close together.
Completeness is very hard to do without some explicit effort by language implementors. There's a difference between a node being intrinsically untypable (as in: no instance of the concept can be meaningfully assigned a type), incidentally untypable (as in: no meaningful type could be assigned to this particular instance – in this case the type assigned could be an instance of some sort of IHappensToNotBeTypable classifier), or not typed yet (as in: the type calculator didn't finish computing a type for this node – which is assumed to inherit from ITyped/ITypable). The language implementor would ultimately have to provide this distinction.
We might want to relax consistency to the point that we're just stating “original model X has had its last change @ t1 and derived model Y has had its last change @ t2 and t2 lies after t1, so probably Y is consistent with X”.
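A small sketch of the relaxed, timestamp-based statement, assuming each model carries a logical `last_change` counter (a hypothetical field). It also shows why the statement only says "probably": a derived model written after t1 can still have been computed from an older snapshot.

```python
from dataclasses import dataclass


@dataclass
class Model:
    content: str
    last_change: int  # hypothetical logical timestamp


def probably_consistent(base: Model, derived: Model) -> bool:
    """Heuristic only: the derived model changed after the base model's last change."""
    return derived.last_change > base.last_change


base = Model("a", last_change=1)
snapshot = base.content                # processor reads the base model at t=1
base = Model("b", last_change=2)       # base model changes at t1 = 2
derived = Model(f"derived from {snapshot}", last_change=3)  # written at t2 = 3

looks_consistent = probably_consistent(base, derived)
actually_consistent = derived.content == f"derived from {base.content}"
```

In this run the heuristic reports consistency although the derived model is based on the old content.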

joswarmer commented 7 months ago

Aren't we missing an option?

Option J: Processors need to provide updating derivations

A processor needs to quickly return an updating derivation for all base nodes it is still processing. Example: for a Java AST, the typesystem processor might return updating for the InferredVarDeclaration node. Then the client knows it needs to ask again a bit later to get the type of the InferredVarDeclaration.

updating might include an estimate of when the derivation will be available.

Pro:

Enables quick replies
Keeps the client informed

Con:

Probably complicates implementation of processors

enikao commented 7 months ago

We might want to relax consistency to the point that we're just stating “original model X has had its last change @ t1 and derived model Y has had its last change @ t2 and t2 lies after t1, so probably Y is consistent with X”.

Then we'd introduce a notion of time, which is very tricky (you know Hickey's efforts on this). If we say something about consistency, I'd strongly prefer a simpler statement like "consistent with (some defined other part of the repository) at the time of request".

enikao commented 7 months ago

Option J: Processors need to provide updating derivations

Sounds to me very similar to Option F. The difference would be that any node that has no derivation is implicitly unavailable. Does this match your understanding?
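To illustrate that reading of Option J from the client's side, a minimal sketch: the reply is assumed to be a plain mapping from base node to derivation, with a hypothetical `UPDATING` sentinel for nodes still being processed. A node that is missing from the reply counts as unavailable.

```python
UPDATING = object()  # hypothetical sentinel for "still being processed"


def status_of(reply, node_id):
    """Interpret a reply under Option J: no entry means implicitly unavailable."""
    if node_id not in reply:
        return "unavailable"
    return "updating" if reply[node_id] is UPDATING else "available"


reply = {"InferredVarDeclaration": UPDATING}

inferred = status_of(reply, "InferredVarDeclaration")
for_loop = status_of(reply, "ForLoopStatement")
```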

dslmeinte commented 7 months ago

That statement definitely is simpler, but I'm not sure it'll be anywhere near as easy to implement :D (Also channeling my inner Hickey ;))

enikao commented 7 months ago

I think it's a lot easier to implement.

Bulk-based processors

Request the base model via bulk, so I get its current state
Calculate my derivation
Return derivation consistent with the requested base model state

Delta-based processors

Process the queue with not-yet-handled incoming deltas
Calculate my derivation
Return derivation
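The two processor styles can be sketched as follows. This is a toy under stated assumptions: the "derivation" is just the length of each node's text, and `BulkProcessor`, `DeltaProcessor` and the `(node_id, text)` delta shape are hypothetical, not the LionWeb bulk or delta protocol.

```python
from collections import deque


class BulkProcessor:
    def __init__(self, fetch_base_model):
        self.fetch_base_model = fetch_base_model  # callable returning the current base model

    def derive(self):
        base = self.fetch_base_model()  # request the base model in its current state
        # Calculate and return a derivation consistent with that state.
        return {node_id: len(text) for node_id, text in base.items()}


class DeltaProcessor:
    def __init__(self):
        self.queue = deque()  # not-yet-handled incoming deltas
        self.state = {}       # derivation maintained incrementally

    def enqueue(self, delta):
        self.queue.append(delta)

    def derive(self):
        while self.queue:  # process the queue of pending deltas
            node_id, text = self.queue.popleft()
            self.state[node_id] = len(text)
        return dict(self.state)


base_model = {"n1": "int x", "n2": "for"}

bulk = BulkProcessor(lambda: base_model)
delta = DeltaProcessor()
for node_id, text in base_model.items():
    delta.enqueue((node_id, text))

bulk_result = bulk.derive()
delta_result = delta.derive()
```

Both return a derivation that matches the base model state they have seen; neither needs a notion of wall-clock time.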

joswarmer commented 7 months ago

Option J: Processors need to provide updating derivations

Sounds to me very similar to Option F. The difference would be that any node that has no derivation is implicitly unavailable. Does this match your understanding?

It does.

enikao commented 6 months ago

I'll try to spell out how my idea of "completeness" could be implemented.

"complete" means "a processor has finished all its work up to the point in time at which we asked it"

Scenario

Assumptions:

Processors use the same APIs as other clients to retrieve their base models (and any other nodes they might need).
There is a central authority derivationBackend within a repository that handles requests for derivations.
derivationBackend has a special API to ask processors for a complete derivation. (Note that this only contains the derived nodes from this processor -- derivationBackend aggregates the results of all processors that contribute to the same derivation).

1. change1, change2, and change3 change the original model.
2. A generator wants to make sure there are no errors in the model. It asks derivationBackend for the complete validation derivation.
3. Three processors contribute to this derivation: ScopeProcessor, ValidationProcessor and DomainValidator. derivationBackend asks all of them for their complete contribution.
4. ScopeProcessor uses delta protocol. It has the 3 unprocessed deltas in its input queue.
5. ScopeProcessor updates its internal calculations with change1 and change2 input deltas.
6. Another change newChange happens on the original model.
7. ScopeProcessor updates its internal calculations with change3. It does not handle newChange.
8. ScopeProcessor returns the scope-based validation to derivationBackend.
9. ScopeProcessor handles newChange.
10. At the same time as 4., ValidationProcessor uses bulk protocol to request the original model.
11. Repository replies with the original model. This might happen before or after newChange (we don't really know).
12. ValidationProcessor does its work and returns validation to derivationBackend.
13. At the same time as 4., DomainValidator uses delta protocol. It has the 3 unprocessed deltas in its input queue.
14. DomainValidator processes change1, does not need to do anything.
15. DomainValidator processes change2 and updates the persisted derived model in the repository.
16. Now newChange happens to DomainValidator.
17. DomainValidator processes change3, does not need to do anything. It does not handle newChange.
18. DomainValidator returns validation to derivationBackend.
19. DomainValidator processes newChange, does not need to do anything.
20. derivationBackend aggregates all validation results and returns them to the generator.

As we can see in step 11, we don't have a global consistency guarantee. But I think this would be very hard to achieve without a modelix-like backend. Of course if the repository has such a sophisticated storage, derivationBackend can ask all processors with the appropriate context, so all of them work on the same state of the original model.
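The delta-based part of the scenario can be simulated in a few lines. This is a single-threaded sketch under assumptions: `begin_request`, `finish_request` and the `between_deltas` hook are hypothetical names, the hook stands in for a concurrent change, and the bulk-based ValidationProcessor is left out.

```python
from collections import deque


class DeltaProcessor:
    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.handled = []
        self.cutoff = 0

    def enqueue(self, delta):
        self.queue.append(delta)

    def begin_request(self):
        # "Complete" covers only the deltas already queued when we are asked.
        self.cutoff = len(self.queue)

    def finish_request(self, between_deltas):
        for _ in range(self.cutoff):
            self.handled.append(self.queue.popleft())
            between_deltas()  # simulate changes arriving while we work
        return list(self.handled)


def derivation_backend(processors, between_deltas):
    for p in processors:  # ask every processor at the same point in time
        p.begin_request()
    return {p.name: p.finish_request(between_deltas) for p in processors}


processors = [DeltaProcessor("ScopeProcessor"), DeltaProcessor("DomainValidator")]
for p in processors:
    for change in ("change1", "change2", "change3"):
        p.enqueue(change)

sent = {"done": False}


def new_change():
    if not sent["done"]:  # newChange happens once, while processing is under way
        sent["done"] = True
        for p in processors:
            p.enqueue("newChange")


result = derivation_backend(processors, new_change)
leftover = {p.name: list(p.queue) for p in processors}
```

Each contribution covers change1 to change3, and newChange stays in both queues for later, as in steps 9 and 19.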

Why a global state identifier would not help

Assume at step 2, the global state is aa. The generator asks for completeness on that state.
In step 6 and 16, the global state updates to bb via newChange. The generator does not care.
In step 15, DomainValidator changes the repository to global state ab. The generator wants to include that.
In step 20, derivationBackend needs to correlate the requested state aa and the current state bb.

This correlation can only make any sense if the global state MUST be strictly monotonic. Even then, what would it reply? If it is really smart, it might figure out that ab is relevant to the generator, but I doubt that. More likely, it tells the generator "sorry, outdated, please refresh yourself and ask again". Then the generator has to poll until it gets a consistent state -- not very desirable.
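A sketch of the polling this would force on the generator, assuming the backend simply rejects any request whose state identifier differs from the current one. `request_derivation`, `poll_until_consistent` and the state `bc` are hypothetical.

```python
def request_derivation(requested_state, current_state):
    """Hypothetical backend: only answers if the client's state is the current one."""
    if requested_state != current_state:
        return ("outdated", current_state)
    return ("ok", f"derivation@{current_state}")


def poll_until_consistent(known_state, observed_states):
    """observed_states: the repository state at the moment of each attempt."""
    attempts = 0
    for current in observed_states:
        attempts += 1
        status, payload = request_derivation(known_state, current)
        if status == "ok":
            return attempts, payload
        known_state = payload  # refresh and ask again
    return attempts, None


# The repository keeps changing between the first two attempts.
attempts, derivation = poll_until_consistent("aa", ["bb", "bc", "bc"])
```

The generator only gets an answer once the repository has been quiet between two consecutive attempts, and it never gets one if changes keep arriving.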

enikao commented 6 months ago

The same scenario as a sequence diagram.

Red activity lines for repository are updates. Yellow boxes are the incoming, unprocessed changes.

[Figure: sequence diagram of the scenario]
