LionWeb-io / specification

Specifications of the LionWeb initiative
http://lionweb.io/specification/

Consistency and completeness guarantees of derived models #248

Open enikao opened 5 months ago

enikao commented 5 months ago

To what degree do we guarantee consistency and completeness of derived models?

Option A: Global consistency and completeness for all derivations

If a client asks for some derived models, the repository must guarantee all derivations are complete and consistent for the whole repository at this point in time. This means that all known derivations have been calculated for the whole repository, and no outdated derivations are present anywhere in the repository.

Pro:

Con:

Option B: Global consistency and completeness for requested derivations

If a client asks for some derived models, the repository must guarantee the requested derivations are complete and consistent for the whole repository at this point in time. This means that all requested derivations have been calculated for the whole repository, and no outdated requested derivations are present anywhere in the repository.

Pro:

Con:

Option C: Consistency and completeness for requested derivations for requested base nodes

A client asks for some derived models in the context of some base nodes. The repository must guarantee the requested derivations are complete and consistent for the requested base nodes at this point in time. This means that all requested derivations have been calculated for the requested base nodes, and no outdated requested derivations are present for any of the requested base nodes.

Pro:

Con:

Option D: Completeness for requested derivations for requested base nodes

Same as C, but might return outdated derivations (either a derivation for an already deleted base node, or a derivation with wrong content).

Pro:

Con:

Option E: Consistency for requested derivations for requested base nodes

Same as C, but might miss some not-yet-processed derivations.

Pro:

Con:

Option F: Processors need to provide updating / unavailable derivations

A processor needs to quickly return an updating derivation for all base nodes it is still processing, and an unavailable derivation for all nodes it does not want to provide a derivation for. Example: for a Java AST, the typesystem processor might return updating for the InferredVarDeclaration node, and unavailable for the ForLoopStatement. Then the client knows it needs to ask again a bit later to get the type of the InferredVarDeclaration, and will never get a type for the ForLoopStatement.

An updating derivation might include an estimate of when the derivation will be available.
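A minimal sketch of what a processor's reply under this option could look like. All names (`DerivationStatus`, `DerivationResult`, `estimated_seconds`, `should_ask_again`) are hypothetical and not part of any LionWeb API; they only illustrate the distinction between updating and unavailable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DerivationStatus(Enum):
    AVAILABLE = "available"
    UPDATING = "updating"        # still being processed, ask again later
    UNAVAILABLE = "unavailable"  # the processor will never provide one


@dataclass(frozen=True)
class DerivationResult:
    base_node_id: str
    status: DerivationStatus
    value: Optional[str] = None                # set only when AVAILABLE
    estimated_seconds: Optional[float] = None  # optional hint when UPDATING


def should_ask_again(result: DerivationResult) -> bool:
    """A client only needs to retry derivations that are still updating."""
    return result.status is DerivationStatus.UPDATING


# Hypothetical reply of a typesystem processor for a Java AST.
reply = [
    DerivationResult("InferredVarDeclaration", DerivationStatus.UPDATING,
                     estimated_seconds=0.5),
    DerivationResult("ForLoopStatement", DerivationStatus.UNAVAILABLE),
]
retry_ids = [r.base_node_id for r in reply if should_ask_again(r)]
```

With this reply, the client would retry only for `InferredVarDeclaration` and stop asking about `ForLoopStatement`.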

Pro:

Con:

Option G: Processors need to provide unavailable derivations

Same as F, but no updating derivations.

Pro:

Con:

Option H: Internal consistency for requested derivations for requested base nodes

Repository returns both the base nodes and the requested derivations it knows about, no matter how current. Does not contain derivations for deleted base nodes.

Pro:

Con:

Option I: No consistency or completeness guarantee

The client might get outdated derivations, derivations for deleted base nodes, or might miss derivations not yet available.

Pro:

Con:

dslmeinte commented 4 months ago

My thoughts (as promised, jotted down semi-randomly):

joswarmer commented 4 months ago

Aren't we missing an option?

Option J: Processors need to provide updating derivations

A processor needs to quickly return an updating derivation for all base nodes it is still processing. Example: for a Java AST, the typesystem processor might return updating for the InferredVarDeclaration node. Then the client knows it needs to ask again a bit later to get the type of the InferredVarDeclaration.

An updating derivation might include an estimate of when the derivation will be available.

Pro:

Con:

enikao commented 4 months ago

We might want to relax consistency to the point that we're just stating “original model X has had its last change @ t1 and derived model Y has had its last change @ t2 and t2 lies after t1, so probably Y is consistent with X”.

Then we'd introduce a notion of time, which is very tricky (you know Hickey's efforts on this). If we say something about consistency, I'd strongly prefer a simpler statement like "consistent with (some defined other part of the repository) at the time of request".

enikao commented 4 months ago

Option J: Processors need to provide updating derivations

Sounds to me very similar to Option F. The difference would be that any node that has no derivation is implicitly unavailable. Does this match your understanding?

dslmeinte commented 4 months ago

That statement definitely is simpler, but I'm not sure it'll be anywhere near as easy to implement :D (Also channeling my inner Hickey ;))

enikao commented 4 months ago

I think it's a lot easier to implement.

Bulk-based processors

  1. Request the base model via bulk, so I get its current state
  2. Calculate my derivation
  3. Return derivation consistent with the requested base model state

Delta-based processors

  1. Process the queue with not-yet-handled incoming deltas
  2. Calculate my derivation
  3. Return derivation
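The two step lists above can be sketched roughly as follows. This is an illustration under assumptions, not an implementation of the LionWeb bulk or delta protocol: `Model`, `derive`, `BulkProcessor`, and `DeltaProcessor` are made-up names, and the "derivation" is just a toy function.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Model = Dict[str, str]  # node id -> node content (toy stand-in for a base model)


def derive(model: Model) -> Dict[str, int]:
    """Toy derivation: the length of each node's content."""
    return {node_id: len(content) for node_id, content in model.items()}


class BulkProcessor:
    def __init__(self, fetch_model: Callable[[], Model]) -> None:
        self._fetch_model = fetch_model  # hypothetical bulk request

    def request_derivation(self) -> Dict[str, int]:
        model = self._fetch_model()  # 1. request the base model, get its current state
        return derive(model)         # 2. and 3. calculate and return the derivation


class DeltaProcessor:
    def __init__(self) -> None:
        self._model: Model = {}
        self._queue: Deque[Tuple[str, str]] = deque()

    def on_delta(self, node_id: str, content: str) -> None:
        self._queue.append((node_id, content))  # incoming, not yet handled

    def request_derivation(self) -> Dict[str, int]:
        while self._queue:                      # 1. process not-yet-handled deltas
            node_id, content = self._queue.popleft()
            self._model[node_id] = content
        return derive(self._model)              # 2. and 3. calculate and return


repository = {"n1": "int x"}
bulk = BulkProcessor(lambda: dict(repository))
delta = DeltaProcessor()
delta.on_delta("n1", "int x")
bulk_result = bulk.request_derivation()
delta_result = delta.request_derivation()
```

In this toy setup both processors end up with the same derivation because they saw the same base model state.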

joswarmer commented 4 months ago

Option J: Processors need to provide updating derivations

Sounds to me very similar to Option F. The difference would be that any node that has no derivation is implicitly unavailable. Does this match your understanding?

It does.

enikao commented 4 months ago

I'll try to spell out how my idea of "completeness" could be implemented.

"complete" means "a processor has finished all its work up to the point in time at which we asked it"

Scenario

Assumptions:

  1. change1, change2, and change3 change the original model.
  2. A generator wants to make sure there are no errors in the model. It asks derivationBackend for the complete validation derivation.
  3. Three processors contribute to this derivation: ScopeProcessor, ValidationProcessor and DomainValidator. derivationBackend asks all of them for their complete contribution.
  4. ScopeProcessor uses delta protocol. It has the 3 unprocessed deltas in its input queue.
  5. ScopeProcessor updates its internal calculations with change1 and change2 input deltas.
  6. Another change newChange happens on the original model.
  7. ScopeProcessor updates its internal calculations with change3. It does not handle newChange.
  8. ScopeProcessor returns the scope-based validation to derivationBackend.
  9. ScopeProcessor handles newChange.
  10. At the same time as 4., ValidationProcessor uses bulk protocol to request the original model.
  11. Repository replies with the original model. This might happen before or after newChange (we don't really know).
  12. ValidationProcessor does its work and returns validation to derivationBackend.
  13. At the same time as 4., DomainValidator uses delta protocol. It has the 3 unprocessed deltas in its input queue.
  14. DomainValidator processes change1, does not need to do anything.
  15. DomainValidator processes change2 and updates the persisted derived model in the repository.
  16. Now newChange arrives in DomainValidator's input queue.
  17. DomainValidator processes change3, does not need to do anything. It does not handle newChange.
  18. DomainValidator returns validation to derivationBackend.
  19. DomainValidator processes newChange, does not need to do anything.
  20. derivationBackend aggregates all validation results and returns them to the generator.

As we can see in step 11, we don't have a global consistency guarantee. But I think this would be very hard to achieve without a modelix-like backend. Of course if the repository has such a sophisticated storage, derivationBackend can ask all processors with the appropriate context, so all of them work on the same state of the original model.
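One way to read "complete up to the point in time at which we asked" for a delta-based processor is a watermark on its input queue: the processor handles exactly the deltas that were pending when the request arrived and leaves later ones for afterwards. The sketch below simulates steps 4 to 9 under that assumption; `ScopeProcessorSketch`, `answer_request`, and the `after_each` hook are hypothetical names used only to inject newChange in the middle of processing.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class ScopeProcessorSketch:
    def __init__(self) -> None:
        self._queue: Deque[str] = deque()
        self.handled: List[str] = []

    def on_delta(self, delta: str) -> None:
        self._queue.append(delta)

    def pending(self) -> List[str]:
        return list(self._queue)

    def answer_request(
        self, after_each: Optional[Callable[[str], None]] = None
    ) -> List[str]:
        # Watermark: only deltas queued at request time count towards "complete".
        watermark = len(self._queue)
        for _ in range(watermark):
            delta = self._queue.popleft()
            self.handled.append(delta)
            if after_each is not None:
                after_each(delta)  # test hook to simulate concurrent changes
        return list(self.handled)


processor = ScopeProcessorSketch()
for change in ("change1", "change2", "change3"):
    processor.on_delta(change)


def simulate_concurrent_change(delta: str) -> None:
    # newChange arrives after change2 was handled (step 6 of the scenario).
    if delta == "change2":
        processor.on_delta("newChange")


answer = processor.answer_request(after_each=simulate_concurrent_change)
left_over = processor.pending()
```

Here `answer` covers change1 to change3, while newChange stays in the queue and is handled after the reply, as in step 9.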

Why a global state identifier would not help

  1. Assume at step 2, the global state is aa. The generator asks for completeness on that state.
  2. In step 6 and 16, the global state updates to bb via newChange. The generator does not care.
  3. In step 15, DomainValidator changes the repository to global state ab. The generator wants to include that.
  4. In step 20, derivationBackend needs to correlate the requested state aa and the current state bb.

This correlation can only make any sense if the global state MUST be strictly monotonic. Even then, what would it reply? If it is really smart, it could figure out that ab is relevant to the generator, but I doubt that. More likely, it tells the generator "sorry, outdated, please refresh yourself and ask again". Then the generator has to poll until it gets a consistent state -- not very desirable.
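To make the polling concern concrete, here is a small simulation. It assumes, purely for illustration, that the global state is a strictly increasing integer (instead of the identifiers aa, ab, bb) and that the backend refuses any request whose state is no longer current. `ask_backend` and `poll_until_consistent` are hypothetical names.

```python
from typing import List, Optional, Tuple


def ask_backend(requested_state: int, current_state: int) -> Optional[str]:
    """Return a result only if the requested state is still current.

    None stands for "sorry, outdated, please refresh yourself and ask again".
    """
    if requested_state != current_state:
        return None
    return f"validation@{current_state}"


def poll_until_consistent(
    states_seen_by_backend: List[int], start_state: int
) -> Tuple[str, int]:
    """Generator side: keep asking until the known state matches the backend's."""
    known_state = start_state
    attempts = 0
    for current_state in states_seen_by_backend:
        attempts += 1
        result = ask_backend(known_state, current_state)
        if result is not None:
            return result, attempts
        known_state = current_state  # refresh and ask again
    raise RuntimeError("repository never settled while polling")


# The repository changes twice while the generator is asking.
result, attempts = poll_until_consistent([2, 3, 3], start_state=1)
```

In this run the generator needs three round trips before it gets an answer, and a repository that keeps changing could make it poll indefinitely.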

enikao commented 4 months ago

The same scenario as a sequence diagram.

Red activity lines for repository are updates. Yellow boxes are the incoming, unprocessed changes.

[Figure: sequence diagram of the scenario]