[evaluator] Refine CollectionValue API to support smart collections

eclipse-ocl-bot commented 1 week ago

| | |
| --- | --- |
| Bugzilla Link | 509670 |
| Status | NEW |
| Importance | P3 normal |
| Reported | Dec 23, 2016 04:34 EDT |
| Modified | Jul 14, 2017 06:32 EDT |
| Blocks | 509842, 509668 |
| See also | 516652 |
| Reporter | Ed Willink |

Description

The CollectionValue API is already fairly Iterable-like.

It should be relatively easy to define classes such as CollectionIntersectionValue that just drop in as replacements for e.g. CollectionValue.intersection() without reifying the result as a new CollectionValue.

Does the CollectionValue API need any change?

The CG API can probably be improved to invoke new CollectionIntersectionValue directly.

Are derived SetIntersectionValue classes needed?

eclipse-ocl-bot commented 1 week ago

By Ed Willink on Dec 24, 2016 04:43

(In reply to Ed Willink from comment #0)

CollectionIntersectionValue

An awkward first example.

{OrderedSet/Set}->intersection(Bag/OrderedSet/Sequence/Set)

can be lazily computed using an Iterator. For incoming Collections the iteration can iterate over each Collection wrt the other to give lazy results. For incoming Iterables the output history may need to be maintained, particularly if the incoming Iterable is 'infinite'.

However anything involving a Bag result prohibits 'infinite' collections. Lazy/infinite and Bag are almost contradictions.
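A minimal sketch of the lazy intersection idea, assuming plain java.util types rather than the real CollectionValue API. The class name LazyIntersectionIterator and its fields are hypothetical. It keeps an output history so that a non-unique (or unbounded) first source still yields each element at most once; for a Set or OrderedSet source the history is redundant.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical sketch: lazily yields elements of 'first' that also occur in 'second'. */
final class LazyIntersectionIterator<T> implements Iterator<T> {
    private final Iterator<? extends T> first;
    private final Collection<?> second;
    private final Set<T> emitted = new HashSet<>(); // output history, grows with the result
    private T pending;
    private boolean hasPending;

    LazyIntersectionIterator(Iterable<? extends T> first, Collection<?> second) {
        this.first = first.iterator();
        this.second = second;
    }

    @Override
    public boolean hasNext() {
        // Advance the first source only as far as needed to find the next result.
        while (!hasPending && first.hasNext()) {
            T candidate = first.next();
            if (second.contains(candidate) && emitted.add(candidate)) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasPending = false;
        T result = pending;
        pending = null;
        return result;
    }
}
```

Nothing is computed until the consumer asks for an element, so a consumer that stops early never pays for the rest of the first source. A Bag result would additionally need per-element counts from both inputs, which is why it cannot be produced without consuming them completely.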

While support for infinite collections is not mandatory, it would be nice to support them where doing so does not degrade performance. For example, it allows a slow computation to print out intermediate results using oclLog().

Iterator - supports lazy/infinite evaluation, cannot be re-used.

Iterable - supports infinite collections, can be re-used

Collection - prohibits infinite collections, can be re-used

Perhaps each Collection operation should be implemented using an Iterator else Iterable else Collection with standard adapters to promote an Iterator to an Iterable to a Collection where necessary.
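The promotion chain could look roughly like the following sketch. CachingIterable and toList() are hypothetical names, not part of the existing API. The adapter promotes a one-shot Iterator to a re-usable Iterable by caching elements as they are first consumed, and promotes further to a Collection only on request.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical adapter: Iterator -> Iterable (lazy cache) -> Collection (full drain). */
final class CachingIterable<T> implements Iterable<T> {
    private final Iterator<? extends T> source;
    private final List<T> cache = new ArrayList<>();

    CachingIterable(Iterator<? extends T> source) {
        this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < cache.size() || source.hasNext();
            }

            @Override
            public T next() {
                if (index == cache.size()) {
                    // Not cached yet: pull exactly one more element from the source.
                    if (!source.hasNext()) {
                        throw new NoSuchElementException();
                    }
                    cache.add(source.next());
                }
                return cache.get(index++);
            }
        };
    }

    /** Promotes to a Collection. This drains the source, so the source must be finite. */
    List<T> toList() {
        while (source.hasNext()) {
            cache.add(source.next());
        }
        return new ArrayList<>(cache);
    }
}
```

Several traversals can be interleaved because they share the cache, and the source is never advanced further than the furthest traversal has reached. Only toList() rules out an infinite source.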

eclipse-ocl-bot commented 1 week ago

By Ed Willink on Dec 26, 2016 12:48

CollectionIntersectionValue pretty much requires an Iterable as output in order to support the already-in-intersection result test. However Unique.intersection(Unique) can use one of the inputs.

If an operation return can be Iterable or Iterator we need two different APIs. CollectionValue.asXXXValue should continue to return an Iterable. A new API asXXXIterator should provide the opportunity to return an XXXValue that offers Iterator behaviour if it is feasible, Iterable otherwise.

We need to prohibit two asXXXIterator calls to avoid duplicate computation. This should be straightforward; a genuine multiconsumer should use a Variable. An accidental multiconsumer through careless coding should be detected by JUnit assertions.

If an Iterator is invited to provide an Iterable do we maintain an Iterable within the Iterator, or wrap it? A wrapped Iterable owned by the Iterator probably gives the best of both worlds and a state value for use by assertions.
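One way to picture the protocol is the sketch below. OneShotValue, its State enum and the method names asIterator()/asIterable() are illustrative assumptions, not the real asXXXIterator API. The value owns a wrapped Iterable that is created on demand, and its state is available for assertions to catch an accidental second consumer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a value that permits a single lazy traversal unless made iterable first. */
final class OneShotValue<T> {
    enum State { UNUSED, ITERATOR_TAKEN, ITERABLE }

    private final Iterator<T> source;
    private List<T> wrapped;              // the Iterable owned by this value, created on demand
    private State state = State.UNUSED;

    OneShotValue(Iterator<T> source) {
        this.source = source;
    }

    State state() {
        return state;
    }

    Iterator<T> asIterator() {
        if (state == State.ITERABLE) {
            return wrapped.iterator();    // already cached, so re-use is safe
        }
        if (state == State.ITERATOR_TAKEN) {
            throw new IllegalStateException("second asIterator() call");
        }
        state = State.ITERATOR_TAKEN;
        return source;
    }

    Iterable<T> asIterable() {
        if (state == State.ITERATOR_TAKEN) {
            throw new IllegalStateException("asIterable() after asIterator()");
        }
        if (wrapped == null) {
            // Simplification: this sketch drains eagerly instead of caching lazily.
            wrapped = new ArrayList<>();
            source.forEachRemaining(wrapped::add);
            state = State.ITERABLE;
        }
        return wrapped;
    }
}
```

A genuine multi-consumer calls asIterable() first, which is the role a Variable would play. A careless double asIterator() fails fast instead of silently recomputing or seeing a half-consumed source.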

eclipse-ocl-bot commented 1 week ago

By Ed Willink on Dec 28, 2016 04:33

(In reply to Ed Willink from comment #2)

A wrapped Iterable owned by the Iterator probably gives the best of both worlds and a state value for use by assertions.

The old design had a variety of eager derived CollectionValues requiring churning for every operation. If instead we have a single lightweight CollectionIterator, that may have its own lazy list-of-values when memory is needed, and (not or) a lazy 'set'-of-values when uniqueness is needed, memory costs can be much lower and churning avoided.

Sequence - lazy list

Set - lazy list-of-keys + lazy set

OrderedSet - lazy list-of-keys + lazy set

Bag - lazy list-of-keys + lazy map-to-count
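As an illustration of the Bag row, here is a minimal sketch assuming java.util types. LazyBagSketch and its methods are hypothetical and only loosely modelled on the description above. The list-of-keys and the map-to-count are both filled from the source only as far as a query requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a Bag backed by a lazily filled list-of-keys and map-to-count. */
final class LazyBagSketch<T> {
    private final Iterator<? extends T> source;
    private final List<T> keys = new ArrayList<>();          // distinct elements, first-seen order
    private final Map<T, Integer> counts = new HashMap<>();  // element -> occurrences so far

    LazyBagSketch(Iterator<? extends T> source) {
        this.source = source;
    }

    /** Consumes one source element, returning false once the source is exhausted. */
    private boolean pull() {
        if (!source.hasNext()) {
            return false;
        }
        T element = source.next();
        if (counts.merge(element, 1, Integer::sum) == 1) {
            keys.add(element);
        }
        return true;
    }

    /** Stops consuming as soon as the element has been seen. */
    boolean includes(Object o) {
        while (!counts.containsKey(o) && pull()) { }
        return counts.containsKey(o);
    }

    /** An exact count needs the whole source, so this cannot work on an infinite one. */
    int count(Object o) {
        while (pull()) { }
        return counts.getOrDefault(o, 0);
    }

    List<T> keys() {
        while (pull()) { }
        return Collections.unmodifiableList(keys);
    }
}
```

The Set and OrderedSet rows would follow the same shape with a set in place of the count map. The sketch also shows why Bag and infinite collections conflict: includes() can stop early, but count() cannot.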

The old SetValue derived interfaces are a bit embarrassing. They ensured that Java programmers were well-behaved. They are instanceof tested for e.g. equals*(). Now any OCLinJava Collection can do anything with an implicit asXXX conversion caused by the request for a lazy set. SetValue etc need to be deprecated. OCL semantics are enforced by the limited collection operation signatures in the library.

eclipse-ocl-bot commented 1 week ago

By Ed Willink on Dec 31, 2016 07:01

(In reply to Ed Willink from comment #2)

We need to prohibit two asXXXIterator calls to avoid duplicate computation.

The code therefore has a canBeIterable test to signal whether multi-computation needs a shared multi-access LazyIterator.

Unfortunately toString(), which can be used from the debugger, performs an access, so any attempt to iterate on behalf of debug must be suppressed in favour of an indication.

hashCode() also performs an iteration; not surprising; how can you return a hashCode without computing the future?
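A sketch of one possible form such an indication could take for toString(), assuming a simple wrapper that records what has already been consumed. DebugSafeLazyValue and the "<pending>" marker text are hypothetical. The point is that toString() reports only known state and never touches the source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: a lazy value whose toString() never advances the computation. */
final class DebugSafeLazyValue<T> implements Iterator<T> {
    private final Iterator<? extends T> source;
    private final List<T> seen = new ArrayList<>(); // elements already handed to the consumer
    private boolean exhausted = false;              // set only when the consumer finds the end

    DebugSafeLazyValue(Iterator<? extends T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        boolean more = source.hasNext();
        exhausted = !more;
        return more;
    }

    @Override
    public T next() {
        T value = source.next();
        seen.add(value);
        return value;
    }

    @Override
    public String toString() {
        // Deliberately avoids source.hasNext(), which may itself trigger lazy work.
        return exhausted ? seen.toString() : seen + " + <pending>";
    }
}
```

hashCode() has no equivalent escape: a content-based hash needs every element, so it either forces the evaluation or must not be content-based.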

eclipse-ocl-bot commented 1 week ago

By Ed Willink on Jan 02, 2017 13:45

The code on ewillink/509670 shows considerable promise, but maintaining the API requires extra wrappers. Ideally all of SetValue etc would just be eliminated so that CollectionValue is fully polymorphic with iterator() always being a BaggableIterator.

Nearly all Collection operations are recoded as Iterators. A few such as reverse() still to do. Instrumenting shows that 80% of old SetValueImpl etc are replaced. Most of the residue are iteration accumulators.

An experimental conversion to a SelectIterator shows that changing the iteration execution focus to the result is effective. Needs API evolution of LibraryIteration.

The new code needs more thorough testing since the lazy functionality defers execution and so cascaded operations might interact.

There is also a nasty impact on the OCL debugger and toString() since lazy evaluation just shows <> as many pending collection values. Is an eager evaluation option needed?

Development suspended pending stronger motivation.

eclipse-ocl-bot commented 1 week ago

By Ed Willink on Apr 23, 2017 13:26

(In reply to Ed Willink from comment #5)

The new code needs more thorough testing since the lazy functionality defers execution and so cascaded operations might interact.

After rebasing, still three standalone errors; easily fixed by forcing an invalid query to avoid total laziness.

Two CG errors. At least one due to iterable() after iterator(). The CG must analyze to detect dual use collections that require an early iterable() call. But is even that enough to make this fragile protocol robust?

eclipse-ocl-bot commented 1 week ago

By Ed Willink on May 15, 2017 09:53

CG analysis added and isNonInvalid supported for Operation/Iteration calls.

The missing iterable() hazard is fixed by requiring all lazy iterators to support a reIterator() that allows a second cached traversal to remedy the call deficiency.

All looked comparatively good to go in to M7 at the last minute, but once made a bit lazier suddenly test_parsingDocumentsExample failed. Oops. If a popEvaluationEnvironment occurs while a lazy evaluation is pending, a Variable needed by a lazy VariableExp may get lost. Attempting lazy evaluation in the interpreter without supporting analyses is clearly not possible. Similar hazard with loop bodies and iterator variables. See Bug 516652.

Fudge: executor.evaluate is eager?
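The lost-Variable hazard and the eager workaround can be reproduced in miniature. This is a sketch under stated assumptions: EnvironmentHazardSketch, its static stack and both factory methods are hypothetical stand-ins, not the interpreter's real evaluation environment classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of a lazy variable access outliving its evaluation environment. */
final class EnvironmentHazardSketch {
    // Stand-in for the interpreter's stack of evaluation environments.
    static final Deque<Map<String, Object>> stack = new ArrayDeque<>();

    /** Lazy: the variable is looked up only when the result is demanded. */
    static Supplier<Object> lazyVariableExp(String name) {
        return () -> {
            Map<String, Object> env = stack.peek();
            if (env == null || !env.containsKey(name)) {
                throw new IllegalStateException("lost variable " + name);
            }
            return env.get(name);
        };
    }

    /** Eager: the variable is read now, while its environment is still on the stack. */
    static Supplier<Object> eagerVariableExp(String name) {
        Object value = lazyVariableExp(name).get();
        return () -> value;
    }
}
```

If both suppliers are created while an environment defining the variable is on the stack and that environment is then popped, the eager one still returns the value while the lazy one fails. Avoiding that without eagerness requires knowing which variables a pending evaluation still needs, which is the kind of supporting analysis the interpreter lacks.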

eclipse-ocl-bot commented 1 week ago

By Ed Willink on Jul 06, 2017 04:36

(In reply to Ed Willink from comment #7)

See Bug 516652.

for further integration progress.

eclipse-ocl-bot commented 1 week ago

By Ed Willink on Jul 14, 2017 06:32

(In reply to Ed Willink from comment #5)

There is also a nasty impact on the OCL debugger and toString() since lazy evaluation just shows <> as many pending collection values. Is an eager evaluation option needed?

toString() can be mitigated by use of reIterator() to provide something that iterates to create the string without distorting the source.

eclipse-ocl / org.eclipse.ocl

[evaluator] Refine CollectionValue API to support smart collections #1785
