eclipse-ocl / org.eclipse.ocl

Eclipse Public License 2.0
0 stars 0 forks source link

[library] Add Map constructors #2010

Closed eclipse-ocl-bot closed 2 hours ago

eclipse-ocl-bot commented 2 hours ago

| --- | --- | | Bugzilla Link | 540353 | | Status | RESOLVED FIXED | | Importance | P3 normal | | Reported | Oct 22, 2018 01:39 EDT | | Modified | Nov 21, 2018 05:49 EDT | | See also | 520059, 540884, 443003 | | Reporter | Ed Willink |

Description

The Map type has useable operations, but no bulk construction helper.

Suggest at least something like

aCollection->collect(e | Tuple{key = keyFunction(e), value = valueFunction(e) })->asMap()

to convert the idiomatic Tuple{key,value) into a Map.

Clearer/more efficient.

aCollection->collectMap(e | keyFunction(e), valueFunction(e))

but requires a 'novel' double body.

eclipse-ocl-bot commented 2 hours ago

By Ed Willink on Oct 22, 2018 07:12

In Epsilon, there is mapBy; a degenerate of 'collectMap' for the very common case of map index to element.

aCollection->collectMap(e | keyFunction(e), e)

"mapBy" is an 'active' name suggesting some mapping is occurring, when actually it is a constructor/aggregator. OCL has chosen "collect" for the "map" case so perhaps "collectBy", since every source value is 'collected', just indexed usefully.

Bug 520059 references a groupBy example where in Standard OCL the Map return is synthesized as a Set of Tuple(key, elements). A necessary evil in Standard OCL but superseded by Map.

QVTo has xcollectselect(body, condition) - a double lambda. But the return is flattened.

The "Parallel execution of first-order operations." paper at the OCL 2018 workshop references an Epsilon "aggregate" operation that tales three lambdas, the third as a fall-back initializer - not clear why this is needed.

"collectBy" can do the hard work in linear time. If further processing of the values is required, it can be done separately at no obvious extra cost. Conversely using a lambda to compute the unchanged value is a minor (optimizeable) cost.

Collection(V)::collectBy(K)(i : T[?] | lambda : Lambda V() : K[?]) : Map(K, Collection(V))[1] -- nested return CollectionKind is same as source CollectionKind

-- quadratic 'declarative' definition churning Map values\ let keys : Set(K) = source->collect(v | lambda(v))->asSet()\ in keys->iterate(k : K; map : Map(K,V) = Map{} |\ map->including(key, source->select(v | lambda(v) = key)) )

-- ~linear 'imperative' definition churning Map and Collection values\ source->iterate(v : V; acc : Map(K,V) = Map{} |\ let k : K = lambda(v),\ oldValues = acc->get(k),\ newValues = if oldValues = null then Collection{v}\ else oldValues->including(v) endif\ in acc->including(k, newValues} )

eclipse-ocl-bot commented 2 hours ago

By Ed Willink on Oct 22, 2018 07:35

Bug 520059 has been marked as a duplicate of this bug.

eclipse-ocl-bot commented 2 hours ago

By Frederic Jouault on Oct 22, 2018 08:02

In our OCL 2018 paper on the Active Map, we show how a Multimap can be used to represent grouped values, with a groupedEntries() operation to retrieve the Set{Tuple{key, Collection{elements}}} if required.

This Multimap can be computed from a collection of values plus a lambda to compute the key associated to each value, similarly to how collectMap works in previous comments. However, a simple Map, would not correctly support multiple values having the same key for the purpose of computing groupBy (see Bug 520059).

Therefore, either we also need Multimap in OCL (of which Map can be seen as a specific case), or we would need a specific groupBy operation.

Moreover, other bulk Map construction helpers are useful in other contexts, such as building a M(ultim)ap from a set of (key, value) pairs.

eclipse-ocl-bot commented 2 hours ago

By Ed Willink on Oct 22, 2018 08:28

Multimap may be an appropriate implementation technology, but at the OCL-level surely a Multimap is just a Map with a Collection type as its value? (In OCL a Multimap is therefore a special case of Map in which the value is Collection-typed.)

I see no difference between a Multimap - multiple values per key\ and a Collection-typed Map value - a collection of values per key

eclipse-ocl-bot commented 2 hours ago

By Frederic Jouault on Oct 22, 2018 08:45

The notion of Multimap is not an implementation technology. A Multimap may be implemented as a Map of keys to collections of values, but that is not the only way it can be implemented.

Guava's Multimap description is an interesting read:\ https://github.com/google/guava/wiki/NewCollectionTypesExplained#multimap\ Notably: "Multimap Is Not A Map".

The problem with collectMap if it builds a Map, and not a Multimap, is what should happen to values that are associated to the same key?\ When building a Map, only one value per key should be kept, and this is actually what your (Ed's) solution to "select DISTINCT" relies on:\ https://www.eclipse.org/forums/index.php/t/1095734/\ whereas when grouping multiple values per key, one wants to keep all values associated to each key.

So, we could actually have both collectMap and collectMultimap.\ collectMultimap could arguably return a Map(key, Collection(values)), but that would be making explicit the implementation technology of the Multimap concept.

Of course, we may not want to add all this complexity to OCL right away. However, one could build a case for providing powerful abstractions. For one, it generally enables more optimizations than having users encode their meaning in lower-level abstractions (e.g., encoding Maps into Sets(Tuple(key,value))).

eclipse-ocl-bot commented 2 hours ago

By Ed Willink on Oct 22, 2018 13:06

Yes, a construction collision leads to a difference, but I see 'collect' as (ideally) lossless, so only the multimap solution is acceptable; collisions must aggregate not displace. If the user wants a lossy operation that should be coded explicitly.

elements->collectBy(name)->values()->collect(any(true))

[?? If we want to facilitate the lossy Map creation, perhaps

elements->collectOneBy(name)->values()

and it returns invalid for a collision.]

eclipse-ocl-bot commented 2 hours ago

By Sina Madani on Oct 22, 2018 17:26

collectMap is an interesting idea. You may also want to look at the various map functions of the java.util.stream.Collectors API: groupingBy, partitioningBy and toMap.

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/stream/Collectors.html

Regarding the "aggregate" operation in Epsilon, the third lambda is there if the key does not exist, so that the aggregation variable can be used in some meaningful way - see

http://git.eclipse.org/c/epsilon/org.eclipse.epsilon.git/tree/plugins/org.eclipse.epsilon.eol.engine/src/org/eclipse/epsilon/eol/execute/operations/declarative/AggregateOperation.java

eclipse-ocl-bot commented 2 hours ago

By Ed Willink on Nov 07, 2018 12:48

(In reply to Ed Willink from comment #6)

Yes, a construction collision leads to a difference

This has got confused.

sources->collectBy(k | v(k)) -- => Map k -> v(k)

incurs no risk of collision since there is a separate map-value per key and since the sources are treated as a Set to avoid gratuitous recomputation of v(k).

It is the inverse map that may lead to collisions:

sources->inverseCollectBy(k | v(k)) -- => Map v(k) -> k\ \

Kevin Lano, private communication, suggests a Map::inverse() operation which could just be a shortform for

sources->inverseCollectBy(k | at(k))

If a Map::inverse() is provided then 'inverseCollectBy' seems like a better spelling than 'groupBy'.

eclipse-ocl-bot commented 2 hours ago

By Ed Willink on Nov 07, 2018 12:57

(In reply to Ed Willink from comment #8)\

sources->collectBy(k | v(k)) -- => Map k -> v(k)\

incurs no risk of collision

except that if collectBy, like collect flattens, the flattened elements may collide. If collectBy is like collectNested, there is no problem.


In practice the difference between collect and collectNested seems to just be that when collect and collectNested differ, use of collect is a bug awaiting a change to collectNested. The no-nested collections design goal was naive in the extreme.

If collectBy emulates collect, we perpetuate the bad design and require the bug fix.

If collectBy emulates collectNested, the naming is a bit inconsistent.


If collectNestedBy emulates collectNested, no problem.

If collectBy emulates collect, and creates a MultiMap to avoid collisions, the 'unexpected' nested collection type may alert users to the flattening hazard.

eclipse-ocl-bot commented 2 hours ago

By Ed Willink on Nov 20, 2018 08:23

(In reply to Ed Willink from comment #9)

If collectNestedBy emulates collectNested, no problem.

This is a needlessly heavyweight solutuion to a non-problem.

collectBy produces a value per key so there is no rationale for flattening the value. It is only inverse-collect that can collide. We can decide on flattening when/if it is implemented.


Realistically is is the the flattening of collect that is an unfortunate leftover of the original no nested collections OCL/QVT. In practice there is usually (? 95%) no difference between collect/collectNested, so it might be possible to evolve collect to be non flattening. No need to add collectFlattened since colecct(...)->flatten() is adequate.

It would be good to instrument to see how often collectNested and collect differ and how often this difference is atically determinable from type declarations.

eclipse-ocl-bot commented 2 hours ago

By Ed Willink on Nov 21, 2018 05:49

collectBy for collections and maps pushed to master for M3.