GlenKPeterson / Paguro

Generic, Null-safe, Immutable Collections and Functional Transformations for the JVM
Other
312 stars 24 forks source link

if map contains key then error else add key/value to map #15

Closed cal101 closed 7 years ago

cal101 commented 7 years ago

Moin,

while implementing a primary index with ImMap i fell fack into mutable habits:

if(contains) then error else add

in my case the bias is on keys being unique and so i can just add the new key and see if the map instance changes.

That assumes that the following holds true:

map.assoc(existingKey, v) == map

If that holds true i wonder if assocIfAbsent is useful to add. May take a value or a supplier.

Opinions?

Cal

GlenKPeterson commented 7 years ago

One thought is that entry() returns a Paguro Option which supports pattern matching (not referring to regular expressions), so you can:

map = map.entry(someKey)
         .patMat( kvPair -> map,
                  () -> map.assoc(someKey, v) )

You are using == (I assume in Java, because it means something different in Kotlin) which is referential "equality" with Objects (and value equality with primitives). You're really asking, "Do these two objects share the same address in memory?" as a test for whether they are equal or not. This is the default implementation of Object.equals(). Actually, assoc() uses this same definition of equality: https://github.com/GlenKPeterson/Paguro/blob/master/src/main/java/org/organicdesign/fp/collections/PersistentHashMap.java#L261

This will work for Strings that are compiled into the program (because the compiler de-duplicates Strings), enums, and primitives (which are actually compared by value because they have no address). But you can run into problems with any other kind of object. For instance, new Integer(543) == new Integer(543) is false (they are at different locations in memory) even though new Integer(543).equals(new Integer(543)) is true.

Rich Hickey used referential equality in PersistentHashMap as a speed and memory optimization. It works for that, but it's not a substitute for a true equality test based on the important values in an object.

I'm guessing a bit here. If I'm not answering your question, you might try writing a function (even a poor one) that does what you want. Submit that in your response. Just so that I can see the context of what you're doing. I don't mean to extend PersistentHashMap, just write a function that takes a map and returns either a Map (if you want to ignore duplicates) or an Or<Map,Exception> (if you want to do functional error handling).

cal101 commented 7 years ago

I think something like

Map<K,V> put(K key, U value, BiFunction<? super V,? super U,? extends V> merge)

as defined in javaslang Map interface

http://www.javadoc.io/doc/io.javaslang/javaslang/2.1.0-alpha

or

public TreeMap<K,V> update(K k, F<V,V> f, V v)

From

http://www.functionaljava.org/javadoc/4.7/functionaljava/index.html

would allow what I meant.

GlenKPeterson commented 7 years ago

Thank you for taking the time to explain this. I'm including direct links to your specific examples just so I can reference them very quickly in the future.

JavaSlang: http://static.javadoc.io/io.javaslang/javaslang/2.1.0-alpha/javaslang/collection/Map.html#put-K-U-java.util.function.BiFunction-

FunctionalJava: http://www.functionaljava.org/javadoc/4.7/functionaljava/fj/data/TreeMap.html#update-K-fj.F-V-

Let me see if I understand this correctly. Both methods return a new Map. If the old map contains the key, a function is executed and the result stored as the new value in the map (the result could be the old value unchanged). This function takes the old value associated with the key (and maybe the new value as well). Is that correct?

How are you using this feature? You mentioned assocIfAbsent. If I were to implement that in JavaSlang, it would look like this:

map = map.put(key, newVal, (oldVal, newVal) -> oldVal);

In FunctionalJava:

map = map.update(key, newVal, (oldVal) -> oldVal);

In Paguro:

// get a Paguro Option of the k/v pair associated with this key.
map = map.entry(key)
         // if isSome() return the map unchanged.
         .patMat( oldKV -> map,
                  // if isNone(), assoc the new value.
                  () -> map.assoc(key, newVal) )

ML, Scala, and Haskell allow matching based on types. It's a little bit like destructuring in a way. Paguro's Option is intended to be similar to that. OneOf2 works the same way and it will probably be joined by OneOf3, OneOf4, ... in the future. Those might be renamed Union2, Union3, etc. because they can function like Union types for Java. But I digress...

Your other example was if(contains) then error else add. Maybe that's what had me thinking about Option. If by error you mean Exception:

JavaSlang:

map = map.put(key, newVal, (oldVal, newVal) -> throw new RuntimeException("contains"));

FunctionalJava:

map = map.update(key, newVal, (oldVal) -> throw new RuntimeException("contains"));

Paguro:

map = map.entry(key)
         .patMat( oldKV -> throw new RuntimeException("contains"),
                  () -> map.assoc(key, newVal) )

So, FunctionalJava is the briefest and Paguro is the most verbose for this use case. I haven't personally seen a lot of use for assocIfAbsent(key, value). I usually am fine with letting the map take the last value that was put in it. So it doesn't seem a strong candidate for a convenience method. But perhaps the way you use maps will convince me otherwise?

Hmm... You know Paguro's ImList has a reverse()? If you reverse your input, then let the map take the last value, the result is the same as what I think you're asking for.

ImList<User> users = ...; // however you get this data

ImMap<Long,User> usersById =
        users.reverse()
             .toImMap(u -> tup(u.id, u));

By reversing the input, the earliest values effectively overwrite the later ones.

I still don't feel like I completely understand your use model. I hope this is helpful.

cal101 commented 7 years ago

I am using it like a database primary index. First key wins, exception on duplicates. Your RuntimeException example fits it. Without such a method you typically have to do 2 tree search walks:

if (!map.containsKey(key)) // first walk map = map.assoc(key, value) // second walk with modification

For HashMap that was fixed with putIfAbsent.

The difference between your patMap example is the extra option object and the for me strange means of using something that looks like an option return value to add to the map i got it from. The paguro map misses something like putIfAbsent if you call it patMap or whatever. Patmap as a new method on the map is even better than the other examples because the computation of the value can be lazy.

Result: I want ImMap.patMap ;-)

GlenKPeterson commented 7 years ago

Paguro generally accommodates what you are asking for with patMat ("pattern matching"). I understand that you don't like creating a tuple, or the syntax of accessing the map inside the match. You are asking for a convenience method for your specific use case which I think is somewhat rare. Unless/until more people ask for it, I can't see putting it ahead of the other development priorities listed here: https://github.com/GlenKPeterson/Paguro#future-development-priorities-as-of-2016-11-13

So, "maybe someday" is the current resolution of your issue. I'm just going to rename this issue and leave it open for a while to see if more people vote for it.

GlenKPeterson commented 7 years ago

I think I have new insight into what you might be doing. You must have some source of data to add to this map. Let's assume it's from a java.util.ArrayList, called someList. If you didn't care about duplicates, transforming to a map might look like this:

ImMap<String,Integer> foo(ArrayList<String> someList,
                          Fn1<String,Integer> f) {

    return xform(someList).toImMap(x -> tup(x, f.apply(x)));
}

Maybe you would add .filter(), .map(), etc. before .toImMap(), but I'm leaving that out of this example for simplicity. The problem for you is it doesn't stop when a duplicate key is encountered. You can do that with an exception and a fold today (sorry I didn't think of this before):

xform(someList)
        .fold(map(),
              (accum, s) -> accum.entry(s)
                                 .match(exists -> {throw new Exception("Duplicate");},
                                        () -> accum.assoc(s, f.apply(s))));

map() creates the new empty immutable map to be our accumulator. fold then loops, passing the map and the next item from the source data to the reducer function. In the reducer, it checks if the map already contains that key and if so, throws an exception. If not, it returns the accumulated map with the new key added. When all items are processed, fold yields the accumulated map.

Yay! But many functional programmers don't like exceptions. Especially to cause loop termination.

If you want to return an Or<GOOD,BAD> instead of throwing an exception, there is another fold method that terminates early, but doesn't have quite the signature that you need. If you like this, I could probably change the signature of that method in Paguro 3.0 which will have some other small breaking changes. Here's the code that you'd write. Notice the outer function now returns an Or instead of an ImMap directly:

Or<ImMap<String,Integer>,String> baz(ArrayList<String> someList,
                                     Fn1<String,Integer> f) {
    return xform(someList)
            .foldUntil(map(),
                       (accum, s) -> accum.containsKey(s) ? null
                                                          : "Duplicate key",
                       (accum, s) -> accum.assoc(s, f.apply(s)));
}

Ahh... purely functional error handling.

The first (accum, s) -> function is a terminate-early function that actually gets run for each item in the source data immediately before the reducer function. The second (accum, s) -> function is the standard reducer function. As long as the terminate-early function returns null, the loop continues and will eventually yield an Or.Good of the appropriate type. If the terminate-early function ever returns a non-null, the loop stops and the fold immediately returns an Or.Bad consisting of whatever the terminate-early function returned. Here's the signature for the proposed fold-with-terminate-early function:

<G,B> Or<G,B> foldUntil(G ident,
                        Fn2<? super G,? super T,B> terminator,
                        Fn2<? super G,? super T,G> reducer);

The G means Good, B means Bad, T is the type of the input items from the collection you're processing. This will work and be fast if it solves your issue.

If you actually had to throw the exception later for some reason, you could:

ImMap<String,Integer> doTheBaz(ArrayList<String> someList) throws Exception {
    return baz(someList, String::length)
            .match(good -> good,
                   bad -> {throw new Exception(bad);});
}

I need to think about all the implications, but I like it and already coded it into Paguro 3.0. I think it will stay there, so please let me know if it's not right.

Questions

Q: Why are you returning null? I thought you didn't like null? A: Wrapping everything in Or.good() and unwrapping it on every iteration is slow and uses memory. The early-terminating fold-left is used internally in some performance critical places. It also complicates the method signature, the implementation, and usage by end users. Finally, null is only really evil when it's returned instead of an empty collection.

Q: What else is in 3.0? When will it be out? A: An RRB Tree. Everything deprecated in 2.x will be removed to keep jar size small. Possibly a new layer of interfaces. All Function1, Function2, etc. interfaces are renamed to Fn1, Fn2, etc. foldLeft() renamed to fold(). It will be released when it's ready, but a release candidate might be available by end of June?

Thanks for your interest. I hope you can continue to enjoy using Paguro!

GlenKPeterson commented 7 years ago

I think you have given up, which is OK. But for the record, this is now a part of Paguro 3.0 here: https://github.com/GlenKPeterson/Paguro/blob/2016-05-22_RRB-Tree/src/main/java/org/organicdesign/fp/xform/Transformable.java#L106 I'll close this ticket when that's released. If you need it sooner, I can back-port it into 2.x.

cal101 commented 7 years ago

No need to backport. Has nothing to do with my requirement or question as far as I see. No need to discuss that any further.

GlenKPeterson commented 7 years ago

I fixed your issue as I understood it, but maybe not as you understood it. Not sure whether that means this should be closed as "fixed" or "wont fix" but it sounds like time to close it either way. I feel you have helped the development of Paguro and I thank you!