HewlettPackard / mds

Managed Data Structures
GNU General Public License v3.0

Make isolated blocks that return values easier to use #37

Open EvanKirshenbaum opened 7 years ago

EvanKirshenbaum commented 7 years ago

[imported from HPE issue 299]

[This may be out-of-date, but I'm going to import it anyway and look at it later.]

In the current Java API, the heavily-overloaded isolated() function can take several different forms of argument (Runnable, Supplier<R>, Consumer<T>, Function<T,R>, and probably a few others). In each, it essentially boils down to

This is fine when the function is a simple Runnable or Consumer, but it runs into problems when it's something that wants to return a value. Logically, the value should be the one that was computed by the function at the time of successful publication, but

  1. Rerunning tasks cleans up any modifications to the managed space, so all of the side effects are correct, but it doesn't change the value returned by the function, and
  2. Even if we could figure out how to get the function to recompute its value (which we can't), we've already cached the old value before we try the publish.

This implies that it's not safe to use isolated() blocks that return values, and we should probably remove them. (Note that this only applies to isolated(), which tries to publish. It doesn't apply to inReadOnlySnapshot(), detached(), or inSnapshot().) Instead, the isolated() call should take a Runnable or Consumer that makes its modifications in such a way that something outside of the child context can compute the overall result. This could be done by calling methods on an argument passed in or by modifying objects bound to the closure. Once isolated() returns, these external objects would hold consistent data, and the value could be computed from them.

Note that this implies that when a re-run task makes calls on these objects, the new calls need to replace the effects of the old ones. (This would be a good use of the accumulators and task-dependent data structures, but could apply to simpler data structures if the set of parameters is deterministic, so re-running will always replace values, never omit them.)
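
To make the problem and the workaround concrete, here is a toy model in plain Java. It is only a sketch and not the MDS implementation: the names StaleResultSketch, isolatedReturning, shared, and conflictsLeft are made up for illustration, the retry loop is a stand-in for real conflict resolution, and it assumes a re-run simply calls the function again and discards its result.

```java
import java.util.function.Supplier;

class StaleResultSketch {
    // Hypothetical stand-in for a value in the managed space.
    static int shared;
    // Hypothetical knob: number of simulated publish conflicts still to come.
    static int conflictsLeft;

    // Toy value-returning isolated(): the result is cached before publishing,
    // so a re-run repairs side effects but cannot change what is returned.
    static <R> R isolatedReturning(Supplier<R> func) {
        R cached = func.get();
        while (conflictsLeft > 0) {
            conflictsLeft--;
            shared += 5;   // simulate a conflicting update by another task
            func.get();    // re-run; the recomputed result is discarded
        }
        return cached;
    }

    // Toy void isolated(): same retry loop, nothing to cache.
    static void isolated(Runnable func) {
        func.run();
        while (conflictsLeft > 0) {
            conflictsLeft--;
            shared += 5;
            func.run();
        }
    }

    static int staleDemo() {
        shared = 10;
        conflictsLeft = 1;
        return isolatedReturning(() -> shared * 2);
    }

    static int holderDemo() {
        shared = 10;
        conflictsLeft = 1;
        int[] holder = new int[1];                    // lives outside the child context
        isolated(() -> { holder[0] = shared * 2; });  // each run overwrites the slot
        return holder[0];
    }

    public static void main(String[] args) {
        System.out.println(staleDemo());   // 20: computed before the conflict
        System.out.println(holderDemo());  // 30: written by the final re-run
    }
}
```

The holder version works only because every run writes the same slot, which is the "replace, never omit" condition described above.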

To simplify things, I propose adding

static <T>
T isolated(Runnable func, Supplier<? extends T> resFunc) {
  isolated(func);
  return resFunc.get();
}

static <T, R extends Supplier<? extends T>>
T isolated(Consumer<? super R> func, R resFunc) {
  return isolated( () -> func.accept(resFunc), resFunc );
}
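
A possible way to call the two proposed overloads is sketched below. The one-argument isolated(Runnable) here is a stub that just runs the function once (the real one would run it in a child context and possibly re-run it), and the Total class and the viaRunnable/viaConsumer helpers are hypothetical names used only for this illustration.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

class ProposedOverloadsSketch {
    // Stub (assumption): the real isolated(Runnable) runs func in a child
    // context and may re-run it before publishing; here it just runs once.
    static void isolated(Runnable func) {
        func.run();
    }

    // The two proposed overloads, copied from the proposal.
    static <T> T isolated(Runnable func, Supplier<? extends T> resFunc) {
        isolated(func);
        return resFunc.get();
    }

    static <T, R extends Supplier<? extends T>> T isolated(Consumer<? super R> func, R resFunc) {
        return isolated(() -> func.accept(resFunc), resFunc);
    }

    // Hypothetical result object: the task writes into it, the caller reads it.
    static final class Total implements Supplier<Integer> {
        private int sum;
        void set(int value) { sum = value; }  // overwrites, so a re-run replaces the old value
        @Override public Integer get() { return sum; }
    }

    // Runnable form: the task and the result supplier share a captured slot.
    static Integer viaRunnable() {
        int[] slot = new int[1];
        return isolated(() -> { slot[0] = 6 * 7; }, () -> slot[0]);
    }

    // Consumer form: the result object is handed to the task as its argument.
    static Integer viaConsumer() {
        int[] prices = {3, 4, 5};
        return isolated(t -> {
            int s = 0;
            for (int p : prices) s += p;
            t.set(s);
        }, new Total());
    }

    public static void main(String[] args) {
        System.out.println(viaRunnable());  // 42
        System.out.println(viaConsumer());  // 12
    }
}
```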

EvanKirshenbaum commented 7 years ago

[imported comment]

I've added the following (along with isolated() static functions and forms that don't take PubOptions):

  default <X,R> R callIsolated(PubOption opts,
                               Supplier<? extends X> compute,
                               Function<? super X, ? extends R> get)
  {
    X data = callIsolated(opts, compute);
    return get.apply(data);
  }

  default <R> R callIsolated(PubOption opts, Runnable compute, Supplier<? extends R> get) {
    return callIsolated(opts,
                        ()->{
                          compute.run();
                          return Boolean.FALSE;
                        },
                        b->get.get());
  }

  default <X, R> R callIsolated(PubOption opts,
                                X data,
                                Consumer<? super X> compute,
                                Function<? super X, ? extends R> get)
  {
    return callIsolated(opts,
                        ()->{
                          compute.accept(data);
                          return data;
                        },
                        get::apply);
  }

The first form calls callIsolated() as before to get a data block that the getter can compute a value from. The assumption is that the compute function will stash the value there, and if conflict resolution happens, it will update the contents of that block rather than return a different value (this will have to be well documented).

The second form is used when there's no actual intermediate data. The compute function sticks the value someplace that the getter knows about.

The third form passes in the location, which is seen and updated by the compute function and read by the getter.

Of course, all of these are convenience functions. If the programmer knows to do this, they could simply have the isolated function return, say, a Holder<T> and use its value.
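
For illustration, the three forms might be used as sketched below. Everything around the three default methods is an assumption: the PubOption enum is a placeholder (the real type isn't shown here), the Context interface and the signature of the two-argument callIsolated() are guesses, and ToyContext fakes a single publish conflict by bumping a field and re-running the compute function while still returning the first run's value.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

class CallIsolatedFormsSketch {
    enum PubOption { DEFAULT }  // hypothetical placeholder for the real type

    interface Context {
        // Assumed base form: returns whatever the first run of compute returned.
        <X> X callIsolated(PubOption opts, Supplier<? extends X> compute);

        default <X, R> R callIsolated(PubOption opts, Supplier<? extends X> compute,
                                      Function<? super X, ? extends R> get) {
            X data = callIsolated(opts, compute);
            return get.apply(data);
        }

        default <R> R callIsolated(PubOption opts, Runnable compute, Supplier<? extends R> get) {
            return callIsolated(opts, () -> { compute.run(); return Boolean.FALSE; }, b -> get.get());
        }

        default <X, R> R callIsolated(PubOption opts, X data, Consumer<? super X> compute,
                                      Function<? super X, ? extends R> get) {
            return callIsolated(opts, () -> { compute.accept(data); return data; }, get::apply);
        }
    }

    // Toy context (assumption): one simulated conflict, then a re-run.
    static final class ToyContext implements Context {
        int shared = 10;
        private boolean conflictPending = true;

        @Override
        public <X> X callIsolated(PubOption opts, Supplier<? extends X> compute) {
            X first = compute.get();
            if (conflictPending) {
                conflictPending = false;
                shared += 5;     // conflicting update becomes visible
                compute.get();   // re-run; its return value is ignored
            }
            return first;
        }
    }

    // First form: compute returns the data block it wrote into.
    static Integer form1() {
        ToyContext ctx = new ToyContext();
        int[] block = new int[1];
        return ctx.callIsolated(PubOption.DEFAULT,
                () -> { block[0] = ctx.shared * 2; return block; },
                b -> b[0]);
    }

    // Second form: no intermediate data; both sides know about the slot.
    static Integer form2() {
        ToyContext ctx = new ToyContext();
        int[] side = new int[1];
        return ctx.callIsolated(PubOption.DEFAULT,
                () -> { side[0] = ctx.shared * 2; },
                () -> side[0]);
    }

    // Third form: the location is passed in explicitly.
    static Integer form3() {
        ToyContext ctx = new ToyContext();
        return ctx.callIsolated(PubOption.DEFAULT, new int[1],
                slot -> { slot[0] = ctx.shared * 2; },
                slot -> slot[0]);
    }

    public static void main(String[] args) {
        System.out.println(form1() + " " + form2() + " " + form3());  // 30 30 30
    }
}
```

In each case the getter sees the value written by the re-run because the compute function updates the same location every time it runs.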