Derived Values - GitHub Issues

My initial cut of the problem definition is as follows:

within a re-frame app ...
... for a certain period of time (perhaps while a certain panel is showing) ...
... there's a need to compute a materialised view of data within app-db ...
... which will be used both within views (provided by subscriptions) ...
... and within event handlers.
The computation involved may be CPU-expensive, so we want to cache it. I.e. we don't want to have it calculated via subscriptions and then ALSO calculated within event handlers. We want it calculated only once, when required and as inputs to the calculation change, and then available to both contexts.
Note: we also don't want the materialised value computed outside of the "time" when it is needed. For example, the calculation might only be needed when a single panel is showing to the user.

Currently, subscriptions provide a way to create "derived values" that compute only when needed. So they tend to be the tool we reach for. But is there a better solution??

I wouldn't constrain the time, i.e. I think that the 2nd, the 3rd, and the last points should not be there. I also wouldn't constrain the place, i.e. the 4th point should not be there.

Calling reg-sub creates a graph. It's there regardless of whether those subs are actually used or not. In some situations, event handlers might need values from the nodes of such a graph. In other words, subscriptions have the capacity to create values and event handlers should be able to utilize that capacity. That's the bare minimum description of the problem, IMO.

I don't think there's any extra mechanism that's needed - I think re-frame-utils solves the problem almost perfectly. There are, however, some improvements that I can see:

An ability to mark any subscription as a persistent one, so it doesn't get removed from the cache, neither by re-frame-utils nor by re-frame itself
A warning when someone calls subscribe or derefs a sub outside of the reactive context, unless such a sub is marked as "event handler-friendly" (the warning should still be there for regular JS events/callbacks)

On a related note, recently you mentioned that you were looking at rule engines. I'm still not entirely sold on the idea of them being useful for re-frame (albeit, I haven't spent that much time thinking about it), but this is a place where a rule engine would fulfill the need automatically, just by the virtue of always having all values (inputs, intermediate values, outputs) stored within one place. That is, if I understand the concept well enough.

@p-himik

I wouldn't constrain the time

I believe the requirement is often time-constrained (but certainly not always). So, I believe that's an essential part of the problem statement because it will make some solutions viable and others not so.

Calling reg-sub creates a graph

I know you know this, but for clarity, I'll point out that this statement isn't true. reg-sub defines the template for how to create nodes in the Signal Graph. But what drives node existence is subscriptions. And what drives subscription existence is View existence. The Signal Graph feeds the data-hungry requirements of the current set of View functions (which contain subscriptions). Sometimes you need certain nodes, and sometimes you don't.
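To make the template/node distinction concrete, here is a toy model in plain Clojure. It is not re-frame's implementation: `handlers`, `live-nodes`, `reg-template!`, `subscribe!` and `unsubscribe!` are hypothetical names, and the "node" is reduced to a reference count.

```clojure
;; Templates: query-id -> function that would compute a value from the db.
(def handlers (atom {}))

;; Live nodes: query vector -> {:refs n}; present only while someone subscribes.
(def live-nodes (atom {}))

(defn reg-template!
  "Register how a node would be computed. Creates no node."
  [query-id f]
  (swap! handlers assoc query-id f))

(defn subscribe!
  "Create the node for `query` if needed and count one more user."
  [query]
  (when-not (contains? @handlers (first query))
    (throw (ex-info "no template registered" {:query query})))
  (swap! live-nodes update-in [query :refs] (fnil inc 0)))

(defn unsubscribe!
  "Drop one user; remove the node when nobody is left."
  [query]
  (swap! live-nodes
         (fn [nodes]
           (if (> (get-in nodes [query :refs] 0) 1)
             (update-in nodes [query :refs] dec)
             (dissoc nodes query)))))
```

In this model, registering a template leaves `live-nodes` empty; only `subscribe!` (standing in for a mounted view) brings a node into existence, and the last `unsubscribe!` removes it.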

So this comes back to time again. Or, better, "state". Sometimes the application is in a state which requires these materialised views, and sometimes it isn't. At the moment, the state is implicit. The existence of certain Views (or not) is what drives subscriptions which, in turn, drives nodes in the Signal graph, which does the calculation of "derived data".

I know you know all this. But I'm trying to be crystal clear. A good problem definition is the best gift you can give yourself.

So my summary: applications have different states. In different states they need to calculate certain "derived values" and those values might need to be delivered to views or might be used within event handlers. Subscriptions are certainly a very nice way of handling this because they come into existence implicitly when "the state" (of the UI) asks for them, and then disappear when no longer being used (by views).

app-db --> views (state of the UI) --> subscriptions (state of Signal Graph) --> computation (derived state) --> views and event handlers

We're talking about different kinds of graphs. I was thinking specifically about what you call the template, and you're describing the concrete values graph. To the former, the views don't matter at all. Same with events - they don't care about views, they just care about cofx. It doesn't matter if a particular view is mounted and is the one that requested value X if some completely unrelated event that has nothing to do with that view or even the whole UI really needs that value X.

To me, it's easy to imagine a situation where a re-frame app is run without any views at all. Only events that use that subscriptions template to create some concrete values and reuse them when possible, thus creating a cached value graph.

applications have different states. In different states they need to calculate certain "derived values" and those values might need to be delivered to views or might be used within event handlers

I might be reading it wrong, but these sentences feel like there's a direction of intent, so to speak: the state (i.e. app-db) changes, and then stuff happens because of that change, including some event handlers potentially asking some subs for their values. In my own experience, however, that is quite often not the case. As a simple example, consider a button press that dispatches a particular event. (Just in case: that button has nothing to do with the state, it's just a static button with the right :on-change value. It could be replaced by a WebSocket message, a key press, or any other "active" thing in the vast web API world.) That particular event might not need the :db cofx and might not produce the :db fx, but it might need some subs, only because the specification of how the relevant value can be computed is already there, in the template, almost ready to be used.

Here’s my attempt at what the needs are. Got a bit lengthy…

First of all I feel like I understand the differentiation between template vs graph. The former is a blueprint representing how it will be connected and calculated if ever needed and the latter represents what is actually in use with caching and recalculations on change. I also see that this is “simple” for UI registered subscriptions as they explicitly send creation and teardown signals. For event subs the template part would work exactly the same, and while dispatch of an event could itself be seen as a creation signal, there is no explicit teardown (unless you do it right away like re-frame-utils). This last part makes it tricky, more inefficient and/or potentially more involved for the end user if you want to provide more control over this behavior (one-off vs TTL vs permanent).

From a usability point of view, I would say the ideal solution to me is something like this:
1) There is no requirement that a given subscription is used in the UI for it to be available to an event.
2) Usage is as close as possible to reg-sub.
   a) It supports the :<-[:sugar] syntax, just like reg-sub.
   b) It supports dynamic subscriptions.
3) If there is a need for explicit TTL timers or similar for efficiency, their expiry will lead to a performance penalty, not failure.
4) No default warning about a performance penalty, as I understand might be the case with re-frame-utils’s inject. Having to constantly add metadata to avoid it seems like needless clutter (note that I haven’t tried it).

The “extreme” for 1) is e.g. a js/setInterval dispatching an event that depends on a subscription chain that has nothing in common with any UI subscription needs.

I think I could personally live with quite a few limitations, as long as what you run into is performance penalties and not malfunction.

One usage scenario I have: the user performs a search returning a dataset of 10k entities. The ultimate goal is to maybe send a subset of this back to the server. The number of attributes sent back for each entity is far lower than what the UI needs, but might include computed data not required by the UI. User interactions might cause the presented subset to be recomputed many times. Only if the user is happy with the presented subset, and triggers an event, is the event-specific subscription calculated. The user might also decide not to send anything back to the server at all; in that case the event subscription is never calculated.

The above is a description of sibling subscriptions where both the UI sub and the event sub have common ancestors in the graph. They likely have at least one level 3 subscription in common, but after that the needs deviate. The subscription used in the event is as described above, while the UI might need e.g. translation of key names to something acceptable for interop with d3js (e.g. no namespaced keys, and underscores instead of dashes in names).

An attempt at describing the above sibling approach in re-frame-like pseudocode:

reg-sub   ;; a level 2 sub
:entities/raw
<(get db :entities/raw)>

reg-sub  ;; Might run many times due to user interaction changing config params
:entities/filtered-and-augmented
:<-[:entities/raw]
:<-[:config/params]
<potentially CPU intensive processing>

reg-sub
:entities/ui-prepped
:<-[:entities/filtered-and-augmented]
<adapt data to UI format>

reg-sub  ;; Could be skipped if you instead do this in the event handler below, but it might be nice to separate them.
:entities/backend-prepped
:<-[:entities/filtered-and-augmented]
<adapt to backend format>

reg-event-fx
:send/to-server
:<-[:entities/backend-prepped]
<eg send data with :http-xhrio>
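For comparison, the same chain can be approximated with what re-frame offers today. This is only a sketch: the `:<-` form on reg-event-fx above is proposed syntax, so here a hand-written coeffect (hypothetically named `:sub-value`) derefs the subscription instead. The `:score` / `:min-score` filter and the `:outbox` key are invented stand-ins for the real processing and for an HTTP effect.

```clojure
(require '[re-frame.core :as rf])

(rf/reg-sub :entities/raw
  (fn [db _] (get db :entities/raw)))

(rf/reg-sub :config/params
  (fn [db _] (get db :config/params)))

(rf/reg-sub :entities/filtered-and-augmented
  :<- [:entities/raw]
  :<- [:config/params]
  (fn [[entities params] _]
    ;; stand-in for the potentially CPU-intensive processing
    (filterv #(>= (:score %) (:min-score params)) entities)))

(rf/reg-sub :entities/backend-prepped
  :<- [:entities/filtered-and-augmented]
  (fn [entities _]
    ;; stand-in for "adapt to backend format"
    (mapv #(select-keys % [:id]) entities)))

;; Hypothetical coeffect: deref a subscription on behalf of an event handler.
;; No view ref-counts this reaction, so it may stay in the subscription cache;
;; that is exactly the concern discussed in this thread.
(rf/reg-cofx :sub-value
  (fn [cofx query-v]
    (assoc cofx (first query-v) @(rf/subscribe query-v))))

(rf/reg-event-fx :send/to-server
  [(rf/inject-cofx :sub-value [:entities/backend-prepped])]
  (fn [cofx _]
    ;; a real handler would return e.g. an :http-xhrio effect here
    {:db (assoc (:db cofx) :outbox (:entities/backend-prepped cofx))}))
```

Note that `:entities/backend-prepped` is only computed when `:send/to-server` is dispatched, which matches the scenario where the user may never send anything back.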

As mentioned on Slack, another use case/benefit I see is hiding implementation details by storing data in db under an auto-namespaced key (e.g. ::raw-entities), and providing a subscription through which the rest of the code can get hold of this data (<sub [:entities/raw]). Feature-wise, I don't see that this use case has any more requirements than those already mentioned. The key is no dependency on UI usage.

As I don’t know the re-frame codebase I should not venture into implementation ideas, but for what it’s worth: in a scenario where the UI and event subs share a common ancestor in the graph (I believe this might often be the case), would it be possible to keep the event subscription branch alive as long as the ancestor is needed by the UI? The event sub branch would not participate in the “ref count” for the ancestor, but be a passive participant relying on the UI for signalling cleanup/teardown. Should the UI unmount too early, you pay the penalty of doing a one-off event sub calculation afterwards. I'm not sure this helps anything; it was just something I thought of to help with signalling teardown. So, not dependent on UI usage, but an efficiency mechanism that piggybacks on it. Con: unless there is a good way to avoid it, it might mean that the event branch is recomputed more often than needed.

I hope it helps. If not, feel free to direct input needs in the right direction.

Is there any progress?

The more our re-frame application grows the more is the need to reuse subscriptions inside the events. We have a lot of derived values and duplicating subscriptions logic to be able to use it inside events becomes harder and harder.

Even the solution that does not cache the calculation but runs it every time would be a huge step forward for us because at the moment we do those calculations anyway or have to pass a lot of data through the UI even if we don't need it in the UI itself. Honestly, most of the team members prefer the latter approach because they don't have to refactor all the subscriptions involved. E.g.

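(reg-sub
  ::data-to-perform-event
  ;; derived data that only the event needs (body elided)
  (fn [db _]))

(reg-event-fx
  ::some-event
  ;; the derived data arrives as an event argument, passed in by the view
  (fn [world [_ db-data-for-event param1 param2]]))

(defn my-view [param1 param2]
  ;; subscribed to only so the value can be forwarded to the event
  (let [data-for-event @(subscribe [::data-to-perform-event])]
    [:div {:on-click #(dispatch [::some-event data-for-event param1 param2])}]))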
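The uncached option mentioned above can already be approximated by writing the derivation once as a plain function of db and calling it from both places. A minimal sketch, assuming the derived value depends only on app-db; the `:example/...` ids, the `:items` / `:selected?` keys and the `:last-request` key are hypothetical.

```clojure
(require '[re-frame.core :as rf])

;; The derivation, written once as a plain function of db.
(defn data-to-perform-event [db]
  (filterv :selected? (:items db)))

;; Views still get it through a subscription (cached by the signal graph).
(rf/reg-sub :example/data-to-perform-event
  (fn [db _] (data-to-perform-event db)))

;; The event handler recomputes it from db on every dispatch; nothing is cached.
(rf/reg-event-fx :example/some-event
  (fn [{:keys [db]} [_ param1 param2]]
    {:db (assoc db :last-request {:data   (data-to-perform-event db)
                                  :params [param1 param2]})}))
```

The view no longer needs to subscribe just to forward data, at the cost of recomputing the value in the handler. For longer chains the plain functions would have to be composed by hand, which is the refactoring effort described above.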

I think most of this could be resolved if, instead of discarding a reaction when it's removed from the subscription cache, re-frame moved it to a small secondary LRU cache. That way the most commonly used event-only subscriptions would still stay cached.
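A rough sketch of that secondary cache in plain Clojure, independent of re-frame's internals. All names (`lru`, `lru-limit`, `retire!`, `revive!`) are hypothetical, and the cached "value" stands in for a reaction; a real version would also dispose of whatever gets evicted.

```clojure
;; Hypothetical size limit for the secondary cache.
(def lru-limit 2)

;; Ordered oldest -> newest: a vector of [query value] pairs.
(def lru (atom []))

(defn lru-put
  "Add an entry as most recent, evicting the oldest beyond `limit`."
  [entries limit query value]
  (let [without (vec (remove #(= query (first %)) entries))
        added   (conj without [query value])]
    (vec (take-last limit added))))

(defn retire!
  "Call where a reaction would otherwise be discarded."
  [query value]
  (swap! lru lru-put lru-limit query value))

(defn revive!
  "Return the cached value for `query` and remove it from the LRU, or nil."
  [query]
  (let [[old _] (swap-vals! lru (fn [es] (vec (remove #(= query (first %)) es))))]
    (some (fn [[q v]] (when (= q query) v)) old)))
```

A subscribe call would try `revive!` before building a new reaction, so a sub that is disposed and requested again shortly afterwards skips the recomputation.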

Here's my take on things, and a prototype: https://github.com/day8/re-frame/pull/790/commits/7055a38c4fe39109e05909b7f09a381313a59247

I think I know why this issue is unsolved for 8 years. It's a general FRP problem with no general solution (AFAIK).

Caching means trading state for computation. Concretely, ram for flops. But, you can't cache forever. State is finite. A cached val needs a lifecycle. We must create and destroy cached vals. But, when your framework is stateless, there's no obvious lifecycle a cached val should follow.

We call a function unreasonable when it has different effects at different times & places. Re-frame's subscription has a shadow-API. It's unreasonable. Inside a reactive context, it ref-counts. Outside, it caches indelibly. We describe this vaguely, calling it a "mistake" or a "potential memleak". Only the super-nerds really understand what we mean. Shoutout.

#754 doesn't help. It just changes the shadow-API. re-frame-utils.cofx/inject names its caching strategy, but this name is too generic. It also doesn't support one of re-frame's key features.

There are more solutions out there. They're all bound to be incomplete. We don't agree on the right caching behavior because objectively there is none. Even if we somehow did ref-counting everywhere, that isn't ideal for every use-case.

What if you dispose a sub, just to bring it back 1ms later? Ref-counting won't help.

What if your sub has a big memory footprint, but you only need it once? An LRU cache won't help.

So, there's no single way a sub should work. In other words, subscriptions are polymorphic. Clojure is great at polymorphism, and re-frame is a model case. To decomplect subs, all we need is a registry and a dispatch.

Clojure has its cake and eats it. That's because it lives its opinion as vigorously as it defends the alternatives. We say "eek!" to a non-reactive sub, but we need not project this apprehension onto the user. Instead of bluntly opinionating re-frame, let's decompose our opinions into a namespace:

::raw Don't cache.
::forever Don't clear.
::reactive Dispose on unmount. Warn outside a reactive context. Status quo.
::safe If reactive, dispose on unmount, else don't cache. Our new default? See #754.
::timeout Dispose after some time.
::backoff Dispose after a time period which prolongs with repeated access.
::lru Cache a finite set of recent values.
::async Wait for a channel before disposing.
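As an illustration of "a registry and a dispatch", here is a toy version in plain Clojure using a multimethod. It is not the prototype's code: `strategy`, `sub-value` and the `:lifecycle` metadata key are hypothetical, only two of the strategies are implemented, and defaulting to `:raw` is an arbitrary choice for the sketch.

```clojure
;; One cache per strategy: strategy -> query -> value.
(def cache (atom {}))

(defn strategy
  "Lifecycle named in the query's metadata, else :raw."
  [query]
  (or (:lifecycle (meta query)) :raw))

(defmulti sub-value
  "Compute or fetch the value for `query` according to its strategy."
  (fn [query compute] (strategy query)))

;; :raw never caches: compute on every call.
(defmethod sub-value :raw [query compute]
  (compute))

;; :forever caches on first use and never clears.
(defmethod sub-value :forever [query compute]
  (if-let [hit (find (get @cache :forever) query)]
    (val hit)
    (let [v (compute)]
      (swap! cache assoc-in [:forever query] v)
      v)))
```

A user-defined lifecycle would be one more `defmethod`, which is the sense in which the set of strategies stays open. For example, `(sub-value (with-meta [:report] {:lifecycle :forever}) expensive-fn)` would compute once and then reuse the result, with `expensive-fn` being any zero-argument function.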

We don't avoid complex behavior, but at least now we've named it. This makes re-frame more practical, especially for power users. This makes re-frame more articulate, especially for new users. We educate the user on what to use when and why. We provide a default that's safe & easy to explain. We stop saying "caching makes re-frame performant!" We start saying "re-frame supports an open set of performance strategies!" Crucially, this means we provide a clean API for the user to define their own lifecycle:

strategy Names the lifecycle used by a given query.
method The registered implementation for a strategy.
query-id First in a vector, or the val for the strategy in a map.
handle Call the handler for a query.
cache Supersedes query->reaction with strategy->query->reaction.
cache! Given a strategy and a query, allocate the handler.
clear! Given a strategy and a query, free the handler.
reg-sub-method Implement the lifecycle. Support either vector or map queries with 2 arities.

We use metadata to colocate lifecycle and query (best-effort). We support map queries, where lifecycle is a key (true colocation). Signal and computation fns are unchanged, except the first item in query-v may be a map query. Putting the query in the query isn't the simplest, but it makes the system very backwards-compatible. I agree that destructuring the query-id is rarely necessary. The new query-id destructures it just as well. Queries are concise because lifecycle is a key, not a val. Each lifecycle strategy has its own cache. This may waste some space, but it makes the system very reasonable & future-proof.

Thanks for coming to my ted talk. Please let me know if I've missed anything big.

I made a second prototype: https://github.com/day8/re-frame/pull/790/commits/1238515213a8399f7d47f90ea5bdc1eeb8a72e1b

I think breaking the API at a different point makes it simpler overall.

Query-maps are simpler. No more looking up the "first" registered key to find lifecycle and id. Instead, a query-map has ::rf/q and (optional) ::rf/lifecycle keys.

A query-vector is used as-is. No more putting a query-map inside the query-vector. Instead, there are two different ways to register a subscription:

reg :sub accepts a query-map. If you sub to a query-vector, it's converted to a query-map.
reg :legacy-sub is the same as the original reg-sub. If you sub to a query-map, it's converted to a query-vector. Use ::rf/query-v to pass positional args in the query-vector.
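One plausible reading of those two conversions, sketched in plain Clojure. This is a guess at the behaviour, not the prototype's code: the function names are hypothetical, and the keys are spelled `:rf/q` and `:rf/query-v` here because no `rf` alias is defined in the sketch.

```clojure
(defn ->query-map
  "Normalise a query to map form. A vector's first item becomes the id,
   and the whole vector is kept for positional access."
  [query]
  (if (map? query)
    query
    {:rf/q (first query) :rf/query-v (vec query)}))

(defn ->query-vector
  "Normalise a query to vector form for a legacy handler.
   Uses :rf/query-v when present, else a one-item vector of the id."
  [query]
  (if (vector? query)
    query
    (or (:rf/query-v query) [(:rf/q query)])))
```

Under these assumptions, `(->query-map [:items 1])` yields `{:rf/q :items, :rf/query-v [:items 1]}`, and a map query without `:rf/query-v` reaches a legacy handler as `[:items]`.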

Sub handlers always know the lifecycle of the query. No more exceptions.

Registering a lifecycle is simpler. No more arities. Now it can handle map or vector queries naively.

My polymorphic subscription prototype is still in alpha, but we've been working on the new flow feature as well. This provides another way to define a derived value explicitly, in a way that's not coupled to a reagent lifecycle. I'm not sure flows can replace subscriptions completely, though. So we're still figuring out if we should add this new feature to subscriptions. https://day8.github.io/re-frame/Flows/
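For a sense of what a flow looks like, here is a minimal sketch based on my reading of the Flows page linked above. Treat the details as assumptions to check against that page: that `reg-flow` lives in `re-frame.alpha`, that it takes a map with `:id`, `:inputs`, `:output` and `:path`, and that inputs are app-db paths. The `:entities/...` names are reused from the earlier example in this thread.

```clojure
(require '[re-frame.alpha :as rf])

;; The derived value is written into app-db at :path whenever the inputs change,
;; so an event handler can read it from db like any other value.
(rf/reg-flow
  {:id     :entities/backend-prepped
   :inputs {:entities [:entities/raw]}
   :output (fn [{:keys [entities]}]
             (mapv #(select-keys % [:id]) entities))
   :path   [:entities/backend-prepped]})
```

Because the result lives in app-db rather than in a reaction, its lifetime does not depend on any view being mounted.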

I've updated a few articles to explain more clearly the problem with subscriptions, and why they're a leaky abstraction of dataflow programming.

day8 / re-frame

Derived Values #680