Yomguithereal / baobab

JavaScript & TypeScript persistent and optionally immutable data tree with cursors.
MIT License

Facets are actually implementation leak #240

Closed: ivan-kleshnin closed this issue 9 years ago

ivan-kleshnin commented 9 years ago

I've been thinking about this. Facets are described as "views over data", but the same can be said about cursors: they are also "views over data". The difference is that one kind of data is "static" and the other is "dynamic". But this means nothing. If we have a rule c = f(b), we never conclude that c has a different nature than b. Derived and initial data are expressed in the same syntax and look the same to consumers. The whole of math and computer science is based on that.

Unfortunately, this is not the case with facets. Client code must be aware of this artificial separation:

let foo = state.facets.foo;
vs
let foo = state.select("foo").get()

or

@branch({
  cursors: ...
  vs 
  facets: ...
})

This seems wrong to me. I shouldn't be concerned with such private details of the data in client code. Is it "static" or "dynamic"? I don't care. I shouldn't have to ask. But now the client code and the implicit rules about our data are coupled, and we can't simply switch between the a <- b and b <- a causalities. That is an implementation leak.

Unless I'm missing something, I propose merging the cursor and facet concepts into one more powerful abstraction (keeping the cursor name). Cursors could then be expressed in terms of other cursors and static data.
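To make the proposal concrete, here is a minimal sketch of a store where stored and derived values are read through the same call. It is not Baobab's API: createStore, get, set and the derived map are all hypothetical names invented for illustration.

```javascript
// Hypothetical sketch, not Baobab's API: one read interface for both
// stored and derived values.
function createStore(initial, derived) {
  const state = { ...initial };

  // The caller cannot tell whether `name` is stored or derived.
  const get = (name) =>
    name in derived ? derived[name](get) : state[name];

  const set = (name, value) => {
    if (name in derived) {
      throw new Error(`"${name}" is derived and cannot be set`);
    }
    state[name] = value;
  };

  return { get, set };
}

const store = createStore(
  { b: 2 },
  { c: (get) => get("b") * 10 } // c = f(b)
);

store.get("b"); // stored value
store.get("c"); // derived value, same call shape
```

Switching which of b and c is stored and which is derived only changes the arguments to createStore, not the code that reads them.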

But... having said that... I'm afraid we are actually reinventing the wheel here. The issues @christianalfoni raised (https://github.com/Yomguithereal/baobab-react/issues/44 and https://github.com/Yomguithereal/baobab/issues/180) push me even further toward the thought that Baobab would benefit from being built on smarter abstraction(s). Event emitters are too primitive. We want to control initial states, and we want movable parts while still having a single app state. We want more and more complex primitives to express relations between data in facets, like filters of all kinds...

It sounds like... RxJS Observables could handle this better. Or CSP channels.

There are attempts to bind React and Rx, with varying success: https://github.com/fdecampredon/rx-flux https://github.com/r3dm/thundercats

None of them adopts the concept of a single app state; they basically follow the Flux path of having distinct Stores. But in everything else... we are moving in the same direction. Any state (including Baobab trees, of course) can be expressed in terms of a temporal reduce function named scan. The difference between it and the familiar reduce is that scan broadcasts every new state to its observers instead of returning one final value (because most data sources never finish). I wonder if it's possible to just drop all that poor event emitter machinery and rebuild everything on something more powerful and better suited to our big, big, big list of requirements. Sounds scary, I know.
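The difference between reduce and scan can be sketched in a few lines of plain JavaScript. This is a hand-rolled illustration, not RxJS: the scan helper and its push function are hypothetical names, and a real observable library handles subscription and completion very differently.

```javascript
// reduce: consumes a finished list and returns one final value.
const total = [1, 2, 3].reduce((acc, n) => acc + n, 0);

// scan (hypothetical helper): folds events one at a time and broadcasts
// every intermediate state to the observers, so the source never has to end.
function scan(reducer, seed, observers) {
  let state = seed;
  return function push(event) {
    state = reducer(state, event);
    observers.forEach((notify) => notify(state));
    return state;
  };
}

const seen = [];
const push = scan((acc, n) => acc + n, 0, [(s) => seen.push(s)]);

push(1);
push(2);
push(3);
// `seen` now holds every intermediate state: [1, 3, 6]
```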

I'd like to think I overcomplicate things and there is a well-defined outline of what Baobab should and shouldn't do. Somewhere. But I'm afraid I'm not.

:camel:

Yomguithereal commented 9 years ago

baobab@dev now implements computed data within the tree #278.

Yomguithereal commented 9 years ago

I guess we can close this since facets are no longer an implementation leak :).

oresmus commented 9 years ago

I am coming late to this discussion, since I am just learning about these libraries (and about web programming and js in general), but OTOH I have thought about related issues for some time, know something about FRP, etc.

I just want to describe a use case to support the view that there should be no enforced distinction (even by strict naming convention) between computed and "directly stored" state in your interfaces for accessing state.

use case: data with more than one possible representation

Consider an interface which wants to reveal data in either of two representations. E.g. temperature in degrees F or C, or a 3d model in either VRML or as a .obj file (or whatever).

We would like two accessor methods, one for each representation that is desired, so the functions can have a typed return value (whether formally or just in their documentation).

In a typical implementation, the backing store (I mean the state tree, like Baobab, plus whatever layers around it help with loading or computing data) will keep the data in one format, and compute the other one when requested, perhaps caching it or not.

But a change in implementation of the store might change the decision of which format is stored vs computed; or this might differ for different individual objects, depending on their source; it might change over the lifetime of a single object; it might even keep both forms for speed. But the code that does the accesses should not have to know about any of this. It should be able to use the same interface whether the data it requests will turn out to be directly stored or computed.
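As a rough sketch of the temperature case: the accessors below return the same answers whichever unit happens to be stored. createTemperature and the shape of its argument are hypothetical and only meant to illustrate the point.

```javascript
// Hypothetical sketch: which unit is stored is a private detail.
function createTemperature(initial) {
  const stored = { ...initial }; // { unit: "C" | "F", value: number }
  const toCelsius = (f) => ((f - 32) * 5) / 9;
  const toFahrenheit = (c) => (c * 9) / 5 + 32;

  return {
    // Each accessor returns the stored value directly when the units
    // match, and converts otherwise. Callers cannot tell which happened.
    celsius: () =>
      stored.unit === "C" ? stored.value : toCelsius(stored.value),
    fahrenheit: () =>
      stored.unit === "F" ? stored.value : toFahrenheit(stored.value),
  };
}

const storedAsCelsius = createTemperature({ unit: "C", value: 100 });
const storedAsFahrenheit = createTemperature({ unit: "F", value: 212 });

storedAsCelsius.fahrenheit(); // computed from the stored Celsius value
storedAsFahrenheit.fahrenheit(); // read directly
```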

but don't we need to know whether it's computed in practice?

Now, indeed there are use cases where it feels like you need to know the difference. But if you examine them more closely, what you really need is something else, which can be expressed by either a fancier data type for the return value, or a fancier interface permitting a succession of values, or both. (And Baobab already gives you the "succession of values" by default, so all we need to add here is the metainfo about each value.)

For example, suppose some data is computed and this computation might take a long time, or fail, or be non-deterministic, or depend on the client platform. Then at the very least you need to account for the final result not being available right away, and ideally you might want to get more info than just the value, like a loading flag, an error message, a series of partial results (so the user can see that part of the data that's already loaded), warnings that it's non-deterministic, etc. Then your UI has the option of displaying something that depends on that meta-info about the ordinary value, like a "loading ..." message.

What you need then is for the return type from the access function to have that extra info, as well as the ordinary value. (Or alternatively, one accessor for the ordinary value and one for just the associated metainfo.)

If you want the usage to be simpler, you can always convert that into a simpler type with some standard wrapper, which might return a promise, throw an error if the final value is not yet ready, return the best current approximation to the ordinary value, or whatever.
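One possible shape for such a return type, with two of the simplifying wrappers, might look like the sketch below. All names (loading, ready, failed, strictValue, bestValue) and the status strings are assumptions made up for this example.

```javascript
// Hypothetical "rich" accessor results: the ordinary value plus meta-info.
const loading = (partial) => ({ status: "loading", value: partial, error: null });
const ready = (value) => ({ status: "ready", value, error: null });
const failed = (error) => ({ status: "error", value: undefined, error });

// Wrapper 1: return the final value, or throw if it is not available yet.
function strictValue(result) {
  if (result.status === "error") throw result.error;
  if (result.status !== "ready") throw new Error("value not ready yet");
  return result.value;
}

// Wrapper 2: return the best current approximation, or a fallback.
function bestValue(result, fallback) {
  return result.value !== undefined ? result.value : fallback;
}

// A UI can branch on the meta-info...
const users = loading(["alice"]); // partial result while still loading
const label = users.status === "loading" ? "loading ..." : "done";

// ...while simpler code only asks for a value.
const shown = bestValue(users, []);
```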

You might think that you only need those things for computed data, not for stored data. But suppose your application starts storing too much data to fit on the client, and you want to revise it to load some of it lazily from the server? Then many of those same things might happen, and you need to account for them. Effectively you are revising the implementation of stored data and replacing it with computed data (thinking of "loading from server" as a form of computing).

So indeed you might need to start using a different interface then (and you might want a naming convention to correspond to which interface you're using), but it's not because of whether the data is computed or stored, but because of complexities in the nature of the data you actually want to access and display. And though this is correlated in practice with whether the data is computed or stored, it's not the same thing -- some computation is trivial enough to ignore, and some stored data can be so slow to access that it might as well be computed.

(To keep things non-confusing, you probably do want some kind of naming convention about the interface, e.g. data_name returns the ordinary value vs $data_name returns all the fancy metainfo too. But the distinction indicated by the name is about the interface, not the store implementation. By convention you might provide both methods for everything. Of course you'd want a non-boilerplaty way of implementing that.)

what about raising an error if you try to set computed data?

There is also the issue of computed data being an error to "set", but if you're using this one-way-flow pattern, then even your ordinary stored data should be an error to set from its access interface. So this is not a real difference either.

(And to continue the examples above, you could provide methods elsewhere capable of setting the data; you could even have a set method for each format, if necessary converting the provided value to the format it wanted to store, or changing its mind at that time about which format it did want to store for that object.)

Yomguithereal commented 9 years ago

Hello @oresmus. This is very interesting. Thanks. I feel that some points you raise here are at the center of Cerebral's philosophy (@christianalfoni). I am very happy to see you vouch for a homogeneous access interface for stored and computed data, because I have been having doubts about this.

Concerning the final part, this is what I am trying to achieve with the current v2 implementation, and it makes sense to me too.

Yomguithereal commented 9 years ago

I will take some time to ponder your text some more and will be back with more feedback if you want to develop this discussion further.

AutoSponge commented 9 years ago

I'm using cursors and facets interchangeably in my views (where I don't need to set). This is all I needed to do: const data = (e && e.data) ? e.data.data : e.target.get(); If you want to unify the interface, just start with that but IMO it's trivial to wrap it.

Having built an entire app with Baobab and something other than React, I can say I prefer having the distinction between facets and cursors. I know cursors are inherently "unsafe" because they can be a source of mutation. My views only declare cursors for data they can update (from user intents). Otherwise, it's common to see something like this at the top of a view: const {user$, entry$, page$} = appModel.facets;

Lastly, as I mentioned before, eventemitters are the backbone of streams and therefore can become streams if the implementor wishes. There's no reason to force streams/observables. Just create a wrapping lib that delivers streams or promises or whatever.

Yomguithereal commented 9 years ago

Is that to say you would prefer to keep the facets outside the state altogether?

AutoSponge commented 9 years ago

@Yomguithereal, I think facets are a convenience. They wrap, sometimes naively, 1 or more cursors and emit updates of their own. It's very helpful and a pleasure to code around, but I could just as easily write it myself with possibly better results.

For instance, I have a very complicated facet in my application that gets updates from 7 other facets. When the app bootstraps, much of the data is not present and I have to short-circuit the calculations for performance. There's nothing in Baobab to say that a dependent cursor or facet is required for calculation.

This also causes "false" update events: as data comes into the model asynchronously, the short-circuited facet emits an update. My next step is to wrap that with a filter (could be Rx, a promise, another emitter, it doesn't matter) that holds the previous value (or just no-ops when the data is empty). If the new value differs, it's an update; otherwise don't emit. This saves downstream calculations and possibly layout thrashing while views try to render.
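A minimal version of such a filter, written as a plain listener wrapper rather than against any particular emitter or stream library, could look like this. onlyWhenChanged is a hypothetical name, and the default equality check (Object.is) is an assumption; a real facet would probably need a deep or structural comparison.

```javascript
// Hypothetical sketch: wrap a listener so it only fires for non-empty
// values that differ from the previous one.
function onlyWhenChanged(listener, isEqual = Object.is) {
  let hasPrevious = false;
  let previous;

  return function filtered(value) {
    // No-op while the data is still empty (e.g. during bootstrap).
    if (value === undefined || value === null) return;
    // No-op when nothing actually changed.
    if (hasPrevious && isEqual(previous, value)) return;

    hasPrevious = true;
    previous = value;
    listener(value);
  };
}

const received = [];
const onUpdate = onlyWhenChanged((value) => received.push(value));

[null, 1, 1, 2, 2, 1].forEach(onUpdate);
// `received` is [1, 2, 1]: the empty value and the repeats were dropped
```

The wrapped function can be passed wherever the original listener would have been registered.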

I guess my point is that the wrapping of cursors might seem obvious to one person but holds lots of nuance for someone else based on their implementation. I like having small, powerful building blocks. If you want to extend this, I applaud you but I'd hope that you use a plugin/adapter system rather than change what's currently working.

Yomguithereal commented 9 years ago

So you'd prefer the current v1 system for facets over the v2 one then? The thing is, I am currently stalling v2's release so as not to rush anything concerning facets and to be sure everything is done correctly. The only thing that v2 changes concerning facets is that they now sit within the tree itself rather than in their own compartment. But this is the main point that should be settled here.

AutoSponge commented 9 years ago

@Yomguithereal when I'm done with this project, I'll change to v2 and try it out. From personal experience, having facets segregated in the model definition and clearly separated in views/controllers made it easy for me to inform other developers which data points were read-only and which could be updated by the controller.

Yomguithereal commented 9 years ago

@AutoSponge, this problem should be partially solved by the fact that, by convention, computed nodes in the tree should have a key starting with $, so that looking at the key should remove any confusion. But I agree this is not perfect.

oresmus commented 9 years ago

Thinking more about this, and after reading the subsequent comments, I want to say some things on the other side of what I said earlier. (My true position is somewhere in between -- I'm still trying to synthesize all this.)

There are different opinions about how to design a specific interface which one could derive from all this, and I'm not yet ready to argue for or against anything specific. In fact, I would need more experience actually using these things before I should try to argue anything like that.

digression: another system of interpretive dependency tracking

I do have some experience implementing and using a different style of this kind of thing (in NanoEngineer-1 and in some personal programs). That was a system in which you can compute any expression of definitive variables, and it will automatically track which ones were used, and let you subscribe to the first future change to any of those variables, so you know when a recompute might be needed -- you use this to maintain a "dirty" flag on each computed value, used to recompute it as needed before returning from each access.

This can be made very efficient in terms of number of updates -- a computed value marked dirty stops listening to changes to anything, but is not recomputed until it's next needed. So total time is proportional to the rate at which things need to be recomputed (since at least one input changed), times the average number of variables they depend on (and those can be other computed things). In particular, in any one "update cycle" (analogous to an "animation frame" in a browser or game) nothing can be recomputed more than once, even if there are many dependency paths between it and things which changed, and nothing can be recomputed unless it was both needed and some of its ultimate dependencies were different.
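A heavily simplified sketch of this scheme is below. It is neither Garnet's nor Baobab's implementation, and variable, computed and markDirty are hypothetical names. One known simplification: a subscription left over from an earlier evaluation is only dropped when that variable next changes, so a dependency that is no longer used can trigger one unnecessary recompute.

```javascript
// Hypothetical sketch of interpretive dependency tracking with dirty flags.
let currentComputation = null; // the computed value being evaluated, if any

function variable(initial) {
  let value = initial;
  const subscribers = new Set(); // computed values awaiting the next change
  return {
    get() {
      // Reading inside a computation records the dependency automatically.
      if (currentComputation) subscribers.add(currentComputation);
      return value;
    },
    set(next) {
      if (Object.is(next, value)) return;
      value = next;
      // One-shot notification: mark dependents dirty, then forget them.
      const toNotify = [...subscribers];
      subscribers.clear();
      toNotify.forEach((c) => c.markDirty());
    },
  };
}

function computed(fn) {
  let dirty = true;
  let cached;
  const subscribers = new Set();
  const self = {
    get() {
      if (currentComputation) subscribers.add(currentComputation);
      if (dirty) {
        // Recompute lazily, only when the value is actually needed.
        const outer = currentComputation;
        currentComputation = self;
        try {
          cached = fn();
        } finally {
          currentComputation = outer;
        }
        dirty = false;
      }
      return cached;
    },
    markDirty() {
      if (dirty) return; // already dirty: nothing more to propagate
      dirty = true;
      const toNotify = [...subscribers];
      subscribers.clear();
      toNotify.forEach((c) => c.markDirty());
    },
  };
  return self;
}

let runs = 0;
const a = variable(1);
const b = variable(2);
const sum = computed(() => { runs += 1; return a.get() + b.get(); });
const doubled = computed(() => sum.get() * 2);
```

Reading doubled.get() twice in a row evaluates sum once, and changing both a and b before the next read still causes only one recompute.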

But it has some efficiency problems too, some obvious and some not -- this is getting kind of long, so I'll leave out the details.

In ease of writing correct code, if you use it correctly it's very good -- no need to declare specific dependencies, and no need for them to be the same (for any one computed variable) on each update cycle. But if you use it wrongly it can be hard to debug.

I learned about that scheme from a Lisp UI system made at CMU in the 80's, called Garnet. (That scheme itself was from an even older subsystem called "KR" which stood for "Knowledge Representation" and was someone's thesis.) It doesn't seem to be widely known.

I have thought of some efficiency improvements to that scheme, and implemented some but not others. Ultimately I think a thing like this should have compiler support. I also think it needs more flexibility. All that leads me in the direction of FRP rather than continuing to try to do this kind of thing interpretively.

But for a typical web app, as opposed to a CAD program, the computation overhead for this might be trivial compared to the DOM updates and the programmer time, so the CPU time inefficiency of interpretive dependency-tracking might be a nonissue. Thus something like Baobab, which I see as packaging up a specific (simple but useful) kind of dependency tracking, might be very good.

(I am interested in Cerebral too, but have not looked at it closely enough to make any comparison.)

Yomguithereal commented 9 years ago

Thanks again for your insight @oresmus. This is very interesting. It is true that this kind of dependency system introduces performance issues to tackle, but I hope I have achieved a reasonably performant implementation for Baobab.

On a practical level, therefore, do you think computed data nodes should indeed enter the tree, as is planned and currently implemented for v2, or do you think a further separation of concerns should be kept?

oresmus commented 9 years ago

You're welcome @Yomguithereal, and of course thanks for providing Baobab which is both useful and discussion-stimulating. But as for the question about which specific interface is better for Baobab (v1 or v2 or something else), I wish I could give advice on that, but I have not yet used it in practice or even fully read the documentation -- I am still in the process of deciding which tools/libraries to use for my first web app, and it seems like I am finding more things to investigate every day. So I don't think I can give proper feedback on that, compared to actual users. If I become one, I won't hesitate to provide more pointed feedback!