Gozala / reducers

Library for higher-order manipulation of collections
MIT License
180 stars, 8 forks

wrap the result of all things in hub #45

Open Raynos opened 11 years ago

Raynos commented 11 years ago

It's weird to have my filter called multiple times just because there are two forks after it.

Especially when I merge those two forks later.

var a = ["foo"]

var b = filter(a, function () { console.log("called twice"); return true })
var c = filter(b, function () { return true })
var d = filter(b, function () { return true })
var e = merge([c, d])
fold(e, function noop() {})

Gozala commented 11 years ago

Transformations form lazy pipelines, and that is consistent regardless of the data structures those pipelines originate from. That's intentional, for the sake of consistency. That being said, there is a very simple mechanism for sharing a transformation, which is hub(input).

Also keep in mind that sharing by default would be a lot more painful: as things stand you have a way to share transformations, but if you invert the default you'll lose laziness and won't be able to opt out of sharing.
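
To make that concrete, here is the example from the issue rewritten with hub; a minimal sketch, assuming the same filter, merge, and fold shapes used above:

var a = ["foo"]

// hub shares the filtered stream, so the filter callback runs once per
// value no matter how many forks consume it downstream
var b = hub(filter(a, function () { console.log("called once"); return true }))
var c = filter(b, function () { return true })
var d = filter(b, function () { return true })
fold(merge([c, d]), function noop() {})

Note the caveat discussed below: with a synchronous source like an array, a consumer that subscribes to the hub after values have already been pushed in the same turn will miss them.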

Raynos commented 11 years ago

@Gozala the thing is I call fold once, and because I fork and merge between the input and that single fold call, the intermediate transformations get called multiple times.

I'm fine with "multiple calls to fold means multiple consumptions from input". What I'm not fine with is the fact that an implementation detail of a transformation (i.e. whether it forks and merges) can cause multiple consumptions from input.

Gozala commented 11 years ago

I'm sure it's not an issue with fork and merge; the problem is that different forks are folded, and the fact that you forked them from the same source doesn't mean anything. Think of array.slice: it makes a copy and there is no state sharing at all. And there are good reasons for that to be so, because if you start sharing by default you'll miss values, since they'll all be sent before you have a chance to even start folding.

But yes, I understand that it's different and can be confusing if you're not used to laziness.

Raynos commented 11 years ago

So it turns out that I didn't want something wrapped in hub after all, and that I needed the laziness shortly afterwards.

I think it is hard to get used to, but it should not be the default

Gozala commented 11 years ago

So it turns out that I didn't want something wrapped in hub after all, and that I needed the laziness shortly afterwards.

In general, I don't think you can implement monadic data structures without laziness, at least I can't see how you would.

I think it is hard to get used to, but it should not be the default

I don't think you can make hub the default, for the reasons already stated in previous comments. I don't think you can implement monadic data structures without laziness. I could be wrong, but I have no idea how to build what you're asking for. Something has to trigger the pipe; in our case it's fold / reduce. If you start sharing, that means the first fold will trigger the flow, causing subsequent folds to miss values.

Also note that hub is actually nasty and impure, for the exact same reasons: if you consume hub(input), there is no guarantee that consuming it today and tomorrow will give the same results. So even hub should be used with great care and responsibility. As a matter of fact I like that property, because when you wrap something into hub you take responsibility and hopefully are aware of the consequences.
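
A hypothetical illustration of that, with a consumer that shows up late:

var shared = hub(input)

// this consumer sees every value pushed from now on
fold(shared, function (value) { console.log("first", value) })

// a consumer that arrives later only sees values pushed after it starts
// folding; the earlier ones are gone, so today's and tomorrow's results differ
setTimeout(function () {
  fold(shared, function (value) { console.log("late", value) })
}, 1000)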

Now, all that being said, the story could be greatly improved by documenting all of this. I've clearly done a poor job at that so far, but hopefully over time I'll manage to.

The strategy I try to use in my own code is to avoid multiple folds of the same source, but I don't have a silver bullet for that. Here is some recent work I did with Gordon which may be a little similar to what you're doing at work and may help to inspire you: https://github.com/gordonbrander/rocket-bar/blob/master/index.js

Gozala commented 11 years ago

@raynos I'd also welcome you to try what you have in mind in a fork and see if it works. If there is a solution, that's great; if not, it'll definitely help you understand why things are the way they are.

Raynos commented 11 years ago

@Gozala documenting laziness and the way inputs get consumed multiple times would be a massive plus.

Well, I avoid multiple folds too. But even then, if the transformation contains merge(filter(x, b), filter(x, a)), it still consumes the input twice, even with a single fold.

I'll mess with rocket-bar at some point. I feel like I'm reinventing parts of reflex at work :P

Gozala commented 11 years ago

Well, I avoid multiple folds too. But even then, if the transformation contains merge(filter(x, b), filter(x, a)), it still consumes the input twice, even with a single fold.

Maybe we can address that specific case with another function, though. Although it's not clear why the filter can't just be an or(a, b) kind of thing.
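
For instance, a plain predicate combinator would collapse the two forks into a single filter, so the source is consumed only once. A sketch, where or is a hypothetical helper rather than anything in the library:

function or(a, b) {
  return function (value) { return a(value) || b(value) }
}

// one filter, one consumption of x
var combined = filter(x, or(a, b))

Note the behavior is not identical: merge emits a value twice when both predicates match it, while the combined filter emits it once.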

Maybe merge can somehow be made intelligent; I'll try to keep it in the back of my mind.

Also, I think Elm does this differently: https://github.com/evancz/Elm/tree/master/elm Maybe we could learn from it.

Gozala commented 11 years ago

One thing that comes to mind is to have a source property on each transformation and then use that source in merge to detect inputs that share the same source. Not sure how to use that information though.
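
A sketch of what that bookkeeping could look like; the source property and both helpers are hypothetical, not library API:

// each transformation remembers the root it was derived from
function tag(transformed, input) {
  transformed.source = input.source || input
  return transformed
}

// merge could then walk its inputs and notice a shared root
function shareSameSource(inputs) {
  var root = inputs[0].source || inputs[0]
  return inputs.every(function (input) {
    return (input.source || input) === root
  })
}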

Gozala commented 11 years ago

Alternatively there could be fork and join functions:

var forks = fork(input, {
  foo: isFoo,
  bar: isBar
})

print(forks.foo)
print(forks.bar)

Where fork will make sure to hub the input before creating the filters. Although the reason I never ended up writing such a fork function is the same as the reason hub is not the default: to be more specific, the second print would miss all the values from input sent in the same turn. But maybe we could somehow defer the actual reduction of input by a single tick, so that all the ends are able to register handlers before actual values are pushed.

Gozala commented 11 years ago

Actually!! fork could delay reduction of input until every fork (or its transformation) is passed to fold / reduce. That way it can stay in the same tick, but all of the forks will be guaranteed to receive all values.
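
A rough sketch of that idea, reusing the reduce / reducible shapes from this thread; the reader counting and the omission of per-fork filtering are my own simplifications:

function fork(input, names) {
  var readers = []
  var expected = names.length
  var forks = {}
  names.forEach(function (name) {
    forks[name] = reducible(function (next, initial) {
      readers.push({ next: next, state: initial })
      // only start consuming input once every fork is being reduced,
      // so all of them are guaranteed to see every value
      if (readers.length === expected) {
        reduce(input, function (value, _) {
          readers.forEach(function (reader) {
            reader.state = reader.next(value, reader.state)
          })
        })
      }
    })
  })
  return forks
}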

Gozala commented 11 years ago

@raynos will that address your problems?

Raynos commented 11 years ago

@Gozala the problem with print is that it's a fold. You're folding twice, so of course it breaks.

What you want is:

var forks = fork(input, {
  foo: isFoo,
  bar: isBar
})

fold(join(introspect(forks.foo), introspect(forks.bar)))

Raynos commented 11 years ago

@Gozala it doesn't quite handle my issue, because I want to fork the input later as well, in a separate function. That later fork will again cause an extra reduction of input, even though fold is called once.

Raynos commented 11 years ago

I think we should just look concretely at my example.

Gozala commented 11 years ago

@Gozala it doesn't quite handle my issue, because I want to fork the input later as well, in a separate function. That later fork will again cause an extra reduction of input, even though fold is called once.

I understand, although you could do:

var forks = fork(input, { a: True, b: True })

And then pass forks.a and forks.b into the separate functions instead of input. Of course you could just as well use hub instead, but notice that the behavior will be different, and you'll need to choose based on what you want the behavior to be in your case.

I don't see how you can implicitly abstract time without being lazy; as a matter of fact, that sounds like a foot gun, since you won't have a clue what would happen or when.

BTW you could always wrap the root of the input in hub; for that matter your functions could even return such inputs, but that will obviously bring back timing constraints.

In other words I see following options:

  1. Share nothing, so each read spawns a whole new flow.
  2. Allow sharing of transformations, but also count readers to delay pushing data until all readers are ready to consume.
  3. Share and do not care if readers miss values.

I consider the 3rd a no-go as it makes no behavior guarantees. The 1st is the current behavior, but it lets you switch to the 3rd in specific areas using hub. The 2nd can also be implemented with fork or something similar, but either way there has to be a way of knowing how many readers it needs to wait for.

Maybe the 2nd can somehow be made the default at the reducers level, but I'm not sure it's a good idea, since it would cause a whole new set of issues. For example, if one of the transformations is never read, nothing will happen at all.

It may also be an option to share everything but delay all the reads from input by a tick, hoping that by then all the readers are in place, and if not, well, too bad. That adds ambiguity and sounds too magical, so I don't think it would make the best default.

Maybe you want to explore some of these and see if any turns out to be better in practice.

Raynos commented 11 years ago

@Gozala atm I just do var app = App(hub(input)); fold(app, noop) and it bootstraps everything and only pulls from the input once.

Raynos commented 11 years ago

Actually no. I don't hub it!

I fixed my input to be weird >_<

Raynos commented 11 years ago
  1. is a no-go. I ran into that problem myself.

The problem with 2. is that I pull from input asynchronously when some other value comes in. So that's a race condition.

What I currently do for input is this:

function createInput() {
  var current = {}
  startPolling()

  return reducible(function (next) {
    sendCurrent(next)
    sendNewData(next)
  })
}

The problem is that the input starts polling the async source before it's ever folded. And the input doesn't stop if everyone sends the isReduced signal.

But this does mean that I can reduce the input as many times as I want without side effects.

The above example is like hub, EXCEPT that when you reduce it, it sends you a snapshot of the current state and then all subsequent changes.

So maybe I should hub the polling events and then have input be:

function createInput() {
  var input = PollerGuy()

  return reductions(hub(input), function () {
    // accumulate all the state stuff???
  }, {})
}

But then I only want to send you a snapshot of the current state once you start reducing, and then send deltas. I don't want to send you a stream of snapshots all the time.

What I really want is

function createInput() {
  var poller = createPoller()

  return merge([currentState(poller), hub(poller)])
}

i.e. the result I want to return is whatever the current state is when you reduce it, followed by only the events from then on.

It's that currentState function, which needs to be lazy, that still needs to be written.
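
A minimal sketch of such a lazy currentState, assuming the poller exposes a hypothetical snapshot() method that returns its accumulated values; nothing is read until the result is actually reduced:

function currentState(poller) {
  return reducible(function (next, initial) {
    // runs at reduction time, so the snapshot reflects the state "now"
    return reduce(poller.snapshot(), next, initial)
  })
}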

Gozala commented 11 years ago
  1. is a no-go. I ran into that problem myself.

The problem with 2. is that I pull from input asynchronously when some other value comes in. So that's a race condition.

What I currently do for input is this:

function createInput() {
  var current = {}
  startPolling()

  return reducible(function (next) {
    sendCurrent(next)
    sendNewData(next)
  })
}

The problem is that the input starts polling the async source before it's ever folded. And the input doesn't stop if everyone sends the isReduced signal.

But this does mean that I can reduce the input as many times as I want without side effects.

The above example is like hub, EXCEPT that when you reduce it, it sends you a snapshot of the current state and then all subsequent changes.

Your description sounds like a buggy version of buffer-reduce.

Gozala commented 11 years ago

So maybe I should hub the polling events and then have input be:

function createInput() {
  var input = PollerGuy()

  return reductions(hub(input), function () {
    // accumulate all the state stuff???
  }, {})
}

But then I only want to send you a snapshot of the current state once you start reducing, and then send deltas. I don't want to send you a stream of snapshots all the time.

What I really want is

function createInput() {
  var poller = createPoller()

  return merge([currentState(poller), hub(poller)])
}

i.e. the result I want to return is whatever the current state is when you reduce it, followed by only the events from then on.

It's that currentState function, which needs to be lazy, that still needs to be written.

I think this is close to what we have been doing in reflex. Although, if you remember, I was suggesting to have an input of state snapshots rather than deltas, and the reasons are exactly what you just described here. It's easy to calculate the delta between two states (reflex could even optimize that specific case), but deltas alone are not enough to calculate a complete state.

Your suggestion sounds too magical; sometimes dispatching state and sometimes dispatching a delta is awkward IMO.

What's interesting, though, is that in the reflex examples starting reductions with an empty object was good enough. I wonder how your case is different.

Raynos commented 11 years ago

@Gozala my approach is different actually.

I'm representing the state of the app as a flat list of objects with eventType: "add" and eventType: "remove". So currentState would return a list of all the current objects that exist, and the rest would be deltas: new objects or object-removal messages.

Sure, I could blast snapshots, but that's a bit annoying. I actually want to only blast deltas, except the first value is a list of all the deltas needed to get to the current state. Maybe that's a bad idea though.

Gozala commented 11 years ago

Sure, I could blast snapshots, but that's a bit annoying. I actually want to only blast deltas, except the first value is a list of all the deltas needed to get to the current state. Maybe that's a bad idea though.

That's the very definition of Hickey's complecting, but maybe it's not too bad in practice.

Raynos commented 11 years ago

@Gozala it's similar to Property ( https://github.com/raimohanska/bacon.js?utm_source=javascriptweekly&utm_medium=email#property ), i.e. there's both a way to get the current state (a snapshot) and a way to get updates to the thing.

I feel this notion of a reducible which is an event stream but also has a snapshot of the current state is missing.

Gozala commented 11 years ago

No need to overthink things:

{ state: { ... }, update: { ... } }

This is actually what State in reflex does, with the difference that update and state are merged into the same data structure and you can get the update by calling diff on it.

Raynos commented 11 years ago

@Gozala but then I have this fugly API:

map(thing, function (blob) {
  var current = blob.state
  var update = blob.update

  // do shit
})

instead of

map(merge([state(thing), thing]), function (update) {
  // do shit
})

Gozala commented 11 years ago

I like the term fugly :D

Gozala commented 11 years ago

Anyway, that's why reflex did:

map(states, function(snapshot) {
  // do thing with a snapshot
})

If you need both state and delta you have:

reductions(states, function(current, previous) {
  var delta = diff(previous, current)
  // do things with the delta; return current so it becomes
  // the previous state on the next step
  return current
})

Maybe not ideal, but it worked fine in most cases I've tried.

I also considered this (but then thought it was ugly):

map(state, function(snapshot) {
  // if you need only delta then
  var delta = diff(snapshot)
})

Also, the fact that your map depends on a state snapshot at first is worrying; map should be agnostic of that. If you need a stateful transform you should use reductions instead, or, if that's not enough, some other flavor of the same idea where both state and item are passed. We already exchanged some API ideas for that via gists, so I won't copy & paste them here, especially because it's hard to end up with a nice API; not having such functions forces one to solve the problem in a less stateful manner.

Raynos commented 11 years ago

@Gozala reductions is a pain in the ass because I want two things.

I want to reduce and accumulate state between transformations, and I want a result. The result of my transformation is different from the state I want to accumulate.

I've just been storing state in closures for this, using expand to sometimes return some values and to carry state between transformations.

I think the real question is: do I want to do transformations on streams of snapshots or on streams of changes?

Gozala commented 11 years ago

I've just been storing state in closures for this, using expand to sometimes return some values and to carry state between transformations.

I think you're missing the point, and that's why you're having all these issues. The library is intentionally designed to make local state hard. It's fine to capture bindings from closures, but not to mutate them. The intention is that all the stateful code (updating references or doing mutations) goes into the source implementation or into the consumer implementation. All the transformation logic should be state-free and order-independent, since none of that is guaranteed. I think you'll have a much better time if you don't try to work around these limitations, as they are guardrails.

As for reductions, or writing var current = blob.state; var update = blob.update: is that seriously such a big deal? It's just a little boilerplate that makes your code state- and order-independent, and JITs can optimize it better; IMO it's totally worth it. Not to mention that if you really don't want to type those additional chars, you could always use helper libs like this one: https://github.com/Gozala/extract

And in ES6 you could even use destructuring. At the moment you're making a bunch of tradeoffs just for typing convenience.
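
For what it's worth, with ES6 parameter destructuring the extraction boilerplate disappears entirely; the same hypothetical thing stream of { state, update } records as above:

map(thing, function ({ state, update }) {
  // do shit with state and update directly
})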

Gozala commented 11 years ago

Here is yet another example of how a stateful transformation can be done when it is really necessary:

var LOCAL = 0        // index of the transform's private state
var ACCUMULATED = 1  // index of the consumer's accumulated result

function lift(input, f, start) {
  return reducible(function(next, initial) {
    return reduce(input, function transform(value, state) {
      // f returns zero or more output values followed by the new local state
      var data = [].concat(f(value, state[LOCAL]))
      var local = data.pop()
      var result = state[ACCUMULATED]
      var count = data.length
      var index = 0
      // dispatch each output value, respecting early termination
      while (index < count) {
        result = next(data[index], result)
        if (isReduced(result)) return result
        index = index + 1
      }
      // thread both the local state and the consumer's result forward
      return [local, result]
    }, [start, initial])
  })
}

// Transform any text stream into stream of lines
var lines = lift(text, function(chunk, prefix) {
  return prefix.concat(chunk).split("\n")
}, "")

The reason I hesitate to include this by default is that it's too easy to forget to include the new state at the end of the returned array.