dtao / lazy.js

Like Underscore, but lazier
http://danieltao.com/lazy.js/
MIT License

Something to consider #1

Closed: jdalton closed this issue 11 years ago

jdalton commented 11 years ago

The cost of something like this can be pretty high for smaller collections:

Array of 20: [benchmark screenshot]

and gives mixed results as the array length increases.

Array of 100: [benchmark screenshot]

I dig what you're trying to do; however, it may be better accomplished through method compilation. For example, check out heya/pipe.
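To illustrate the general idea of method compilation (just a rough sketch, not heya/pipe's actual API; compileChain and the step objects here are made up): fuse the requested steps into one loop body and compile it once, so the whole chain runs as a single pass with no intermediate arrays.

// Rough sketch of "method compilation": build the body of one loop from the
// requested steps, then compile it with Function so the chain runs as a
// single pass with no intermediate arrays. Not heya/pipe's actual API.
function compileChain(steps) {
  var body = 'var result = [];\n' +
             'for (var i = 0; i < array.length; i++) {\n' +
             '  var value = array[i];\n';
  for (var i = 0; i < steps.length; i++) {
    if (steps[i].type === 'map') {
      body += '  value = steps[' + i + '].fn(value);\n';
    } else if (steps[i].type === 'filter') {
      body += '  if (!steps[' + i + '].fn(value)) continue;\n';
    }
  }
  body += '  result.push(value);\n}\nreturn result;';
  return new Function('steps', 'array', body).bind(null, steps);
}

// Usage: one compiled function, one pass over the array.
var squaresOfEvens = compileChain([
  { type: 'filter', fn: function (x) { return x % 2 === 0; } },
  { type: 'map',    fn: function (x) { return x * x; } }
]);
squaresOfEvens([1, 2, 3, 4]); // => [4, 16]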

dtao commented 11 years ago

Thanks for taking a look, and for pointing me to heya/pipe! It looks like a cool library, and the jsPerf results are certainly impressive.

For what it's worth, I think the approach I'm dabbling with in lazy.js does have some advantages, so I plan on continuing to work on it at least for a little while (though I'm sure the way things are implemented right now is far from optimal and deserves several iterations of reconsideration).

  1. Probably most importantly, I like that lazy.js provides an intuitive interface that's essentially identical to underscore's and should jibe with what many JS devs (not to mention Ruby, C#, etc. developers) already expect. It seems that pipe has a very different model, which at least at first glance strikes me as a bit more opaque and unusual.
  2. Also, though it's a very small part of the library right now, I think there's a lot of power in the notion of arbitrary sequence generation, which I want to experiment with a bit more. As just a couple of examples of what I'm thinking: one possible consequence of lazy's approach might be that any "sequence" (whether backed by a collection or not) could trivially be exposed through an asynchronous iterator. And building on that, lazy could be used to transform and filter not just sequences of elements but sequences of events as well.

That line of thought might be a bit "out there" (obviously I haven't thought it through fully yet); but I'm nonetheless optimistic there's the seed of something very useful here.
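To give a rough idea of the async part, here is a purely hypothetical sketch; none of this is lazy.js API, and the iterator shape (moveNext/current) is made up:

// Hypothetical sketch: if a sequence could expose a pull-style iterator
// (moveNext/current are made-up names), its elements could be handed to a
// callback asynchronously, one per tick, without building an array first.
function eachAsync(iterator, callback) {
  (function next() {
    if (iterator.moveNext()) {
      callback(iterator.current());
      setTimeout(next, 0);
    }
  })();
}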

As for the performance of lazy.js on small- to medium-sized collections: that's definitely not surprising, though I appreciate you pointing it out. And if this library ever really gets off the ground, I'll be sure to make very clear in the docs that its strength isn't in having fast map or filter functions per se. (It was very surprising to me that in Chrome, at least on my machine, even the plain vanilla map and filter functions were faster using lazy than underscore, though clearly not faster than lodash, which I didn't realize was so far ahead!) I do think it's realistic that, after optimizing some more, it can be reliably much faster on very large collections when composing "full-list" iterators with "partial-list" ones (e.g., map + take), as in the sketch below.
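To make the map + take case concrete (hugeArray and mapFn here are just placeholders for whatever the benchmark uses):

// With a lazy chain, mapFn only runs for the handful of elements that
// take(5) actually pulls:
var firstFive = Lazy(hugeArray).map(mapFn).take(5).toArray();

// The eager equivalent maps the entire array before discarding most of it:
var firstFiveEager = _.first(_.map(hugeArray, mapFn), 5);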

Here's the same jsPerf test, but only with the take examples, for 1000 elements.

[jsPerf screenshot: lazy.js vs. Lo-Dash, take examples]

So anyway, that's where my thoughts are so far. If you feel I'm still overlooking some really important (and possibly obvious) things, please let me know!

jdalton commented 11 years ago

I dig the way you can short-circuit long linear iteration, but it penalizes the more common case. Lo-Dash has optimizations for large arrays in some of its methods but doesn't make it a focus, and instead optimizes for the common case. Devs dealing with large arrays shouldn't try to use their old sugary API and expect it to perform equally well. It's better for devs to rethink, strip down (reduce abstraction), and simplify to improve performance.

jdalton commented 11 years ago

Here is an example of simply approaching the issue differently (it could be reduced further): [code screenshot]
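The kind of stripped-down approach meant here would be along these lines (a guess at the flavor, not the code from the screenshot):

// Not the original snippet; just a guess at the stripped-down style meant:
// one plain loop in place of map/filter/take.
var results = [];
for (var i = 0; i < array.length && results.length < 5; i++) {
  var value = array[i] * array[i] + 1;  // square then increment, inlined
  if (value % 2 === 0) {
    results.push(value);
  }
}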

dtao commented 11 years ago

I really appreciate you engaging in this conversation, not only because of your points themselves (which are good), but also because it's helping me to crystallize my own ideas about why lazy.js is still worthwhile.

Two things. First, though I appreciate that there will always be faster ways to do basically anything you could do with lazy.js (or any library), I don't agree that the right approach is therefore to always strip away abstractions, even when performance is a concern. Hand-tuning highly specific code that does exactly what you need is going to beat using a more general-purpose library every time; but the result is likely to be less reusable, probably more complicated (and so trickier to modify), and at least a bit harder to read and comprehend. It also just takes longer to build software this way. So I think choosing a tool is a matter of weighing various trade-offs; in the case of a JS application that deals with very large sequences and interacts with those sequences in a variety of ways, lazy.js just might prove an attractive trade-off: keeping the familiarity, readability, and composable nature of e.g. underscore, while performing fast enough to outweigh the cost of writing many special-case functions instead.

But more importantly—and this is my own fault for writing the README in a way that pits lazy.js against underscore/lodash (I will definitely be rewriting that soon)—if we're disagreeing on a good approach for a library that provides utility methods for iterating over arrays, then honestly, I must say you win, in spite of my previous point. The more I think about it, the more I think these libraries shouldn't really be pitted against each other at all. Their responsibilities may intersect, but they aren't the same. The real value of lazy.js (as I'm coming to think of it, anyway) is the shift away from collections (many elements in some sort of container) to iterators, or "views" into collections.

As a concrete example: I just added the sortedIndex method. Now supposing I had a sequence of people, and I wanted to find someone with the last name of Smith, I could do:

Lazy(people).map(getLastName).sortedIndex("Smith")

You wouldn't want to do it that way using underscore (the map part would be inefficient); so you'd do something like this instead:

_.sortedIndex(people, smith, getLastName)

But my point is that using lazy.js, the map().sortedIndex() route is a fine way to do it, because map doesn't create anything eagerly. This also makes it very conducive to mixing and matching with special-case code. For example, say I have some complicated logic that I want to run to analyze all of these people; it deals with names, but it can exit early. I could first map(getName), but that wouldn't be efficient. I could extract the names within the actual logic that does the analysis, but that would add "noise" to code that's really about doing analysis, not extraction. Or, using lazy.js, I could write the analysis to accept a sequence, pass in map(getName), and call each on it. Now I've got the cleanliness of the first approach but with better efficiency (though admittedly not as efficient as the second).
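In code, the pattern I mean looks roughly like this (a sketch; analyze and shouldStopEarly are hypothetical, and I'm assuming that returning false from the each callback halts iteration):

// Sketch: the analysis code only ever sees a sequence of names, and the
// name extraction stays lazy. analyze and shouldStopEarly are hypothetical.
function analyze(names) {
  names.each(function(name) {
    // ... complicated analysis ...
    if (shouldStopEarly(name)) {
      return false; // assuming a false return halts iteration
    }
  });
}

analyze(Lazy(people).map(getName));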

Sorry, this is probably more than you cared to read. It was helpful for me to think through some things, at least. You could be right and I may end up throwing this idea away completely; but for now I still think there's some life here.

jdalton commented 11 years ago

Being "fast enough" is tricky. You don't know what kind of device code will run on or what kind of code will be built on top of it. In non-large array cases your speed of light is slower. I still think it's reasonable for devs to approach perf conditions on a case by case basis and can see devs using something like

// square, inc, and isEven are presumably the small helpers from the jsPerf tests
var ret = _.transform(array, function(result, value) {
  value = inc(square(value));
  // collect the first four even results; returning false stops _.transform early
  return !isEven(value) || (result.push(value), result.length < 4);
});

if and when perf is an issue, over the chained syntax.
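For comparison, the chained form this replaces would look roughly like the following (a sketch using the same assumed helpers and the lazy.js methods discussed above):

// Roughly the chained equivalent of the _.transform version above:
var ret = Lazy(array)
  .map(function(value) { return inc(square(value)); })
  .filter(isEven)
  .take(4)
  .toArray();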

dtao commented 11 years ago

I guess there are two conversations here. One is about developing this library specifically (and the wisdom or lack thereof in doing so); the other is about the right way to deal with performance issues. I suspect we're nearly on the same page in the latter discussion.

I think you're trying to spare me from wasting a lot of time on a bad idea (and perhaps future developers from choosing a bad tool). To that extent I appreciate your side. If I seem to be pushing back, I guess it's partially because I don't think performance is the only interesting issue here. But it's also because I'm really just starting out on this, and I think even if we DO focus on performance there are still big improvements I can make to this approach.

I mean, stepping back a bit and looking at this pretty generically: if we compare two libraries on performance, and one has been around a long time while the other is brand new, I would expect the new one to lose in many cases, though maybe win in some. I wouldn't expect this to provide much conclusive data on the inherent potential of either library's approach, as the new one is probably going to go through many revisions still. So from that angle I continue to respectfully disagree with what I think is your opinion, that I should stop working on this.

Even if you're right, this work might prove useful down the road: to illustrate what NOT to do! We'll see.

jdalton commented 11 years ago

> I think you're trying to spare me from wasting a lot of time on a bad idea (and perhaps future developers from choosing a bad tool).

Not trying to spare you. Just wanting you to have a better picture and give you something to shoot for.

> If I seem to be pushing back, I guess it's partially because I don't think performance is the only interesting issue here. But it's also because I'm really just starting out on this, and I think even if we DO focus on performance there are still big improvements I can make to this approach.

I agree there are more interesting things here than perf.

> I mean, stepping back a bit and looking at this pretty generically: if we compare two libraries on performance, and one has been around a long time while the other is brand new, I would expect the new one to lose in many cases, though maybe win in some. I wouldn't expect this to provide much conclusive data on the inherent potential of either library's approach, as the new one is probably going to go through many revisions still.

Not necessarily: when Lo-Dash launched, it was faster than Underscore, had more bug fixes, and offered better cross-environment support. And that was because I had a better approach; the advantages were obvious and tangible from the start. That said, I did refine things as time went on, so with that in mind I'd like to see this lib continue to be optimized/refined as you explore this approach. Keep it up!

dtao commented 11 years ago

Thanks for the encouragement :)

Your experience with Lo-Dash demonstrates that it's possible to confirm that an idea has merit early on (even compared against a more mature library); but I would argue you can't as easily refute the merit of a new library. I.e., it's always easier to prove a positive (this can work) than a negative (this isn't going to work).

I've done a bunch of refactoring to minimize the overhead of Lazy's approach (i.e., expensive object creation). Now even on small arrays its performance is comparable to Lo-Dash's. My own tests show Lazy is faster on Chrome and Opera, and slower on Firefox (I haven't yet tested on IE or Safari). Take a look at the jsPerf now with the latest lazy.js code:

[jsPerf screenshot]

I will add that I realized there was a problem with comparing the performance of toArray() to underscore and Lo-Dash: in the majority of cases, you wouldn't call toArray() at all using Lazy. Instead, you would iterate over the sequence itself, which is sort of the whole point of the library. So to make the scenarios more realistic, I did a comparison of iterating over a Lazy sequence vs. iterating over the array created by Lo-Dash. At first glance I realize this might seem unfair to Lo-Dash, but I think it actually makes perfect sense: unless you're going to iterate over the array somehow, what was the point in creating it?
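In code, the two scenarios I compared look roughly like this (a sketch; square and processElement just stand in for whatever work the benchmark does):

// Lazy: no intermediate array; elements are consumed as they're produced.
Lazy(array).map(square).each(processElement);

// Lo-Dash: build the mapped array first, then iterate over it.
_.each(_.map(array, square), processElement);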

Let me know if you disagree or—again—feel I'm missing something obvious. But my progress thus far has only further convinced me that this library is totally worthwhile.

jdalton commented 11 years ago

> I've done a bunch of refactoring to minimize the overhead of Lazy's approach (i.e., expensive object creation). Now even on small arrays its performance is comparable to Lo-Dash's. My own tests show Lazy is faster on Chrome and Opera, and slower on Firefox (I haven't yet tested on IE or Safari).

Awesome!

> I will add that I realized there was a problem with comparing the performance of toArray() to underscore and Lo-Dash: in the majority of cases, you wouldn't call toArray() at all using Lazy.

Ahh ya, that is a bit different. I think accounting for toArray is fair, as both results would be apples to apples (both returning an array you can work with, independent of a lib).

dtao commented 11 years ago

Well, FWIW, at least in this specific jsPerf example, Lazy still holds up quite well even when you call toArray():

[jsPerf screenshot]

jdalton commented 11 years ago

Ya, though I think the benchmarks you're using to promote the lib should be adjusted as well, because not calling toArray is essentially measuring a no-op, right?
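In other words (a sketch in the same chain style as above; square and isEven are stand-ins for the benchmark helpers):

// Building a chain does (almost) no work by itself; square and isEven never run here.
var seq = Lazy(array).map(square).filter(isEven);

// Only a consuming call like toArray() or each() actually walks the elements.
var result = seq.toArray();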

masaeedu commented 6 years ago

The fundamental advantage of the library is the increase in the number of no-ops for equivalent computations.