caolan / highland

High-level streams library for Node.js and the browser
https://caolan.github.io/highland
Apache License 2.0

How to sell coworkers on FRP\Highland? #565

Open jaidetree opened 8 years ago

jaidetree commented 8 years ago

So at work I've been trying to sell the rest of the tech team on FRP & libraries like Highland to make it easier to accomplish harder tasks.

I found the perfect problem: if an event date is set in the future, how can we grade the color between set intervals until then? For example, we may want a teal-ish color when there are 48 hours or more until the event, a mix of teal and yellow between 48 and 6 hours, and a mix of yellow and red when fewer than 6 hours remain, ending at pure red on the event date itself.

Fortunately I was able to get it done in Highland, but compared with the version a coworker wrote in vanilla ES6 JS, mine feels a lot more verbose and makes less sense. Would anyone here have modeled it differently, or is it that my grasp of FRP concepts is severely lacking?

Coworker's solution:

const COLORS = [
  { stop: 0, rgb: [35, 198, 161] },
  { stop: 6, rgb: [245, 235, 73] },
  { stop: 48, rgb: [245, 51, 0] },
];

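function getRGBColor (hours) {
  let min, max, normalizationFactor;

  // Clamp to the table's range so a bracketing pair of stops is always found.
  hours = Math.min(Math.max(hours, COLORS[0].stop), COLORS[COLORS.length - 1].stop);

  // Start at 1: each stop is paired with the one before it (COLORS[-1] is undefined).
  for (let i = 1; i < COLORS.length; i++) {
    if (hours <= COLORS[i].stop && hours >= COLORS[i-1].stop) {
      max = COLORS[i];
      min = COLORS[i-1];
      break;
    }
  }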

  normalizationFactor = (hours - min.stop) / (max.stop - min.stop);

  return min.rgb.map((c, i) => {
    let rgb1 = min.rgb[i],
        rgb2 = max.rgb[i],
        normalizedDiff = (rgb2 - rgb1) * normalizationFactor,
        rawValue = parseInt(rgb1 + normalizedDiff, 10);

    if (rawValue > 255) return 255;
    else if (rawValue < 0) return 0;

    return rawValue;
  });
}

My Highland/FRP solution:

const _ = require('highland');

const COLOR_TABLE = [
  {
    rgb: [245, 51, 0], // red
    stop: 0,
  },
  {
    rgb: [248, 235, 73], // yellow
    stop: 6,
  },
  {
    rgb: [44, 163, 178], // turk-like colors
    stop: 48,
  },
];

function calcGradient (input) {
  return _([ input ])
    // Calculate the duration & hours of the time between now & the event
    .map((data) => {
      data.duration = data.eventDate - data.now;
      data.hours = Math.ceil(data.duration / 1000 / 60 / 60);
      return data;
    })
    // Get the two colors we need to switch to
    .flatMap((data) => {
      return _(COLOR_TABLE)
        // Takes colors until we no longer match their stop
        .reduce({ done: false, colors: [] }, (reduction, color) => {
          // A very cheap way to not use an if statement :D
          !reduction.done && reduction.colors.push(color);

          reduction.done = data.hours < color.stop;

          return reduction;
        })
        // grab the colors property from the data object
        .pluck('colors')
        // Ensures there are always at least 2 colors in the event we're still
        // before 48 hours or after the event has occurred
        .map((colors) => [colors[0]].concat(colors))
        // Only use the last two items.
        .map((colors) => colors.slice(-2))
        // Merge our colors back into the main stream's data
        .map(([colorA, colorB]) => Object.assign(data, {
          colors: {
            prev: colorB,
            next: colorA,
          },
        }));
    })
    // Get our progress factor so we know where we are in the transition
    // between the two colors
    .map((data) => {
      let { hours, colors: { prev, next } } = data;
      let normalProgress = (hours - prev.stop) / (next.stop - prev.stop);

      return Object.assign(data, {
        progress: Math.min(Math.max(normalProgress, 0), 1),
      });
    })
    // Find a color that is transitional between the two colors
    .flatMap((data) => {
      let { colors: { prev, next }, progress } = data;

      return _(( prev.rgb ))
        .zip(next.rgb)
        .map(([ start, end ]) => start + ((end - start) * progress))
        .map(Math.floor)
        .collect()
        .map((gradient) => ({
          gradient,
          hours: data.hours,
          time: data.now,
        }));
    });
}

vqvu commented 8 years ago

Sorry for the late reply. This is a tricky question to answer.

I think there are two questions here:

  1. Why is the Highland solution so much more complicated than the vanilla JS solution?
  2. Can we show that there is an advantage to modeling this problem in FRP?

I'll address (1) first. You should be careful about writing code that is too functional. JavaScript isn't a functional language, and it's perfectly fine to mix in some imperative code when working with Highland if it helps readability. Trying to force imperative code into a functional style, which is what happened here, can very easily hurt readability. Such code tends to be perfectly understandable to the writer when he or she is writing it, but extremely obtuse to everyone else.

FRP is more about transforming an asynchronous dataflow (i.e., streams of events or data objects) via functional programming constructs. calcGradient is neither asynchronous nor does it operate on a stream of data, so Highland is not a good fit for it. It's probably best implemented imperatively like in your coworker's solution.

Which brings us to (2). How can Highland help with the overall problem of outputting a gradient given a timestamp? For that we need to think about the entire dataflow end-to-end rather than just a simple time-to-gradient mapper.

In my opinion, the main benefit of Highland and similar libraries is the decoupling between a dataflow transform and the actual dataflow. In Highland, a transform does not have to worry about where its input data is coming from or what to do with the resulting data. That fact makes it easy to write and compose independent transforms together. To achieve the same thing in vanilla JS, you typically have to resort to callbacks everywhere.

To illustrate this, let's say you want to print a message every X milliseconds. You may end up doing something like this.

// Vanilla JS
function computeMessage(eventTime, now) {
  // Time remaining is event time minus now (positive while the event is in the future).
  const hours = toHours(eventTime - now);
  const color = getRGBColor(hours);
  return `${color} ${hours} hours until event`;
}

function execute(eventTime, ms) {
  setInterval(() => {
    console.log(computeMessage(eventTime, Date.now()));
  }, ms);
}

execute(new Date('2017-01-01').getTime(), 1000 * 60);

// Highland
function timer(ms) {
  // Return the generator stream so callers can chain transforms onto it.
  return _((push, next) => {
    push(null, Date.now());
    setTimeout(next, ms);
  });
}

function computeMessage(eventTime) {
  return now => {
    // Time remaining is event time minus now (positive while the event is in the future).
    const hours = toHours(eventTime - now);
    const color = getRGBColor(hours);
    return `${color} ${hours} hours until event`;
  };
}

timer(1000 * 60)
    .map(computeMessage(new Date('2017-01-01').getTime()))
    .pipe(process.stdout);
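
Both versions above call toHours, which is never defined in this thread. A minimal sketch, assuming it converts a millisecond duration to whole hours and rounds up (as the Math.ceil in the original Highland solution does); the name MS_PER_HOUR is hypothetical:

```javascript
// Hypothetical helper: not part of Highland or of the snippets above.
const MS_PER_HOUR = 1000 * 60 * 60;

function toHours(ms) {
  // Round up so that any partial hour still counts as a full hour remaining.
  return Math.ceil(ms / MS_PER_HOUR);
}
```

With this definition, toHours(90 * 60 * 1000) gives 2, since an hour and a half rounds up to two hours.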

So far not much different. But now you want to write the message to both stdout and a database. In vanilla JS, you might add a call to write to the db in the setInterval handler. In Highland, a simple fork will do.

// Vanilla JS
function execute2(eventTime, ms) {
  setInterval(() => {
    const message = computeMessage(eventTime, Date.now());
    console.log(message);
    db.insert(message);
  }, ms);
}

execute2(new Date('2017-01-01').getTime(), 1000 * 60);

// Highland
const s = timer(1000 * 60)
    .map(computeMessage(new Date('2017-01-01').getTime()));

const fork1 = s.fork();
const fork2 = s.fork();

fork1.pipe(process.stdout);
fork2.each(message => db.insert(message));

Now you really want to batch inserts to the database for performance, so you put in some simple batching logic. In Highland, it's just a call to batch.

// Vanilla JS
function execute3(eventTime, ms, n) {
  let batch = [];  // reassigned after each flush, so it cannot be const
  setInterval(() => {
    const message = computeMessage(eventTime, Date.now());

    console.log(message);
    batch.push(message);
    if (batch.length >= n) {
      db.insertAll(batch);
      batch = [];
    }
  }, ms);
}

execute3(new Date('2017-01-01').getTime(), 1000 * 60, 100);

// Highland
...
fork2
    .batch(100)
    .each(messages => db.insertAll(messages));

Now it's getting more complicated. You have batching logic intertwined with scheduling logic intertwined with (hard-coded) output logic. That's not great. What if you want to be able to choose where to write to? In vanilla JS, you have to resort to using ad hoc callbacks everywhere. In Highland, the transforms are already split up so you only have to change where you pipe.

// Vanilla JS
function execute4(eventTime, ms, callbacks) {
  setInterval(() => {
    const message = computeMessage(eventTime, Date.now());
    callbacks.forEach(cb => cb(message));
  }, ms);
}

function batch(n, cb) {
  let pending = [];  // reassigned after each flush, so it cannot be const
  return message => {
    pending.push(message);
    if (pending.length >= n) {
      cb(pending);
      pending = [];
    }
  };
}

// Let's write to a file instead.
const fileStream = fs.createWriteStream('a file');
execute4(
    new Date('2017-01-01').getTime(),
    1000 * 60,
    [
      message => fileStream.write(message),
      batch(100, messages => db.insertAll(messages))
    ]);

// Highland
...
fork1.pipe(fs.createWriteStream('a file'));
...

Notice how the vanilla JS version is slowly evolving to look like the Highland version but more nested. Eventually you'll either reinvent a Highland-like framework or you'll end up with callback hell.

And this is just from your example. Here's another example that better illustrates the benefit of Highland. What if you want to scrape a bunch of webpages and write them to a database, but you don't want to DoS the server with too many requests at once?

In Highland, that's just

function getPage(url) {
    // returns a promise with the body.
}

_(urls)
    .map(url => _(getPage(url)))
    .mergeWithLimit(5)  // 5 requests in-flight at a time.
    .each(body => db.insert(body));

In vanilla JS, it's far more complicated since you need to reimplement the equivalent of mergeWithLimit. I won't write it out fully here, but I'm sure it's more than 4 lines.
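
For comparison, here is a rough vanilla JS sketch of the same bounded-concurrency pattern using only promises. The names mapWithLimit, worker and fakeGetPage are hypothetical, and this is one possible approach under those assumptions, not a description of how Highland's mergeWithLimit works internally:

```javascript
// Run `task` over every item with at most `limit` promises in flight.
// Resolves with the results in input order once every task has succeeded.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await task(items[index]);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Stand-in for getPage (an assumption for illustration): resolves with a
// fake body after a short delay and records how many calls overlap.
let inFlight = 0;
let maxInFlight = 0;
function fakeGetPage(url) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise(resolve => setTimeout(() => {
    inFlight--;
    resolve(`body of ${url}`);
  }, 10));
}
```

Calling mapWithLimit(urls, 5, getPage) would then keep at most 5 requests in flight. Unlike the Highland version, this sketch collects every body before handing anything to the database, and it has no back-pressure or error recovery, which is roughly where the extra lines would start to pile up.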

jaidetree commented 8 years ago

No apologies necessary. First, I realized after posting that this may have been more suited as a StackOverflow post than a support ticket that project collaborators, such as yourself, are alerted to. Second, it was not urgent and was intended as an open discussion topic for others who may also be learning these concepts and trying to get a feel for them. Any insight into this is vastly appreciated.

That said, this example is very helpful in understanding the intended application of Highland and which problems FRP is best suited to solve. Seeing how complex and error-prone the vanilla example becomes, followed by how easy it is to model in Highland, gives me a much better way to illustrate how it can help in our projects at work.

As for your first point, that is a conclusion I came to as well after setting out to see how far the original FRP solution could be refactored. It seemed like the bulk of the verbosity was trying to transport too much data between each stream in the pipeline. Once I started simplifying the types of data going down the streams, it became much simpler.

To the extreme I was able to take it down to 16 lines of code. But this felt like it was a little too dense to best demonstrate the scalability & composability that FRP has to offer:

function calcGradient (hours) {
  return _(COLORS).reduce({ done: false, colors: [] }, (loop, color) => {
    !loop.done && loop.colors.push(color);
    return { done: color.stop > hours, colors: loop.colors.slice(-2) };
  })
  .pluck('colors')
  .flatMap(([c1, c2]) => _(c1.rgb).zip(c2.rgb).map(([x1, x2]) => [
    x1, x2, Math.max((hours - c1.stop) / (c2.stop - c1.stop), 0),
  ]))
  .map(([c1, c2, factor]) => Math.floor(c1 + ((c2 - c1) * factor))).collect();
}

After expanding that out to emphasize simplicity & readability the solution can be expressed as:

function calcGradient (hours) {
  let done = false;

  return _(COLORS)
    .reduce([], (colors, color) => {
      !done && colors.push(color);
      done = color.stop > hours;
      return colors.slice(-2);
    })
    .map(([end, start]) => ({
      rgb: [start.rgb, end.rgb],
      factor: Math.max((hours - start.stop) / (end.stop - start.stop), 0),
    }))
    .flatMap(({ rgb, factor }) => {
      return _(rgb[0]).zip(rgb[1]).map((values) => ({ values, factor }));
    })
    .map(({ values, factor }) => {
      return Math.floor(values[0] + (values[1] - values[0]) * factor);
    })
    .collect();
}

Which is definitely comparable to the refactoring a coworker did a day or so later. I even made a small running project out of it to rewrite it in RxJS, Transducers-js, and combining Highland with Transducers-js in addition to transpiled languages like ClojureScript.

So far only the vanilla, FRP - Highland, and RxJS examples are complete but they are documented here for now:

https://gist.github.com/jayzawrotny/3079b17004a5237910a58f877bac1d58

vqvu commented 8 years ago

> this may have been more suited as a StackOverflow post than a support ticket that project collaborators, such as yourself, are alerted to.

Oh, I don't really mind. I like answering these kinds of questions, and in the absence of some sort of general highland-users mailing list, I think Github issues are a fine place for them, especially if you want a Highland collaborator to weigh in. Speaking for myself, I don't check StackOverflow for Highland-related questions, so I would never have seen this question otherwise.

> It seemed like the bulk of the verbosity was trying to transport too much data between each stream in the pipeline.

Yes. This is a much more concise explanation of the problem than what I said. I agree completely.

I like your new solution; it's definitely much cleaner, although I'm pretty sure that the result of the reduce is [start, end] and not [end, start] as in your example. Also, if you want to accomplish the same thing without the external done variable, you could write:

function calcGradient(hours) {
  return _(COLORS)
    .reduce([COLORS[0], COLORS[0]], (colors, color) => {
      if (color.stop <= hours) {
        return [color, color];
      } else if (colors[1].stop <= hours) {
        return [colors[1], color];
      }
      return colors;
    })
    .map(([start, end]) => {
      ...
    })
    ...
}

As a side effect, it works well for hours < 0 and hours >= 48.
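
To make those edge cases concrete, here is one possible way to fill in the elided steps. It is a sketch under several assumptions: it uses the red/yellow/teal table from the first post, it swaps Highland's reduce for plain Array.prototype.reduce so it runs without dependencies, and it adds a guard for the case where both stops are the same (otherwise the factor would divide by zero). The name pickStops and the guard are hypothetical additions, not part of the snippet above:

```javascript
// Assumed table: same stops and colors as COLOR_TABLE in the first post.
const COLORS = [
  { stop: 0, rgb: [245, 51, 0] },     // red
  { stop: 6, rgb: [248, 235, 73] },   // yellow
  { stop: 48, rgb: [44, 163, 178] },  // teal
];

// Same reduction as above, but on a plain array: returns the [start, end]
// stops that bracket `hours`, or the same stop twice when out of range.
function pickStops(hours) {
  return COLORS.reduce((colors, color) => {
    if (color.stop <= hours) {
      return [color, color];
    } else if (colors[1].stop <= hours) {
      return [colors[1], color];
    }
    return colors;
  }, [COLORS[0], COLORS[0]]);
}

function calcGradient(hours) {
  const [start, end] = pickStops(hours);
  const span = end.stop - start.stop;
  // Guard (an addition): when start === end the span is 0, so skip the division.
  const factor = span === 0 ? 0 : (hours - start.stop) / span;
  return start.rgb.map((c, i) => Math.floor(c + (end.rgb[i] - c) * factor));
}
```

With this table, calcGradient(-5) stays at red and calcGradient(100) stays at teal, while values in between interpolate linearly between the two surrounding stops.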

o0x2a commented 8 years ago

@jayzawrotny I can talk on behalf of your coworkers who do not buy your Highland solution.

  1. It is not FRP.
  2. It is not very readable.
  3. It is not maintainable.
  4. It is not easily debuggable.
  5. It can be much slower than not using highland.
  6. You're introducing a third-party dependency which is absolutely NOT needed, and does not improve the solution.
  7. It is very opinionated.

There are scenarios in which using Highland can be bliss, a perfect solution. But it is definitely not something to bring into the mix for every solution. If your coworkers have objections to you using Highland, they may have a point. Being a "smarty-pants developer" is mostly equivalent to being a "bad developer".

Please use Highland wisely: use it when it is applicable, and avoid it whenever there is no need for it. The perfect use case for Highland is when you want to handle or modify multiple streams (actual async Node streams) on the fly, e.g. converting a never-ending stream of binary data from one socket into a hex string and writing it to another socket.
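
As a rough illustration of that binary-to-hex use case, here is a sketch written with Node's built-in stream.Transform rather than Highland, so that it runs without extra dependencies. The helper name createHexEncoder is hypothetical:

```javascript
const { Transform } = require('stream');

// Hypothetical helper: builds a Transform that re-emits each binary chunk
// as its hexadecimal string representation.
function createHexEncoder() {
  return new Transform({
    transform(chunk, encoding, callback) {
      // Incoming chunks arrive as Buffers by default; encode each one as hex.
      callback(null, chunk.toString('hex'));
    },
  });
}
```

Assuming inputSocket and outputSocket are two already-connected net.Socket instances, the wiring would be inputSocket.pipe(createHexEncoder()).pipe(outputSocket). A stream library earns its keep once several such streams have to be merged, throttled or forked.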