dc-js / dc.js

Multi-Dimensional charting built to work natively with crossfilter rendered with d3.js
Apache License 2.0
7.42k stars, 1.81k forks

dc inside webworker #602

Open arunsoman opened 10 years ago

arunsoman commented 10 years ago

Will dc.js work inside a web worker?

gordonwoodhull commented 10 years ago

I guess the request here is https://github.com/square/crossfilter/issues/53

The interface between dc.js and crossfilter is pretty narrow, so I think you could probably run crossfilter (the computationally expensive part) in a web-worker, and marshall the requests/responses across.

You would end up wrapping the crossfilter objects like the "fake groups" described in the FAQ. Besides the group's all and top functions, you'd also have to wrap the dimension's filter function(s).
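A minimal sketch of what such a wrapper could look like on the main thread, assuming the worker posts back plain arrays of {key, value} rows. The names makeGroupBuffer and update are hypothetical and not part of dc.js or crossfilter; only all and top mirror the group methods mentioned above.

```javascript
// Hypothetical helper: a client-side "fake group" that serves cached results.
// It holds the last snapshot received from the worker and exposes the two
// group methods mentioned above (all and top).
function makeGroupBuffer() {
  let rows = []; // last [{key, value}, ...] snapshot from the worker

  return {
    // Called by glue code whenever the worker posts fresh group results.
    update(newRows) {
      rows = newRows;
    },
    // Same shape as a crossfilter group's all(): every {key, value} row.
    all() {
      return rows;
    },
    // Same idea as a crossfilter group's top(n): the n largest values first.
    top(n) {
      return rows
        .slice() // copy so the cached snapshot is not reordered
        .sort((a, b) => b.value - a.value)
        .slice(0, n);
    },
  };
}

// Usage with made-up data standing in for a worker response.
const buffer = makeGroupBuffer();
buffer.update([
  { key: 'a', value: 2 },
  { key: 'b', value: 5 },
  { key: 'c', value: 1 },
]);
console.log(buffer.top(2).map((row) => row.key));
```

The dimension's filter functions would need a similar wrapper that forwards each call to the worker instead of answering from a cache.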

Pure speculation... this is a long-term requirement in our project too, so I've thought about this but haven't tried anything.

mathiasleroy commented 9 years ago

Here is my progress. Maybe it will be useful to someone.

What worked so far:

1. [web worker] Load the CSV (with d3.csv).
2. [web worker] Recode the data (with data.forEach).
3. [web worker] Send this data back to the main script (self.postMessage(data)).
4. [script] Have the rest of the script run normally (dimension, charts, data-count, table, etc.).

Now I'm trying to have the web worker make the crossfilter dimensions too. Dimensions are created. I just can't send them back to the main script.

Uncaught DataCloneError: Failed to execute 'postMessage' on 'WorkerGlobalScope': An object could not be cloned.

I've read that DataCloneError is from trying to pass an object with methods.

Trying with JSON.parse(JSON.stringify(dimensions))

Uncaught TypeError: Cannot read property 'top' of undefined
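One likely contributor to that error: JSON serialization silently drops function-valued properties, so the copy no longer has any methods. A minimal sketch, using a hypothetical stand-in object (fakeDimension) rather than a real crossfilter dimension:

```javascript
// Stand-in for a crossfilter dimension: the useful parts are methods.
const fakeDimension = {
  name: 'byDate',
  top(n) {
    return [3, 2, 1].slice(0, n);
  },
};

// JSON.stringify skips function-valued properties, so the copy has no methods.
const copy = JSON.parse(JSON.stringify(fakeDimension));
console.log(typeof fakeDimension.top); // "function"
console.log(typeof copy.top);          // "undefined"

// What does survive the round trip is the plain data a method returns.
const payload = JSON.parse(JSON.stringify(fakeDimension.top(2)));
console.log(payload);
```

This suggests sending the results of calls such as top across the boundary, rather than the objects that compute them.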

I'm now looking into wrapping the functions, as gordonwoodhull suggested.

gordonwoodhull commented 9 years ago

I don't think you will be able to send any objects with methods across the wire, because functions and object references aren't valid JSON data. You'll have to mirror the objects on the client side.

I recently became aware of this project, which puts crossfilter on a Node server: https://github.com/ZJONSSON/crossfilter

I haven't had time to figure out how it works, but it's basically the same problem. Again, this is really a crossfilter feature request, not dc.js.

gazal-k commented 8 years ago

@esjewett: is this, https://github.com/crossfilter/crossfilter-async, a solution to this issue?

esjewett commented 8 years ago

@gazal-k The idea is that it will be, or that it will serve as the basis of an option. Right now, crossfilter-async has a significantly different API than crossfilter. All the operations are async, usually returning promises, and values are returned as promises. So it is not a drop-in replacement by any means. That said, it does run all Crossfilter operations in a Web Worker.

gazal-k commented 8 years ago

:+1: That's great. Have you tried using it with dc.js ?

esjewett commented 8 years ago

No, it won't work at the moment because dc.js expects values, not promises. But I plan to wrap it in a layer that works with dc.js.

gazal-k commented 8 years ago

Ya, I kinda gave it a test drive and that's what happened. Anyway, keep me posted, I'd love to help in any way I can :smiley:

iwasaki-kenta commented 8 years ago

Are there any updates on this? I would really love to get these charts running asynchronously on a web worker to create an admin panel for a site.

gordonwoodhull commented 8 years ago

Hi @Dranithix - if I am not mistaken we have all the building blocks.

You'd "just" have to hook up the promises supplied by crossfilter-async, to the commitHandler recently added to dc.js, which allows asynchronous applying of filters and retrieving of data.

The tricky thing here is creating fake groups to mirror the data on the client side, and a little bit of glue to fetch each of the groups each time the filters are applied.

I say "just" because this is all a little bit mind-bending, but it should be straightforward in practice.
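A rough sketch of that glue, with everything stubbed out so it runs on its own. It assumes the commit handler is called as (render, callback) and reports completion through callback(error, result); that shape should be checked against the dc.js documentation. fetchGroup and buffers are hypothetical stand-ins for crossfilter-async promises and the client-side fake groups.

```javascript
// Stand-in for an async crossfilter group: resolves with [{key, value}, ...].
// In a real setup this would be a promise coming back from the web worker.
function fetchGroup(name) {
  const fake = {
    byFruit: [{ key: 'apple', value: 3 }],
    byYear: [{ key: 2015, value: 7 }],
  };
  return Promise.resolve(fake[name]);
}

// Client-side mirrors of the group data, one entry per chart.
const buffers = { byFruit: [], byYear: [] };

// Assumed handler shape: refresh every buffer, then signal completion so the
// charts can redraw from the fresh, synchronous data.
function commitHandler(render, callback) {
  const refreshes = Object.keys(buffers).map((name) =>
    fetchGroup(name).then((rows) => {
      buffers[name] = rows;
    })
  );
  Promise.all(refreshes).then(
    () => callback(null, buffers),
    (error) => callback(error)
  );
}

// Usage: simulate one redraw cycle.
commitHandler(false, (error, result) => {
  if (error) throw error;
  console.log(Object.keys(result));
});
```

The charts themselves would keep reading from the buffers synchronously; only the refresh step is asynchronous.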

I'll try to work up an example when I have time.

esjewett commented 8 years ago

@gordonwoodhull Oooooohhhhh, that would probably actually work! Intriguing.

One thing to keep in mind is that crossfilter-async is very green (read: untested and not used in production anywhere) and a bit limited because it has to serialize all functions, so you can't use references to anything outside the scope of the function.

If you folks have issues with crossfilter-async, let me know over in the issues. I can't promise to be able to fix anything immediately, but I will be able to merge pull requests pretty quickly as long as they contain tests.

gordonwoodhull commented 8 years ago

Well, I couldn't resist. 😉

The above commit lays out the basics, and everything worked right out of the box. However, I ran into the expected problem that you can't marshall functions with closures across the boundary to a webworker.

So the example works fine until you try to filter two pie slices or two row chart bars. Then dc.js attempts to apply a filterFunction and I didn't try to implement that function on the "holders" (which I now think should be called "buffers") because I know it won't work.

Startup can be slow (~5s sometimes), but I didn't notice any performance problems (for this tiny data set) when running the chart.

What I think we'll need to do is:

So, for this to work in general, there are a couple of small changes that need to be made to dc.js and crossfilter-async, a bit more work with dc.filters, and I think we'll want a little "async buffers" library. But no backward-incompatible changes.

gordonwoodhull commented 8 years ago

@Dranithix, if you only need one-value and range filtering, you could roll with the example above. I'm not going to release it yet because of these problems, but it's currently on a branch.

esjewett commented 8 years ago

Very cool. There is room for a lot of optimization in crossfilter-async, much of which was already done in the lcadata.info code, but needs to be redone in a cleaner way in crossfilter-async. I suspect that would help with the slowness issues. If someone has time to look at it, feel free. Otherwise I hope to be able to look at it within a month or so.

On the filter issue: One approach that I found helpful is that if you know the structure of the function being passed to the filterFunction, you can actually convert it to a string, replace the variables from outside the function scope with their values, then convert the string back into a function. Or better, crossfilter-async could accept a string argument to filterFunction and pass the string directly over to the web worker. There's no performance hit there because the function -> string -> function serialization/deserialization is what happens anyway in order to send the function over to the web worker.
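A minimal sketch of the string-substitution idea, under the assumption that the name of the outside variable is known in advance. The helpers inlineVariable and revive and the threshold filter are hypothetical, not part of crossfilter-async.

```javascript
// Hypothetical filter whose body references an outside variable, `threshold`.
const threshold = 10;
const filter = function (d) { return d > threshold; };

// Serialize the function, then replace the free variable with a JSON literal
// of its current value. The regex is naive: it would also rewrite the same
// word inside a string literal or property name.
function inlineVariable(fn, name, value) {
  const source = fn.toString();
  const pattern = new RegExp('\\b' + name + '\\b', 'g');
  return source.replace(pattern, JSON.stringify(value));
}

// Rebuild a function from its source text (what the worker side would do).
function revive(source) {
  return new Function('return (' + source + ')')();
}

const source = inlineVariable(filter, 'threshold', threshold);
const revived = revive(source);

console.log(source);
console.log(revived(11), revived(5));
```

The revived function no longer depends on the original scope, so its source can cross the worker boundary as an ordinary string.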

A similar approach is used here: https://github.com/esjewett/lcadata/blob/master/src/main/webapp/js/DataApp.js#L51. The difference is that instead of replacing the variable with its value directly in the function, the web worker takes a separate code snippet to execute in order to populate the execution scope with the variable and value the filter function will be looking for. Not sure which is better.

gordonwoodhull commented 7 years ago

Returning to this because of an SO question asking about commitHandler.

The function which dc.js creates can't be automatically serialized because it relies on a closure and/or the function isFiltered in the filter objects.

It's not hard to capture these cases in a filterHandler, but there's no server-side or webworker crossfilter which has those filters defined. That would not be difficult, either, but there's some loss of generality.

esjewett commented 7 years ago

Would it make sense to try to add these filter types to Crossfilter directly? Perhaps initially implemented as a wrapper of filterFunction, but hopefully eventually with a more efficient implementation? Then dc.js could do away with the need for these closures.

gordonwoodhull commented 7 years ago

I'm not sure that the dc.js filter types are the right canonical ones (or necessarily correct in all cases). For example, they enshrine array keys, causing implicit coercion deep inside crossfilter. And there are always uses for custom filters.

Instead, we might pass a second order function along with its data. So for the simple case of a filterSet, we'd pass e.g. the function

function(data) {
  return function(key) {
    return data.indexOf(key) !== -1;
  };
}

and the data

['apple', 'orange', 'banana']

As long as we stipulate that the function can't rely on any closures, the function should serialize and deserialize okay.
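A small sketch of that round trip, run in a single thread for simplicity. The message shape ({fn, data}) and the variable names are assumptions for illustration; a real worker would receive the message through postMessage.

```javascript
// The second-order function from above: it has no free variables.
const makeFilter = function (data) {
  return function (key) {
    return data.indexOf(key) !== -1;
  };
};

// Sender side: the function source and its data both become plain JSON.
const message = JSON.stringify({
  fn: makeFilter.toString(),
  data: ['apple', 'orange', 'banana'],
});

// Receiver side (e.g. the worker): rebuild the factory from its source text,
// then apply the data to get the actual filter function.
const received = JSON.parse(message);
const factory = new Function('return (' + received.fn + ')')();
const keyFilter = factory(received.data);

console.log(keyFilter('apple'), keyFilter('kiwi'));
```

The filter only ever sees the data passed in as an argument, which is why it survives serialization where a closure would not.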

Good for a webworker but probably not a solution you'd want to use for a crossfilter server. Sending functions from the client to be eval'd on the server sounds... risky.

esjewett commented 7 years ago

I think that's the maximally flexible approach, and would be one that could be implemented in dc.js alone. (It's what I did for the LCAData thing.) But I would suggest the longer-term solution would be defining more flexible types of filters on Crossfilter dimensions that take serializable data structures as their arguments.

esjewett commented 7 years ago

BTW, a more up-to-date and much simpler example of dc.js with Crossfilter in a web worker is here: https://esjewett.github.io/wm-eventsource-demo/

gordonwoodhull commented 7 years ago

Nice demo!