Open doronrosenberg opened 8 years ago
How would that help get the values of the FreqItemset back to Node? Don't we need to add a toJSON method to FreqItemset to convert the object contents to JSON?
@billreed63 My thought was to do this in a general way that doesn't need to do JSON conversion. Basically the Node.js code would generate:
var collection = rdd.collect();
ourStringify(collection)
which would return:
[
{"__eclairJSClass": "mllib.fpm.FPGrowth$FreqItemset"},
...
]
We can then create a FreqItemset object in Node that references collection[0], collection[1], etc. These would be remote proxies to the collection variable that holds all our FreqItemsets.
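A minimal sketch of the Node side of this idea, assuming the serializer returns the tagged array shown above. The names `FreqItemsetProxy`, `wrappers`, and `wrapCollection` are hypothetical and are not part of EclairJS; the proxy only records which remote element it stands for.

```javascript
// Hypothetical Node-side proxy: it holds a reference expression such as
// "collection[0]" and fetches nothing until a method is invoked remotely.
function FreqItemsetProxy(remoteRef) {
  this.remoteRef = remoteRef;
}

// Hypothetical registry mapping the class tag to a wrapper constructor.
const wrappers = {
  "mllib.fpm.FPGrowth$FreqItemset": FreqItemsetProxy
};

// Turn the tagged JSON into proxies that point back at the remote variable.
function wrapCollection(varName, serialized) {
  return JSON.parse(serialized).map(function (entry, i) {
    const Wrapper = wrappers[entry.__eclairJSClass];
    if (!Wrapper) {
      throw new Error("No wrapper registered for " + entry.__eclairJSClass);
    }
    return new Wrapper(varName + "[" + i + "]");
  });
}

const serialized = JSON.stringify([
  { __eclairJSClass: "mllib.fpm.FPGrowth$FreqItemset" },
  { __eclairJSClass: "mllib.fpm.FPGrowth$FreqItemset" }
]);
const proxies = wrapCollection("collection", serialized);
console.log(proxies[1].remoteRef); // collection[1]
```

Each method call on such a proxy would still cost one request to the Spark side, which is the trade-off discussed in this thread.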
We would need to make a request across for each item. Can we just add:
FreqItemset.prototype.toJSON = function() {
    var json = {};
    json.freq = this.freq();
    json.items = this.items();
    return json;
};
and
List.prototype.toJSON = function() {
    return this.getJavaObject().toString();
};
then
var x = JSON.stringify(result);
gives us:
[{"freq":3,"items":"[t]"},{"freq":3,"items":"[t, x]"},{"freq":3,"items":"[t, x, z]"},{"freq":3,"items":"[t, z]"},{"freq":3,"items":"[s]"},{"freq":2,"items":"[s, t]"},{"freq":2,"items":"[s, t, x]"},{"freq":2,"items":"[s, t, x, z]"},{"freq":2,"items":"[s, t, z]"},{"freq":3,"items":"[s, x]"},{"freq":2,"items":"[s, x, z]"},{"freq":2,"items":"[s, z]"},{"freq":2,"items":"[p]"},{"freq":2,"items":"[p, r]"},{"freq":2,"items":"[p, r, z]"},{"freq":2,"items":"[p, z]"},{"freq":5,"items":"[z]"},{"freq":3,"items":"[y]"},{"freq":3,"items":"[y, t]"},{"freq":3,"items":"[y, t, x]"},{"freq":3,"items":"[y, t, x, z]"},{"freq":3,"items":"[y, t, z]"},{"freq":2,"items":"[y, s]"},{"freq":2,"items":"[y, s, t]"},{"freq":2,"items":"[y, s, t, x]"},{"freq":2,"items":"[y, s, t, x, z]"},{"freq":2,"items":"[y, s, t, z]"},{"freq":2,"items":"[y, s, x]"},{"freq":2,"items":"[y, s, x, z]"},{"freq":2,"items":"[y, s, z]"},{"freq":3,"items":"[y, x]"},{"freq":3,"items":"[y, x, z]"},{"freq":3,"items":"[y, z]"},{"freq":2,"items":"[q]"},{"freq":2,"items":"[q, t]"},{"freq":2,"items":"[q, t, x]"},{"freq":2,"items":"[q, t, x, z]"},{"freq":2,"items":"[q, t, z]"},{"freq":2,"items":"[q, y]"},{"freq":2,"items":"[q, y, t]"},{"freq":2,"items":"[q, y, t, x]"},{"freq":2,"items":"[q, y, t, x, z]"},{"freq":2,"items":"[q, y, t, z]"},{"freq":2,"items":"[q, y, x]"},{"freq":2,"items":"[q, y, x, z]"},{"freq":2,"items":"[q, y, z]"},{"freq":2,"items":"[q, x]"},{"freq":2,"items":"[q, x, z]"},{"freq":2,"items":"[q, z]"},{"freq":4,"items":"[x]"},{"freq":3,"items":"[x, z]"},{"freq":3,"items":"[r]"},{"freq":2,"items":"[r, x]"},{"freq":2,"items":"[r, z]"}]
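Note that in the output above `items` arrives as a string such as "[t, x]", because List.prototype.toJSON returns toString(). One possible variation, sketched here with plain stand-in objects so it runs in Node.js without Spark, is to have the list's toJSON return an array instead. The constructors below are stand-ins, not the real EclairJS wrappers, and how the real List would expose its elements as a JavaScript array is an assumption.

```javascript
// Stand-in for the List wrapper; the real one wraps a Java object.
function List(values) {
  this.values = values;
}
// Assumed variation: return a real array rather than a toString() result.
List.prototype.toJSON = function () {
  return this.values.slice();
};

// Stand-in for the FreqItemset wrapper.
function FreqItemset(items, freq) {
  this._items = new List(items);
  this._freq = freq;
}
FreqItemset.prototype.items = function () { return this._items; };
FreqItemset.prototype.freq = function () { return this._freq; };
FreqItemset.prototype.toJSON = function () {
  return { freq: this.freq(), items: this.items() };
};

const result = [new FreqItemset(["t"], 3), new FreqItemset(["t", "x"], 3)];
// JSON.stringify calls toJSON on each element, then on the nested List.
const x = JSON.stringify(result);
console.log(x); // [{"freq":3,"items":["t"]},{"freq":3,"items":["t","x"]}]
```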
For FreqItemset that would work, but what if it had a method that did some calculations? I was thinking more of a general solution. Indeed we would have to make a request across for each item.
I guess for collect()/take() just having toJSON() might be enough, but we would have to implement it everywhere.
Do we really have a need for a remote invocation model? I don't see Spark as an interactive thing; it seems more like a batch thing.
We already have to do this for randomSplit, which returns an array of RDDs. For collect/take though I think returning a JSON blob could be enough.
@billreed63 @bpburns I'm trying to port over mllib FP Growth, and the issue is that there is a collect() call that returns an array of FreqItemset (which is a nested class). Since we are currently calling JSON.stringify, all we get is an array of values like this:
What we need is a better serializer that lives in Nashorn and returns something like this:
And then on the Node.js side we would know to create a wrapper of the correct type.
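A rough sketch of what the serializer half could look like, assuming it only needs to emit a class tag per element. `tagStringify` is a hypothetical name, and `classNameOf` is injected so the sketch runs in plain Node.js; inside Nashorn it might be something like `function (o) { return o.getClass().getName(); }`, which is an assumption and not tested here.

```javascript
// Hypothetical serializer: emits a class tag instead of the object's contents.
function tagStringify(collection, classNameOf) {
  return JSON.stringify(collection.map(function (item) {
    return { __eclairJSClass: classNameOf(item) };
  }));
}

// Stand-in for a Java-backed FreqItemset instance.
function FakeFreqItemset() {}

const tagged = tagStringify(
  [new FakeFreqItemset(), new FakeFreqItemset()],
  function () { return "mllib.fpm.FPGrowth$FreqItemset"; }
);
console.log(tagged);
```

The Node.js side can then look up the tag to pick the wrapper type for each element.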
Any thoughts before I go implement this?