EclairJS / eclairjs-nashorn

JavaScript API for Apache Spark
Apache License 2.0

Generated JS for scala traits #52

Open pberkland opened 8 years ago

Brian-Burns-Bose commented 8 years ago

May not be valid....

conker84 commented 8 years ago

Hi guys, my repository has a proposal for a hypothetical JavaScript trait implementation (this is a WIP version, so some parts of the code are still ugly). I have added a new package, org.eclairjs.nashorn.trait, that contains three files:

  1. Trait.java: the base class extended by all generated classes;
  2. TraitGenerator.java: the class that provides bean code generation via the Javassist framework;
  3. TraiUtils.java: a utility class.

For the JavaScript part, I have added a new folder, trait, that contains three modules:

  1. JSTrait.js: used to create a new class;
  2. TraiClass.js: a JS wrapper for a user-generated class;
  3. TraitInstance.js: a JS wrapper for a user-generated object.

Here is an example of buildPeopleTable rewritten with JS traits (taken from my dataframetest.js file):

var buildPeopleTable = function (file, date) {
    var properties = null,
        className = null,
        useDateType = date || false;
    if (useDateType) {
        className = 'PersonComplexWithDate';
        properties = {
            name : Java.type('java.lang.String').class,
            age : 'java.lang.Integer',
            expense : 'java.lang.Integer',
            DOB : 'java.sql.Date',
            income : 'java.lang.Double',
            married : 'java.lang.Boolean',
            networth : 'java.lang.Double'
        };
    } else {
        className = 'PersonComplexWithTimestamp';
        properties = {
            name : Java.type('java.lang.String').class,
            age : 'java.lang.Integer',
            expense : 'java.lang.Integer',
            DOB : 'java.sql.Timestamp',
            income : 'java.lang.Double',
            married : 'java.lang.Boolean',
            networth : 'java.lang.Double'
        };
    }
    var PersonClass = JSTrait.createClass(className, properties);
    var people = sparkContext.textFile(file)
        .map(function (line, PersonClass, SqlDate, SqlTimestamp, useDateType) {
            var parts = line.split(',');
            return PersonClass.newInstance({
                name : parts[0],
                age : parseInt(parts[1].trim(), 10),
                expense: parseInt(parts[2].trim(), 10),
                DOB: useDateType ?
                        new SqlDate(parts[3].trim()) : new SqlTimestamp(parts[3].trim()),
                income: parseFloat(parts[4].trim()), // parseFloat takes no radix argument
                married: parts[5].trim(),
                networth: parseFloat(parts[6].trim())
            });
        },
        [PersonClass, SqlDate, SqlTimestamp, useDateType]);
    //Apply the schema to the RDD.
    var peopleDataFrame = sqlContext.createDataFrame(people, PersonClass);
    peopleDataFrame.registerTempTable("people");
    return peopleDataFrame;
};

There are some issues. The first and biggest one is that the DataFrame column order seems completely random, which makes most of the dataframetest.js use cases fail. I also have some concerns about performance and about how this works at scale. I need your feedback to check whether this is the right approach. Thanks
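Whatever the cause of the column ordering turns out to be, one possible workaround is to keep the intended order explicit on the JavaScript side and reorder by name. The snippet below is only a sketch under that assumption: `inIntendedOrder` is a hypothetical helper, the shuffled column list is simulated, and nothing here calls the real EclairJS or Spark APIs.

```javascript
// Field definitions as an ordered array, so the intended order is explicit
// and does not depend on how any object enumerates its keys.
var fields = [
    ['name', 'java.lang.String'],
    ['age', 'java.lang.Integer'],
    ['DOB', 'java.sql.Timestamp']
];

// Build the property map (the shape passed to JSTrait.createClass in the
// proposal) and record the intended column order alongside it.
var properties = {};
var columnOrder = [];
fields.forEach(function (pair) {
    properties[pair[0]] = pair[1];
    columnOrder.push(pair[0]);
});

// Hypothetical helper: return the columns that actually exist,
// arranged in the intended order.
function inIntendedOrder(actualColumns, intendedOrder) {
    return intendedOrder.filter(function (name) {
        return actualColumns.indexOf(name) !== -1;
    });
}

// Simulated "random" order, standing in for what a DataFrame might report.
var restored = inIntendedOrder(['DOB', 'age', 'name'], columnOrder);
```

The resulting list could then be handed to whatever select-by-name call the DataFrame wrapper offers, so that the tests see a stable column order.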

billreed63 commented 8 years ago

The purpose of EclairJS-nashorn is to "hide" Java/Scala from JavaScript users as much as possible. This code

    if (useDateType) {
        className = 'PersonComplexWithDate';
        properties = {
            name : Java.type('java.lang.String').class,
            age : 'java.lang.Integer',
            expense : 'java.lang.Integer',
            DOB : 'java.sql.Date',
            income : 'java.lang.Double',
            married : 'java.lang.Boolean',
            networth : 'java.lang.Double'
        };
    } else {
        className = 'PersonComplexWithTimestamp';
        properties = {
            name : Java.type('java.lang.String').class,
            age : 'java.lang.Integer',
            expense : 'java.lang.Integer',
            DOB : 'java.sql.Timestamp',
            income : 'java.lang.Double',
            married : 'java.lang.Boolean',
            networth : 'java.lang.Double'
        };
    }

makes it plain to the user that they are using Java. A better implementation would be:

    if (useDateType) {
        className = 'PersonComplexWithDate';
        properties = {
            name : Java.type('java.lang.String').class,
            age : 'java.lang.Integer',
            expense : 'java.lang.Integer',
            DOB : 'java.sql.Date',
            income : 'java.lang.Double',
            married : 'java.lang.Boolean',
            networth : 'java.lang.Double'
        };
    } else {
        className = 'PersonComplexWithTimestamp';
        properties = {
            name : 'string',
            age : 'integer',
            expense : 'integer',
            DOB : 'timestamp',
            income : 'float',
            married : 'boolean',
            networth : 'float'
        };
    }
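As a rough illustration of how friendly names like these could be translated back to Java classes behind the scenes, here is a minimal sketch. The lookup table and the `toJavaTypes` helper are hypothetical and not part of EclairJS; mapping 'float' to java.lang.Double simply follows the two property lists above.

```javascript
// Hypothetical lookup table: friendly type names -> Java class names.
// Illustrative only; this is not an existing EclairJS API.
var JAVA_TYPES = {
    string: 'java.lang.String',
    integer: 'java.lang.Integer',
    float: 'java.lang.Double',
    boolean: 'java.lang.Boolean',
    date: 'java.sql.Date',
    timestamp: 'java.sql.Timestamp'
};

// Translate a property map such as {age: 'integer'} into Java class names,
// failing loudly on a type name that is not in the table.
function toJavaTypes(properties) {
    var result = {};
    Object.keys(properties).forEach(function (field) {
        var typeName = properties[field];
        if (!Object.prototype.hasOwnProperty.call(JAVA_TYPES, typeName)) {
            throw new Error('Unknown type "' + typeName + '" for field "' + field + '"');
        }
        result[field] = JAVA_TYPES[typeName];
    });
    return result;
}

var javaProps = toJavaTypes({ name: 'string', DOB: 'timestamp', income: 'float' });
```

With something along these lines, the user-facing property list stays free of Java class names, while the generated bean could still receive the types it needs.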

The other thing that concerns me is implementing traits. They are not native to JavaScript; they are a Scala language feature, so why bring them into JavaScript? If the user wants to use traits, they can use Scala.

conker84 commented 8 years ago

For the first point, I think we can find a solution, maybe using an EclairJS-style approach such as the SQLTimestamp.js implementation.

As for your concern, I think that if a developer wants to use a big data framework and feels more comfortable with JavaScript, EclairJS is the answer. This is not about "I want to use traits" or "I don't want to use traits"; it is about using JavaScript APIs that enable devs to overcome "The limits of my language mean the limits of my world" :)

This proposal is about using JS objects instead of Row objects. That simplifies the programming model, because a Row forces the dev to remember the exact position of each field, whereas IMHO it is much simpler to remember the name. Thanks for the feedback!
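To make the position-versus-name point concrete, here is a small plain-JavaScript sketch. The sample values and the `rowToObject` helper are made up for illustration, and a plain array stands in for a Row.

```javascript
// Positional access: the caller has to remember that index 1 is the age.
var row = ['Ann', 29, 1200];
var ageByPosition = row[1];

// Hypothetical helper: pair an ordered list of field names with row values.
function rowToObject(fieldNames, values) {
    var obj = {};
    fieldNames.forEach(function (name, i) {
        obj[name] = values[i];
    });
    return obj;
}

// Named access: the field name documents itself.
var person = rowToObject(['name', 'age', 'expense'], row);
var ageByName = person.age;
```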

billreed63 commented 8 years ago

If the user wants to use a JavaBean in JavaScript, Nashorn allows the user to create Java objects.

#250 implemented a way for a JavaScript developer to use a JavaScript object instead of a schema. The objective of EclairJS is to give the JavaScript developer JavaScript ways to use Spark, not to implement pieces of another programming language.

It seems like the goal you are after is better served by something like #250.

conker84 commented 8 years ago

I hadn't seen #250. As I said, my proposal has nothing to do with Scala, except the name :) I think it is better to close this issue; as you said, the best result is obtained with #250. Thanks for your time!