Gmousse / dataframe-js

No Maintenance Intended
https://gmousse.gitbooks.io/dataframe-js/
MIT License

Dataframe from file examples #50

Closed cjohns38 closed 6 years ago

cjohns38 commented 6 years ago

I'm familiar with data frames from R and Python, so I was looking for something similar in JS and ran into this module. The problem I'm having is that when I try to read in a local JSON file, it returns a promise rather than an actual DF. If I run the following, a DF doesn't seem to be created.

var DataFrame = dfjs.DataFrame; 

const df = DataFrame.fromJSON('http://localhost:8000/data/maps3.json').then(df => df); 

df.show()  // returns df.show is not a function 

If I run this, the DF is displayed, so I know it's not a read error but something about how the DF is being created.

var DataFrame = dfjs.DataFrame; 

// Actually shows the DF 
const df = DataFrame.fromJSON('http://localhost:8000/data/maps3.json').then(df => { df.show(); }); 

As best I can tell, this has to do with a promise being created rather than the actual DF. If I use the following, it works as I would expect.

// From a collection (easier)
const df = new DataFrame([
    {c1: 1, c2: 6}, // <------- A row
    {c4: 1, c3: 2}
], ['c1', 'c2', 'c3', 'c4']);

df.show()

Request 1: When reading in the JSON file like above, how do I return the DF object? Can you update the basic usage documentation to help illustrate how to read a file in and return a DF? Maybe I'm missing something simple?

Request/Question 2: When I use df.select('column'), it returns an object, when I would have expected a list or vector of the values. Is there a simple method for doing that? If so, can you provide an example in your documentation?

FYI, I'm looking to take JSON files, convert them to DFs, manipulate and subset the data, and then feed it into D3 to make some interactive dashboards. Between some basic JS on the page, DFs and what they can do, and D3, I think there are some great opportunities; I just need to get the DF portion ironed out.

Gmousse commented 6 years ago

Hi @cjohns38,

Indeed I think the documentation needs some examples :D.

About question 1: I think your difficulty comes from a misunderstanding of how JavaScript reads files and makes HTTP(S) requests. Unlike R or Python, JavaScript (in the browser or in Node.js) performs I/O asynchronously. So when you write:

const df = DataFrame.fromJSON('http://localhost:8000/data/maps3.json').then(df => df);

df is not a DataFrame but a Promise (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise).
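You can see this without dataframe-js at all: `.then` always returns a new Promise, even when its callback just passes the value through. In this minimal sketch, `Promise.resolve(...)` is a hypothetical stand-in for `DataFrame.fromJSON(...)`, resolving to a plain object instead of a real DataFrame.

```javascript
// Hypothetical stand-in for DataFrame.fromJSON(url): a Promise that
// resolves to a plain object instead of a real DataFrame.
const loaded = Promise.resolve({ name: "my DataFrame" });

// Same shape as `.then(df => df)` in the question.
const df = loaded.then(value => value);

console.log(df instanceof Promise); // true: df is a Promise, not the value
console.log(typeof df.show);        // undefined, hence "df.show is not a function"

// The value itself is only reachable inside a .then callback (which runs later).
df.then(value => console.log(value.name));
```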

The Promise is an async object and doesn't hold your data directly. However, you can access your DataFrame inside a .then:

DataFrame.fromJSON('http://localhost:8000/data/maps3.json')
      .then(df => {
           /* All your js stuff goes here */
           df.show(); // It's async.
           const myCount = df.count();
           /* your d3 stuff here */
      }); 

It's pretty similar to d3 (and, for the record, dataframe-js uses d3-request to make HTTP(S) GET requests):

// d3 with callbacks
d3.csv("/path/to/file.csv", (error, data) => {
  if (error) throw error;
  const df = new DataFrame(data);
  df.show();
});

// DataFrame
DataFrame.fromCSV("/path/to/file.csv")
     .then(df => { df.show(); })
     .catch(error => console.error(error));

However, d3 uses callbacks rather than Promises (sadly). It's nearly the same thing.
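Because the d3 callback receives `(error, data)` with the error first, you can bridge the two styles yourself. A minimal sketch, assuming an error-first callback as in the d3 snippet above; `toPromise` and `fakeCsv` are hypothetical helpers, with `fakeCsv` standing in for `d3.csv`. In Node, the built-in `util.promisify` does the same job for error-first callbacks.

```javascript
// Wrap a callback-style function whose last argument is an
// error-first callback (error, data) in a Promise.
function toPromise(callbackFn, ...args) {
  return new Promise((resolve, reject) => {
    callbackFn(...args, (error, data) => {
      if (error) reject(error);
      else resolve(data);
    });
  });
}

// Hypothetical stand-in for d3.csv(url, callback): "loads" one row after 10 ms.
function fakeCsv(url, callback) {
  setTimeout(() => callback(null, [{ c1: 1, c2: 6 }]), 10);
}

// Usage: from here on it reads like the DataFrame.fromCSV(...).then(...) version.
toPromise(fakeCsv, "/path/to/file.csv")
  .then(rows => console.log(rows.length))
  .catch(error => console.error(error));
```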

I hope I'm clear.
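Since ES2017, `async`/`await` lets you write the same `.then` flow in a more sequential, R/Python-like style. This is a sketch, not taken from the dataframe-js docs: it assumes `fromJSON`, `show` and `count` behave as in the `.then` example above, and passes `DataFrame` in as a parameter (e.g. `dfjs.DataFrame` in the browser) so the function stays self-contained. `loadAndSummarize` is a hypothetical name.

```javascript
// `DataFrame` is passed in (e.g. dfjs.DataFrame in the browser, or
// require("dataframe-js").DataFrame in Node).
async function loadAndSummarize(DataFrame, url) {
  // `await` suspends only this function until the Promise resolves;
  // the rest of the page keeps running in the meantime.
  const df = await DataFrame.fromJSON(url);
  df.show();
  const myCount = df.count();
  /* your d3 stuff here */
  return myCount; // callers still receive a Promise of this value
}

// Usage (an async function always returns a Promise):
// loadAndSummarize(dfjs.DataFrame, "http://localhost:8000/data/maps3.json")
//   .then(count => console.log(count));
```

Note that `await` does not make the data synchronous; it only hides the `.then` inside the function.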

About the question 2:

Indeed, DataFrame.select doesn't return the column but a new DataFrame with a subset of columns.

If you need the values of a DataFrame (in an array, for example), you can use the conversion methods (https://gmousse.gitbooks.io/dataframe-js/doc/BASIC_USAGE.html#export-or-convert-a-dataframe): DataFrame.toDict(), DataFrame.toCollection()...

If you need the values of a single column, you can use:

yourdf.toArray(yourColumn) // Gives yourColumn values as an Array ['A', 'B', 'C'] 
yourdf.select(yourColumn).toCollection() // Gives yourColumn as a collection [{yourColumn: 'A'}, {yourColumn: 'B'}, {yourColumn: 'C'}]
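If you already hold a collection (an array of row objects, the shape `toCollection()` gives and the shape D3 typically works with), getting one column as a plain array in JavaScript is just a `map`. A sketch with made-up rows; `column` is a hypothetical helper, and inside dataframe-js, `toArray(yourColumn)` shown above is the direct route.

```javascript
// A collection like the one returned by toCollection(): an array of row objects.
const collection = [
  { Name: "A", value: 1 },
  { Name: "B", value: 2 },
  { Name: "C", value: 3 },
];

// Pull one column out as a plain array (the R/Python "vector").
function column(rows, name) {
  return rows.map(row => row[name]);
}

const names = column(collection, "Name"); // ["A", "B", "C"]
const total = column(collection, "value").reduce((a, b) => a + b, 0); // 6
```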

I hope I have answered your questions.

I use d3.js in my work (with dataframe-js). It works well together :D.

Feel free to ask other questions or close the issue if it's resolved.

cjohns38 commented 6 years ago

First, thanks for the quick reply, and my apologies for asking basic JS questions. Making the transition from R/Python to JS is tough!

• If I understand the general concept of a promise correctly, it only runs when something executable is requested, somewhat like Spark's lazy evaluation. Does this also mean the entire JSON has to be re-read on each run? Is it not cached in memory? So if I ran DataFrame.fromJSON('http://localhost:8000/data/maps3.json').then( df => {df.show()} ); and then ran DataFrame.fromJSON('http://localhost:8000/data/maps3.json').then( df => {df.show()} ); again, did I just read the JSON twice?

• For my understanding: it seems to me that creating a DF from a file works differently from creating a DF by "hard-coding" it. The DF from a file always has to use .then() and is going to re-read the file; the DF never persists. The "hard-coded" option, by contrast, saves it as a more traditional DF, similar to an R or Python DF, which you could reuse. Example...

// DataFrame: read the file and then print it. I can only use this once; if I want to do something else, I have to re-read the file and edit the then statement.
DataFrame.fromCSV("/path/to/file.csv")
     .then(df => { df.show(); })

// Now I want to take one column of the DF. Does this re-read the file?
DataFrame.fromCSV("/path/to/file.csv")
     .then(df => { df.select('c1'); })

vs.

// From a collection (easier) --- 
const df = new DataFrame([
    {c1: 1, c2: 6}, // <------- A row
    {c4: 1, c3: 2}
], ['c1', 'c2', 'c3', 'c4']);
df.show()

// I can also reuse this DF, which I can't do with the file approach (I think?)
df.select('c1').toCollection()
df.dim();
df.show(); 

What if I try to create a reusable object that I could apply different methods to?

 x= DataFrame.fromJSON('http://localhost:8000/data/maps3.json')
x.then(df <= {df.show()})

x.then(df <= {df.select('STATE').toCollection()})

// All results in Syntax error: unexpected token. 
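Two separate things are happening in this snippet. The `SyntaxError` comes from `df <= {...}`: `<=` is the less-than-or-equal operator, while an arrow function is written `df => {...}`. With the arrow fixed, the idea itself works: a Promise settles once and keeps its value, so calling `.then` several times on the same stored Promise does not load the data again, whereas calling `DataFrame.fromJSON(...)` a second time would start a new load. A sketch of this, where `fakeFromJSON` is a hypothetical stand-in that counts how often it "reads the file":

```javascript
// Hypothetical stand-in for DataFrame.fromJSON(url); counts real loads.
let loadCount = 0;
function fakeFromJSON(url) {
  loadCount += 1;
  return new Promise(resolve =>
    setTimeout(() => resolve({ url, rows: [{ STATE: "CA" }, { STATE: "NY" }] }), 10)
  );
}

// Start loading once and keep the Promise.
const x = fakeFromJSON("http://localhost:8000/data/maps3.json");

// Note the arrow is `=>`, not `<=`. Both callbacks reuse the same result.
const first = x.then(df => df.rows.length);
const second = x.then(df => df.rows.map(row => row.STATE));

Promise.all([first, second]).then(([count, states]) => {
  // loadCount is still 1 here: the "file" was read only once.
  console.log(count, states, loadCount);
});
```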

To be clear on the second example:

yourdf.select(yourColumn).toCollection()

My interpretation using the promise...

x = DataFrame.fromJSON('http://localhost:8000/data/maps3.json').then(df => df.select('Name').toCollection());
// Returns a pending Promise... how do I get the collection out?
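The collection never comes "out" of the Promise synchronously; you consume it later, either in another `.then` (whatever a `.then` callback returns becomes the value of the next Promise in the chain) or after `await` in an `async` function. Also note that a callback written with braces, `df => { ... }`, needs an explicit `return`, or the Promise resolves to `undefined`. In this sketch, `x` is a hypothetical stand-in for the `fromJSON(...).then(...)` chain, and `drawList`/`countNames` are made-up consumers (e.g. where D3 code would go).

```javascript
// Hypothetical stand-in for
// DataFrame.fromJSON(url).then(df => df.select('Name').toCollection())
const x = Promise.resolve([{ Name: "A" }, { Name: "B" }]);

// Hypothetical consumer, e.g. where your D3 code would go.
function drawList(names) {
  return names.join(", ");
}

// Option 1: keep chaining; each .then receives the previous return value.
const rendered = x
  .then(collection => collection.map(row => row.Name)) // ["A", "B"]
  .then(drawList);                                     // "A, B"

// Option 2: await inside an async function.
async function countNames(promise) {
  const collection = await promise;
  return collection.length;
}
```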

I like the idea of a DF and its methods; it's the promise that's tripping me up. I want to read the file once and then be able to manipulate the data repeatedly during the life of the page. It seems that if I hard-code the JSON, or maybe pass it in, I can do what I want. However, if I read a file in, the DF works a little differently (maybe?). Maybe I should read/parse the JSON myself and pass it to the DF so it persists?

Again, sorry for the basic questions.

Gmousse commented 6 years ago

Hi,

Indeed, it seems you are a beginner in JavaScript.

To be clear, a Promise (like a callback in d3-request, or plain AJAX calls) is asynchronous and non-blocking. In JavaScript, I/O is asynchronous: when you call the filesystem (except with synchronous functions such as fs.readFileSync), a database, or an HTTP(S) endpoint, the code in the .then runs when the result arrives, while the rest of your code keeps running.

When you create a DataFrame from JS code (no I/O), you are synchronous. When you load a file or call an API endpoint, you are asynchronous.
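In Node.js you can see both modes side by side. This sketch writes a small made-up JSON file to the OS temp directory (the file name is hypothetical), reads it synchronously with `fs.readFileSync`, which blocks and returns the rows immediately (rows you could pass to `new DataFrame(rows)`), and asynchronously with `fs.promises.readFile`, which hands back a Promise. This applies to Node only; in the browser you load data over HTTP and stay with Promises.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Made-up sample data written to a temp file for the demo.
const file = path.join(os.tmpdir(), "dfjs-sample.json");
fs.writeFileSync(file, JSON.stringify([{ c1: 1, c2: 6 }, { c1: 2, c2: 7 }]));

// Synchronous: blocks until the file is read; rows is usable right away.
const rows = JSON.parse(fs.readFileSync(file, "utf8"));
console.log(rows.length); // 2

// Asynchronous: returns a Promise immediately; the data arrives later.
const pending = fs.promises.readFile(file, "utf8").then(JSON.parse);
console.log(pending instanceof Promise); // true
```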

For example, plain asynchronous JavaScript (without DataFrame) looks like this:

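// Hypothetical Promise standing in for an I/O call; it resolves after 100 ms.
const mypromise = new Promise(resolve => setTimeout(() => resolve("some data"), 100));

function yourCode(data) {
    console.log("my data is ready");
}

mypromise.then(yourCode);  // yourCode will be called when the data arrives. It's non-blocking.
console.log("my data is called"); // this console.log runs before yourCode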

To conclude with dataframe-js:

function myComputation(df) {
    const newdf = df.groupBy("id").count();
    newdf.show(); 
}

DataFrame.fromJSON(yourPath).then(myComputation) // myComputation is asynchronous and will be called when the data is loaded and the DataFrame created.

console.log(df) // This does not work: df only exists inside myComputation, and at this point the Promise has probably not resolved yet.

This isn't specific to dataframe-js; it's how asynchronous JavaScript works.

I suggest reading JS books, following courses, or simply looking at JS examples (https://github.com/Gmousse/dataframe-js/blob/develop/examples/titanic_analysis.js).

Some good sources to understand promises: http://exploringjs.com/es6/ch_promises.html https://www.discovermeteor.com/blog/understanding-sync-async-javascript-node/

Have fun, and thanks for using dataframe-js.

jahnavi9299p commented 4 years ago

Hello, I am a beginner in JS, yet ambitious enough to try dataframe-js. I do have experience with Python DataFrames. I followed the example on GitHub, which worked perfectly fine for me. However, I have a problem opening my own dataframe...

const DataFrame = require("dataframe-js").DataFrame;

DataFrame.fromCSV("Sample.csv")
    .then(df => {
        // Let's quickly display our table.
        // df.show();
        console.log(df.listColumns());
    });

which returns [ 'Error: connect ECONNREFUSED 127.0.0.1:80' ] on PowerShell and GitBash. What went wrong?
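One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the `127.0.0.1:80` address suggests the bare name `Sample.csv` was treated as an HTTP URL relative to localhost instead of as a local file. In Node, you can build an absolute `file://` URL with the standard `path` and `url` modules and try passing that to `fromCSV`; check the dataframe-js documentation for the path formats it actually accepts. `toFileUrl` is a hypothetical helper.

```javascript
const path = require("path");
const { pathToFileURL } = require("url");

// Resolve a relative file name against the current working directory
// (where you launched node from) and turn it into an absolute file:// URL.
function toFileUrl(relativePath) {
  return pathToFileURL(path.resolve(relativePath)).href;
}

// Hypothetical usage, not verified against dataframe-js:
// const DataFrame = require("dataframe-js").DataFrame;
// DataFrame.fromCSV(toFileUrl("Sample.csv"))
//   .then(df => console.log(df.listColumns()))
//   .catch(error => console.error(error));
```

Since the relative name is resolved against the working directory, run `node` from the folder that contains `Sample.csv`, or pass an absolute path.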