data-forge / data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
http://www.data-forge-js.com/
MIT License

Pandas value_counts() equivalent #48

Closed GermaVinsmoke closed 4 years ago

GermaVinsmoke commented 4 years ago

Can you tell me what's the equivalent of Pandas value_counts() in data-forge?

ashleydavis commented 4 years ago

I'm not sure off the top of my head.

Can you give me an explanation (or point me to an explanation online) of what that function does?

GermaVinsmoke commented 4 years ago

Suppose my Series has values like this:

Fruits (Column)
Apple
Orange
Mango
Apple
Mango
Orange
Grape
Apple
Apple
Orange

Then the output of using value_counts() on that series will be -

Apple     4
Orange    3
Mango     2
Grape     1

Pandas - value_counts() image
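
To make the behaviour described above concrete, here is a minimal plain-JavaScript sketch of what value_counts() does with the Fruits data: it counts each distinct value and lists the most frequent first. The helper name valueCounts is hypothetical and is not part of data-forge or pandas; it only illustrates the expected result.

```javascript
// Hypothetical helper (not a data-forge function): count how often each
// distinct value occurs and return [value, count] pairs, most frequent first.
function valueCounts(values) {
    const counts = new Map();
    for (const value of values) {
        counts.set(value, (counts.get(value) || 0) + 1);
    }
    // Sort the [value, count] pairs by count, descending.
    return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// The Fruits column from the example above.
const fruits = [
    "Apple", "Orange", "Mango", "Apple", "Mango",
    "Orange", "Grape", "Apple", "Apple", "Orange",
];

for (const [fruit, count] of valueCounts(fruits)) {
    console.log(fruit.padEnd(10) + count);
}
```

For this input the pairs come out as Apple 4, Orange 3, Mango 2, Grape 1, matching the output listed above.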

ashleydavis commented 4 years ago

There is no function exactly like that in Data-Forge, but you can achieve the same thing using the pivot function like this:

const dataForge = require("data-forge");
const df = new dataForge.DataFrame([ ... data ...]);
const pivotted = df.pivot("Fruits", "Fruits", Series.sum);
console.log(pivotted.toString());

I haven't tested this code example specifically. So please let me know if it helps.

If there is an issue with it, please give some example data (as JSON data or a JavaScript code snippet) so I can test it.

Hope this helps!

You can read more about pivot in the guide: https://github.com/data-forge/data-forge-ts/blob/master/docs/guide.md#pivot

GermaVinsmoke commented 4 years ago

With Series.sum it says Series is not defined. I tried series => series.sum(), but it gave a weird result. I am trying to count how many times each opposition appears in this CSV file. I've used the series iteration method below, which works, but I was wondering whether there's a method I can use instead of iterating the Series.

let query_param = req.params.id
let series = df.getSeries(query_param)
let match_count = {}
for (const i of series) {
    if (match_count.hasOwnProperty(i)) {
        match_count[i]++;
    } else match_count[i] = 1
}

image

CSV file link

ashleydavis commented 4 years ago

Sorry that should have been dataForge.Series.sum if you don't already have Series imported into your script.

Can you give me your complete code example and data on GitHub? That might make it easier for me to look at.

GermaVinsmoke commented 4 years ago

I'll share the complete code after making a basic structure of the project.

GermaVinsmoke commented 4 years ago

I've sent you the invite for the private repository - (sachin-career-analysis)

ashleydavis commented 4 years ago

Thanks. Having a look at it.

Your instructions for running it aren't quite complete. I had to do this:

npm install --save-dev react-scripts

Before I could run npm run dev.

ashleydavis commented 4 years ago

I have your web app up and running.

Can you please provide instructions on how to reproduce the problem or invoke the problematic code?

Also I'm seeing the following error when running your web app:

image

You might want to fix that issue first just in case it's obscuring the real problem.

ashleydavis commented 4 years ago

I also see the following error in the console after starting your web app:

 ./src/config.js
[1] Module not found: Can't resolve 'firebase/app' in 'C:\temp\sachin-career-analysis\client\src'
[1] Compiling...
[1] Failed to compile.

GermaVinsmoke commented 4 years ago

First of all, thanks a lot sir for finding the mistakes. I refactored a lot of things today because of that. I removed some modules and moved the Firebase database from the client side to the server side (because of this I was able to make the repository public). This also reduced the bundle size from 990 Kb to 550 Kb. I also moved the images from the client side to Express static file hosting. Basically, I did a lot of things today.

Maybe now it'll work perfectly without any error.

Also, can you tell me how to check a Node.js project (or any other project with a package.json or yarn file) in a new, isolated environment that has no global modules installed beforehand, like a sandbox? You installed it on your machine and got this firebase error, but when I cloned the repository to a different drive and installed with npm, it worked without any errors for me. My system has firebase installed globally, so I'd like to know how to check in an isolated environment.

Deployed - https://sachin-career.herokuapp.com/

ashleydavis commented 4 years ago

How about you prepare me a new example without firebase?

GermaVinsmoke commented 4 years ago

I made a branch named "ashley", and in it I removed firebase from package.json and commented out the firebase code. Can you check it now?

ashleydavis commented 4 years ago

I tried again.

I pulled the latest changes to my copy of the repo, switched to the ashley branch, ran 'npm install', and then ran 'npm start'.

Then I loaded up localhost:5000 in my browser.

I see a blank page. Opening dev tools I see the following errors:

image

I'm really not sure how I can help.

If you were able to submit me a Node.js code example with just a single index.js file and a csv data file it would probably be much easier for you to demonstrate the code that you want me to look at.

GermaVinsmoke commented 4 years ago

I cloned the repository on another system and ran npm start after installing the dependencies. But you have to make the build in the client folder (run npm run build inside the client folder) in order to run npm start.

But I told you to run npm run dev, in that case there's no need to create a build.

So if you want to do npm start then: npm install in both folders, npm run build in the client folder, and npm start in the root folder.

But if you want to do npm run dev then: npm install in both folders, and npm run dev in the root folder.

ashleydavis commented 4 years ago

Sorry I forgot about 'npm run dev', I'm so used to doing 'npm start'.

I did that. Loaded the page. I just see a blank web page.

I see this error in Chrome dev tools:

image

I see these errors on the console:

> npm run dev

> sachin-career-analysis@1.0.0 dev C:\temp\sachin-career-analysis
> concurrently "npm run start" "npm run client"

[1]
[1] > sachin-career-analysis@1.0.0 client C:\temp\sachin-career-analysis
[1] > npm start --prefix client
[1]
[0]
[0] > sachin-career-analysis@1.0.0 start C:\temp\sachin-career-analysis
[0] > node app.js
[0]
[0] server listening on port: 5000
[1]
[1] > client@0.1.0 start C:\temp\sachin-career-analysis\client
[1] > react-scripts start
[1]
[1] Starting the development server...
[1]
[1] Warning: React version was set to "detect" in eslint-plugin-react settings, but the "react" package is not installed. Assuming latest React version for linting.
[1] Failed to compile.
[1]
[1] ./src/Error.jsx
[1] Module not found: Can't resolve 'react' in 'C:\temp\sachin-career-analysis\client\src'
[1] Compiling...
[1] Failed to compile.
[1]
[1] ./src/Error.jsx
[1] Module not found: Can't resolve 'react' in 'C:\temp\sachin-career-analysis\client\src'

We would resolve this much quicker if you submit a runnable example project that is just a small Node.js project.

GermaVinsmoke commented 4 years ago

I think you forgot to do npm install in the client folder. That's why it is saying:

Warning: React version was set to "detect" in eslint-plugin-react settings, but the "react" package is not installed. Assuming latest React version for linting.
Module not found: Can't resolve 'react'

It is not able to find the react module in the client folder.

I've made a basic node app for that particular thing - node app, the route is http://localhost:5000/match_data/opposition

ashleydavis commented 4 years ago

Ok... back on this.

I did npm run dev according to your instructions.

I have a functioning web site now, this looks pretty cool by the way:

image

What now? I don't see anything that looks like a problem.

GermaVinsmoke commented 4 years ago

Thanks for that, it was an internship challenge which wasn't accepted 🙄, made it 2-3 weeks ago I guess.

Anyway, coming to the main point: if you open http://localhost:5000/api/pieData/opposition you'll find some data, including an object named match_count. To calculate match_count I used the following code snippet:

let df = req.app.locals.data
let query_param = req.params.id
let series = df.getSeries(query_param)
let match_count = {}
for (const i of series) {
    if (match_count.hasOwnProperty(i)) {
        match_count[i]++;
    } else match_count[i] = 1
}

This code is available in api/pieData.js file.

So, I wanted to know if there was anything which can easily give the count values of each item present in a column, just like there is value_counts() in pandas.
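
For readers who want to keep the iteration approach, the loop above can be written a little more compactly. The following is only a sketch in plain JavaScript, not a data-forge feature: tally is a hypothetical helper name, and the sample values are made up to stand in for the opposition column. It accepts any iterable, so it should work with anything that can be used in a for...of loop, as the Series is in the snippet above.

```javascript
// Hypothetical helper: tally values from any iterable into a plain object.
function tally(iterable) {
    // A prototype-less object has no inherited keys, so a value such as
    // "constructor" starts from undefined like any other value.
    const counts = Object.create(null);
    for (const value of iterable) {
        counts[value] = (counts[value] || 0) + 1;
    }
    return counts;
}

// Made-up sample values standing in for the "opposition" column.
const oppositions = ["Team A", "Team B", "Team A", "constructor"];
const match_count = tally(oppositions);

console.log(JSON.stringify(match_count));
```

The prototype-less object matters for the `|| 0` shortcut: on an ordinary `{}` the key "constructor" would read the inherited Object function rather than undefined, and the count for that value would be wrong.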

Vbubblery commented 4 years ago

Any solution here?

ashleydavis commented 4 years ago

Closing this. Example code was provided, but it didn't make the problem clear.

Please feel free to add the value_counts function yourself (this is open source!). Just make sure you add automated tests and documentation for it. You are welcome to reach out for advice and help.