cazala / synaptic

architecture-free neural network library for node.js and the browser
http://caza.la/synaptic

[Community contribution] - Nearly any dataset to bits input normalizer #257

Open adadgio opened 6 years ago

adadgio commented 6 years ago

This is not an issue.

Since data normalization is probably the biggest challenge for beginners (such as myself), I'd like to submit some code to the community (to be integrated into the Normalization 101 wiki if appropriate).

The class (sorry, this is TypeScript!) converts nearly any training dataset into a bit-like representation (values between 0 and 1) suitable for this great library.

The gist: https://gist.github.com/adadgio/ce54cba2d3f9b953924aa3be497259bb

Usage:

// Raw training data: numbers, booleans, strings and string arrays.
const originalData: Array<any> = [
  { soilhum: 500, airtemp: true, airhum: 18, water: true, name: "romain", cats: ["a", "b"] },
  { soilhum: 1050, airtemp: false, airhum: 21, water: true, name: "romain", cats: ["c", "a"] },
  { soilhum: 300, airtemp: true, airhum: 90, water: false, name: "edwards", cats: ["a", "b"] },
  { soilhum: 950, airtemp: true, airhum: 26, water: true, name: "jane", cats: ["c", "b"] },
  { soilhum: 1050, airtemp: false, airhum: 26, water: true, name: "romain", cats: ["a", "b"] },
  { soilhum: 1050, airtemp: false, airhum: 26, water: true, name: "romain", cats: ["b", "c"] },
];

// DataBitsAnalyzer is the class from the gist above.
const analyzer = new DataBitsAnalyzer(originalData);

// "water" is the value to predict; every other property becomes an input.
analyzer.setOutputProperties(['water']);
analyzer.normalize();

const nbrInputs = analyzer.getInputLength();
console.log(`Nbr of inputs: ${nbrInputs}`);

const inputs = analyzer.getBinaryInputDataset();
const outputs = analyzer.getBinaryOutputDataset();
console.log(inputs);
console.log(outputs);

And the console logs:

Nbr of inputs: 9
Bits input dataset
[ [ 0.266667, 1, 0, 1, 0, 0, 1, 1, 0 ],
  [ 1, 0, 0.041667, 1, 0, 0, 1, 0, 1 ],
  [ 0, 1, 1, 0, 1, 0, 1, 1, 0 ],
  [ 0.866667, 1, 0.111111, 0, 0, 1, 0, 1, 1 ],
  [ 1, 0, 0.111111, 1, 0, 0, 1, 1, 0 ],
  [ 1, 0, 0.111111, 1, 0, 0, 0, 1, 1 ] ]
Bits output dataset
[ [ 1 ], [ 1 ], [ 0 ], [ 1 ], [ 1 ], [ 1 ] ]
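
For anyone wondering where these numbers come from, here is a minimal sketch of one encoding that is consistent with the logged values: numbers are min-max scaled into [0, 1], booleans become 0 or 1, and strings or string arrays get one slot per distinct label. This is not the code from the gist. The names `Value`, `Row`, `encodeColumn` and `encodeRows` are made up for illustration, and the rounding to 6 decimals is an assumption based on the output above.

```typescript
// Illustrative sketch only: this is NOT the gist's DataBitsAnalyzer.
// All names below (Value, Row, encodeColumn, encodeRows) are hypothetical.
type Value = number | boolean | string | string[];
type Row = Record<string, Value>;

// Round to 6 decimals, the precision seen in the logged output.
const round6 = (x: number): number => Math.round(x * 1e6) / 1e6;

// Encode one property across all rows into one or more numeric columns.
function encodeColumn(values: Value[]): number[][] {
  const first = values[0];
  if (typeof first === "number") {
    // Numbers: min-max scaling into [0, 1].
    const nums = values as number[];
    const min = Math.min(...nums);
    const max = Math.max(...nums);
    return nums.map((v) => [max === min ? 0 : round6((v - min) / (max - min))]);
  }
  if (typeof first === "boolean") {
    // Booleans: true -> 1, false -> 0.
    return values.map((v) => [v ? 1 : 0]);
  }
  // Strings and string arrays: one slot per distinct label, in order of first appearance.
  const lists: string[][] = values.map((v) => (Array.isArray(v) ? v : [String(v)]));
  const labels: string[] = [];
  for (const list of lists) {
    for (const label of list) {
      if (labels.indexOf(label) === -1) labels.push(label);
    }
  }
  return lists.map((list) => labels.map((label) => (list.indexOf(label) !== -1 ? 1 : 0)));
}

// Split the encoded columns into input and output datasets.
function encodeRows(rows: Row[], outputKeys: string[]): { inputs: number[][]; outputs: number[][] } {
  const inputs: number[][] = rows.map((): number[] => []);
  const outputs: number[][] = rows.map((): number[] => []);
  if (rows.length === 0) return { inputs, outputs };
  for (const key of Object.keys(rows[0])) {
    const encoded = encodeColumn(rows.map((r) => r[key]));
    const target = outputKeys.indexOf(key) !== -1 ? outputs : inputs;
    encoded.forEach((cols, i) => target[i].push(...cols));
  }
  return { inputs, outputs };
}

// Same sample data as in the usage above.
const sampleRows: Row[] = [
  { soilhum: 500, airtemp: true, airhum: 18, water: true, name: "romain", cats: ["a", "b"] },
  { soilhum: 1050, airtemp: false, airhum: 21, water: true, name: "romain", cats: ["c", "a"] },
  { soilhum: 300, airtemp: true, airhum: 90, water: false, name: "edwards", cats: ["a", "b"] },
  { soilhum: 950, airtemp: true, airhum: 26, water: true, name: "jane", cats: ["c", "b"] },
  { soilhum: 1050, airtemp: false, airhum: 26, water: true, name: "romain", cats: ["a", "b"] },
  { soilhum: 1050, airtemp: false, airhum: 26, water: true, name: "romain", cats: ["b", "c"] },
];

const { inputs, outputs } = encodeRows(sampleRows, ["water"]);
console.log(inputs[0]);  // [ 0.266667, 1, 0, 1, 0, 0, 1, 1, 0 ]
console.log(outputs);    // [ [ 1 ], [ 1 ], [ 0 ], [ 1 ], [ 1 ], [ 1 ] ]
```

Reading the first row: soilhum 500 scales to (500 - 300) / (1050 - 300) = 0.266667, airtemp true gives 1, airhum 18 is the minimum so it gives 0, "romain" takes the first of three name slots, and cats ["a", "b"] sets the first two of three category slots. That adds up to the 9 inputs reported.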

Just hoping to contribute here.
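
To feed the two datasets to synaptic, they need to be paired row by row, since the `Trainer` takes an array of `{ input, output }` objects. A small sketch of that step follows; `TrainingSample` and `toTrainingSet` are hypothetical names, and the network call in the trailing comment uses an arbitrary hidden layer size.

```typescript
// Shape of one training sample as consumed by synaptic's Trainer.
interface TrainingSample {
  input: number[];
  output: number[];
}

// Pair each input row with the output row at the same index.
// toTrainingSet is a hypothetical helper, not part of synaptic or the gist.
function toTrainingSet(inputRows: number[][], outputRows: number[][]): TrainingSample[] {
  if (inputRows.length !== outputRows.length) {
    throw new Error("inputs and outputs must have the same number of rows");
  }
  return inputRows.map((input, i) => ({ input, output: outputRows[i] }));
}

// First and third rows of the logged datasets.
const trainingSet = toTrainingSet(
  [
    [0.266667, 1, 0, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 1, 0],
  ],
  [[1], [0]],
);
console.log(trainingSet.length);

// With synaptic installed, the set could then be used along these lines
// (the hidden layer size of 4 is an arbitrary choice):
//   const network = new Architect.Perceptron(9, 4, 1);
//   new Trainer(network).train(trainingSet);
```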

cazala commented 6 years ago

Nice! Feel free to add that example to the Normalization 101 article in the wiki (: Is that piece yours? You should publish it as a package (:

adadgio commented 6 years ago

Yup. Well, I've never tried publishing npm packages, and I'm not really good at that! Besides, should I publish it as TypeScript or plain Node.js?

cazala commented 6 years ago

You could keep the TypeScript source in a GitHub repo and add an npm script that builds the JS files (i.e. npm run build, which uses tsc or webpack + ts-loader to build the dist files). Then you just publish it using npm publish. If you want, I can help you set it up: just put that piece in a repo and I'll send a PR (: I would use a package like that.

adadgio commented 6 years ago

Sure! Here is what I put up on git this morning.

https://github.com/adadgio/neural-data-normalizer

The readme probably needs a wording review, and the project as a whole probably needs tests...

cazala commented 6 years ago

Dope (: That's pretty much all you need. The only thing missing was a name/description for the package, so I sent a PR to make them match the repo name/description: https://github.com/adadgio/neural-data-normalizer/pull/1

Now you just need to go to https://www.npmjs.com/ and create an account, then go to your local copy of that repo and run:

npm run build
npm login
... follow login instructions
npm publish

And that's it (: your package will be published to the npm registry as neural-data-normalizer v1.0.0 🎉