BrainJS / brain.js

🤖 GPU accelerated Neural networks in JavaScript for Browsers and Node.js
https://brain.js.org
MIT License

Image recognition #352

Open dmtrKovalenko opened 5 years ago

dmtrKovalenko commented 5 years ago

Thanks a lot, guys, for creating such an awesome neural-network tool for the JavaScript community. I have tried brain.js to build a NN for image recognition. A super simple one: does the image contain a samoyed or not.

And now I am a bit stuck. I cannot find anything about image recognition in the repo (only #176, but nothing there was helpful).

I am looking for answers, and it would be great if you could create an example :) I am currently doing something like this, but it doesn't work anyway. Processing 64x64 images is deadly slow, and for different image resolutions I get the error: NaN.

DanielMazurkiewicz commented 5 years ago

Hi!

Is brain.js a good solution for image recognition?

You can do some basic image recognition with brain.js; however, if you plan to do more advanced image recognition, you'll need convolutional neural-net layers (see this lib for details: https://cs.stanford.edu/people/karpathy/convnetjs/ ).

Why does it take so much time?

A 64x64 image contains 4096 three-color pixels, so right at the input you get ~12k values for each first-layer neuron to process. Reducing it to 32x32 gives four times less data, and I would recommend reducing it even further, to 24x24 or 16x16. Since you're just starting with ML, I'd recommend something as small as 8x8. See for yourself what you can get from that, and then increase the resolution based on the results.
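The arithmetic above is worth spelling out. A quick sketch of input-vector sizes at various resolutions, assuming three color channels per pixel:

```javascript
// Number of input values a fully connected net must process
// for a square RGB image with the given side length.
function inputSize(side, channels = 3) {
  return side * side * channels;
}

console.log(inputSize(64)); // 12288 (~12k weights per first-layer neuron)
console.log(inputSize(32)); // 3072 — 4x less data
console.log(inputSize(16)); // 768
console.log(inputSize(8));  // 192 — a modest starting point
```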

In terms of speed, I also recommend training a bunch of small NNs that give simple answers instead of one big net answering all questions at once. For example: one can predict whether the picture shows a dog, a cat, a bicycle, or something unknown, and another one can answer, if it is a dog, whether it is a samoyed, a husky, or some other unknown type.
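A dependency-free sketch of that two-stage idea. The `classify*` functions here are hypothetical stand-ins for trained brain.js nets, mimicking the `{ label: score }` shape a net's `run()` might return:

```javascript
// Stage 1: coarse classifier — dog / cat / bicycle / unknown.
function classifyAnimal(input) {
  // placeholder for something like net1.run(input)
  return { dog: 0.9, cat: 0.05, bicycle: 0.03, unknown: 0.02 };
}

// Stage 2: runs only when stage 1 says "dog".
function classifyDogBreed(input) {
  // placeholder for something like net2.run(input)
  return { samoyed: 0.8, husky: 0.15, other: 0.05 };
}

// Pick the label with the highest score.
function argmax(scores) {
  return Object.keys(scores).reduce((a, b) => (scores[a] >= scores[b] ? a : b));
}

function predict(input) {
  const coarse = argmax(classifyAnimal(input));
  if (coarse !== 'dog') return coarse;
  return `dog/${argmax(classifyDogBreed(input))}`;
}

console.log(predict([])); // → "dog/samoyed" with the stubbed scores
```

Each small net trains faster and can be retrained independently, at the cost of wiring the stages together yourself.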

One last piece of advice: use the GPU backend of brain.js; it gives a significant performance boost.

robertleeplummerjr commented 5 years ago

Is brain.js a good solution for image recognition?

TL;DR: Currently, in v1, it is not ideal. It uses the CPU.

TS;DR: This is the single most active area of work in brain.js. In fact, the not-yet-working MNIST example ( https://github.com/BrainJS/mnist-demo/blob/gh-pages/index.js#L1 ) is sitting, waiting, and ready. I've been focused on bringing Node GPU support into the mix via the dependency GPU.js, here: https://github.com/gpujs/gpu.js/tree/gl-headless-experimental , which complements brain.js:

Already supported are:

YES, direct Video into a neural network!

The main features still needed are convolution layers, which consist of convolution, pool, fullyConnected, & softMax. After they are fully unit tested, there will be a bit of API work to do, and the net will be in a spot to release as v2. So ultimately things are very close, but I needed to take some time to push the GPU.js side of things to the point where it could be an enterprise platform for GPU programming in plain JavaScript, built inside brain.js.

How to proceed?

You have a choice:

How to prepare image data?

Currently you'd need to convert the values to arrays.

In v2, you'd just feed the image directly into the neural network.
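For v1, that conversion might look like the following sketch. The function takes an `ImageData`-shaped object (in a browser you'd get one from `canvas.getContext('2d').getImageData(...)`) and flattens its RGBA bytes into the normalized 0..1 array a brain.js `NeuralNetwork` expects:

```javascript
// Flatten ImageData ({ width, height, data: RGBA bytes }) into a
// normalized float array, dropping the alpha channel.
function imageDataToInput(imageData) {
  const { data } = imageData;
  const input = [];
  for (let i = 0; i < data.length; i += 4) {
    input.push(data[i] / 255, data[i + 1] / 255, data[i + 2] / 255); // R, G, B
  }
  return input;
}
```

A 64x64 image yields a 12288-element array. Note that every sample fed to the same net must have the same length; mixing image resolutions, as the original report did, is one plausible way to end up with NaN errors.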

Why does it take so much time?

If you are talking about the existing implementation in brain.js, it is because the CPU is, in short, very slow (one operation at a time) when you compare its ability to process a lot of data. There are ways to make it faster, but even on modern hardware, and even if we ported the whole thing to run on multiple CPUs, it'd be around 4x faster on a modern i7. If we converted it to WebAssembly, we could push that by maybe another 1.5x. But as soon as we go to the GPU, even a basic no-frills one, we can easily get 5 to 10 times faster without using pipelines (textures that aren't transferred back to the CPU; they stay on the GPU). With textures, we can get 12 to 50 times faster, and with a decent video card that number rises fast. (NOTE: I've seen it go as high as 300 times faster on an iMac, though this isn't entirely accurate, because in texture mode it processes somewhat async-ish, so we're still working out how best to benchmark it. But it is crazy fast, is the point.)

If you are referring to the speed at which I've been building these components (which I don't think you are, but I'd like to take a moment to clarify): I didn't think it'd take this long either.

I've mentioned (possibly ranted) about this before, but right now, literally today or tomorrow, GPU.js v2 will be released if I can find an hour or so.

It has been one of the hardest things I've ever worked on professionally: taking the various complexities of all the required components and putting them together. Here is a small part:

Many other things happened, and many people have assisted along the way. The invitation is there for you to join as well. It could make all the difference.

robertleeplummerjr commented 5 years ago

Also, the other thing that is needed is convolutions; the implementations released in v1 are just feedforward, RNN, LSTM, or GRU. You could use the NeuralNetwork class with the CrossValidate class, and that will likely give you better results, as described here: https://golb.hplar.ch/2019/01/machine-learning-with-brain-and-tensorflow-js.html

dmtrKovalenko commented 5 years ago

Thanks a lot for the awesome description. Feel free to close the issue. P.S. I think this issue will be very helpful for other users.

josiahbryan commented 5 years ago

@robertleeplummerjr I'm back into the part of the year where I have a bit more time, and I'm turning my attention back to my various AI pet projects. I hear your call for additional development help with v2 - what do you need help with? Where might an interested party get involved? We can take this offline or to another "channel" if desired. Thanks!

robertleeplummerjr commented 5 years ago

Guys, I've added a table of percentages to show where we are with v2: https://github.com/BrainJS/brain.js/wiki/Roadmap#v2-status

This may help at least clarify where things are. I found a bug two days ago with GPU.js that I'm addressing that has caused a few more days delay for v2 of GPU.js release. I'll have it out soon though.

robertleeplummerjr commented 5 years ago

@josiahbryan PM'ed in Hangouts.

buirkan commented 4 years ago

Guys, I'm trying to implement a neural network whose objective is to recognize a simple image of a dog 🐶 and learn what breed it is.

This is the first time I'm using this amazing tool to create a NN, and I'm a bit lost among the many examples. Does anybody know of an example close to my objective, or have any ideas?

Thanks! :) PS: This issue was very helpful for my understanding of the tool.

joneldiablo commented 2 years ago

Could we use a Buffer as input?

vkarpov15 commented 11 months ago

Apologies in advance for the self promotion, but I wrote a bit about how to do this sort of basic image classification with brain.js and embeddings here: https://masteringjs.substack.com/p/building-a-hot-dognot-hot-dog-image.
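The post linked above trains a brain.js net on embedding vectors rather than raw pixels. As a dependency-free sketch of why that works, here is a nearest-centroid classifier over tiny made-up "embeddings" (real ones would be high-dimensional, e.g. 1536 floats, and come from an embedding model; the labels and numbers below are hypothetical):

```javascript
// Average a class's embedding vectors into one centroid.
function centroid(vectors) {
  const dim = vectors[0].length;
  const mean = new Array(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) mean[i] += v[i] / vectors.length;
  }
  return mean;
}

// Euclidean distance between two equal-length vectors.
function distance(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Assign an embedding to the class with the closest centroid.
function classify(embedding, classes) {
  let best = null;
  let bestDist = Infinity;
  for (const [label, examples] of Object.entries(classes)) {
    const d = distance(embedding, centroid(examples));
    if (d < bestDist) { bestDist = d; best = label; }
  }
  return best;
}

// Toy 2-D "embeddings" standing in for real model output.
const classes = {
  hotdog: [[0.9, 0.1], [0.8, 0.2]],
  notHotdog: [[0.1, 0.9], [0.2, 0.8]],
};
console.log(classify([0.85, 0.15], classes)); // → "hotdog"
```

A small feedforward net trained on the same embeddings (as in the post) can learn more flexible decision boundaries than this centroid heuristic, but the underlying intuition is the same: similar images land near each other in embedding space.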

redvivi commented 6 months ago

Can face recognition be implemented the same way?

josiahbryan commented 6 months ago

Assuming there was some sort of embedding model for faces that you trust to distinguish between facial features in a reliable way, then yes. The beauty of text embeddings (specifically text; I'm not as knowledgeable about image embeddings) is that, given the vast corpus of textual information on which the underlying embedding model was trained, the model "learns" semantic and contextual correlations between words within the input, so that "the cat is fluffy" and "the dog is sweet" produce two very similar embeddings, yet not identical ones.

Yet a string like "the cat is fluffy" and one like "the pillow is fluffy" would produce two very different embeddings, and even "the king is on the throne" and "you are a king" would be VERY different indeed. This is possible because the "features" are learned through vast exposure to large volumes of text, which teaches the model that, in the context of "throne", "king" means one thing, but "you" + "king" has a totally different (and idiomatic) meaning.

Thus the "magic" of embeddings is presenting this contextual nuance in a normalized, standardized 1,536 (or whatever) float array which you can then work with.
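Working with such float arrays usually comes down to comparing them by cosine similarity. A small sketch with toy 3-D vectors (the numbers are made up; real embeddings come from a model):

```javascript
// Cosine similarity: 1 means same direction, 0 orthogonal, -1 opposite.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy stand-ins: "the cat is fluffy" and "the dog is sweet" point in
// similar directions; "the pillow is fluffy" points elsewhere.
const cat = [0.8, 0.6, 0.1];
const dog = [0.7, 0.7, 0.2];
const pillow = [0.1, 0.2, 0.9];

console.log(cosineSimilarity(cat, dog) > cosineSimilarity(cat, pillow)); // true
```

The same comparison works unchanged on 1536-dimensional vectors; only the loop gets longer.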

Likewise for general image embeddings, the RGB is chopped into grids, flattened, tokenized (typically) into a limited vocab, fed into a transformer (typically) model with text labels in another vocab, and out comes a nice float array representing the image.

With regards to faces specifically, you would want an embedding model that was "Trained" to embed not just the RGB itself, but the unique features of the image and label them - then use THAT model to get the embedding layer just before the output logits (or whatever) and use THAT float array for classification. Assuming embeddings for faces is even a thing - I have no knowledge of such a model, I'm just riffing off of what I know of text and image embeddings in general to give some context on why just feeding a face thru a general image embedding might not give the highest quality results.

For face rec/detect, I might recommend looking at Haar Cascades + OpenCV instead. I vaguely remember actually doing something with OpenCV + brainjs back in 2019 for faces, but that was a long time ago and I've slept since then.