NeuraLegion / shainet

SHAInet - a pure Crystal machine learning library
MIT License

OCR recognise image on image #91

Closed fab1an2 closed 1 year ago

fab1an2 commented 4 years ago

How can I create a captcha breaker? For example: aaaaa

  1. How can I create more than one output? I need 8 outputs.
  2. Maybe by finding data within data, similar to YOLO.

ArtLinkov commented 4 years ago

@fab1an2 I'll be happy to help, can you please elaborate on what you mean?

  1. By an output of 8, do you mean a single layer of 8 neurons or 8 separate layers?
  2. What do you mean by found data?

hugoabonizio commented 4 years ago

@fab1an2 you'll probably need to define an architecture for your network (or use a previously proposed one) and define/train the model using shainet.

A common approach is to separate the characters in the image as a preprocessing step and then predict them one by one, which is easy to train since you'll just need an annotated dataset.

I'm not very familiar with captcha breakers/OCRs, but maybe you could create a CNN -> LSTM pipeline, as is done in some image captioning models.

fab1an2 commented 4 years ago

Separating is impossible; many chars overlap other chars. aaa2wwww I need to learn the whole string of chars.

  1. The output must be more than 8 neurons. Look, each char comes from an alphabet of 25 different signs. The layers are not important; I am asking only about the output. But the output must be more complicated.
  2. found data = finding an image within an image, i.e. recognising an image inside a bigger image

ArtLinkov commented 4 years ago

I see, well in that case there are a few things to consider:

For image recognition, it is best to use a CNN to identify the chars, as @hugoabonizio mentioned. To choose the output layer size, simply set the last fully-connected layer (the one before the soft-max) to the size you want. Example: cnn.add_fconnect(l_size: 8, activation_function: SHAInet.sigmoid)

Now, in this specific case, your output needs to be 25 neurons, one for each possible char. That is because when training you must give the NN an error to update its internal parameters for each guess it makes, and 25 errors per single char guess makes it much faster to train. However, this only covers recognition of a single char, so you still need to deal with the fact that there are 8 chars per image. You might employ different tactics to deal with this problem. Every solution has its pros and cons, but you can take the main ideas and combine them into something else. Here are a few examples:
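
One way to picture the "25 neurons per char, 8 chars per image" layout is a flat target of 8 × 25 = 200 values, one one-hot block per char position. This is only a minimal sketch in plain Crystal, not SHAInet API: the alphabet `a`..`y`, the constants and the helper names `encode_label` and `decode_output` are all assumptions made for illustration.

```crystal
# Assumed alphabet of 25 symbols and captcha length of 8,
# based on the numbers mentioned in this thread.
ALPHABET       = ('a'..'y').to_a
CAPTCHA_LENGTH = 8

# Encode a captcha string as CAPTCHA_LENGTH one-hot blocks,
# each ALPHABET.size values wide (hypothetical helper).
def encode_label(text : String) : Array(Float64)
  unless text.size == CAPTCHA_LENGTH
    raise ArgumentError.new("expected #{CAPTCHA_LENGTH} chars, got #{text.size}")
  end

  target = Array(Float64).new(CAPTCHA_LENGTH * ALPHABET.size, 0.0)
  text.each_char_with_index do |char, position|
    index = ALPHABET.index(char) || raise(ArgumentError.new("unknown char: #{char}"))
    target[position * ALPHABET.size + index] = 1.0
  end
  target
end

# Decode a network output by picking the strongest neuron
# inside each block of ALPHABET.size values (hypothetical helper).
def decode_output(output : Array(Float64)) : String
  String.build do |str|
    output.each_slice(ALPHABET.size) do |block|
      best = (0...block.size).max_by { |i| block[i] }
      str << ALPHABET[best]
    end
  end
end

target = encode_label("abcdeabc")
puts target.size           # 200
puts decode_output(target) # abcdeabc
```

With this layout the final fully-connected layer would have 200 neurons instead of 25, and each block of 25 values contributes its own errors during training. Whether a single soft-max over all 200 outputs is appropriate is a separate question; a per-block normalisation may fit better, but that depends on what the library supports.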

I hope this gives you some ideas :)