Why not you apply the technique to read letters for reading digits?

faceterteam / PayCards_Android

Credit card scanning for mobile apps

https://pay.cards/


Why not you apply the technique to read letters for reading digits? #10

Closed dauxubk closed 6 years ago

dauxubk commented 6 years ago

Hi, I can see that you fix the number of digits at 16 and split them into 4 groups. However, the letters (the cardholder's name) cannot be fixed like that, so why not apply the technique you use for reading letters to reading digits as well? The reason I ask is that I want to read other kinds of cards, which are not always 16 digits in 4 groups. For example, I want to read 19-digit cards divided into 3 groups.

One more question: I could not find where you import the solver file or run the training phase. From the prototxt file, I successfully built a model that is exactly the same as yours, but I cannot find the code where you train the model. Am I misunderstanding something? Regards, HienXinh

tchernitski commented 6 years ago

The initial task was to recognize only 16-digit cards (split into 4 blocks), so we decided to stick to this layout to get maximum recognition performance. Any attempt to make something more universal results in a higher error rate. I believe it is now more efficient to use a detection network such as R-CNN or SSD to localize the digits. That would allow different layouts to be used.

We didn't include the training stage in the source code. It was a very common procedure without any "bells and whistles". It's a very simple classifier network, and you can use the "default" SGD solver to train it.

dauxubk commented 6 years ago

Thanks for your reply. I don't understand the formulas in the following function; please explain them as specifically as possible:

        bool CNumberRecognizer::PreLocalize(Mat& numberWindow, Mat& matrix, vector<cv::Point>& points)

        point.y = cvRound(data.at(0).second*23.0);
        points[k].x = cvRound(data.at(0).second*24) + 2;

Why did you multiply by 23 and 24? Is there a specific reason for this?

And in:

        void CCaffeNeuralNetwork::ProcessResult(const caffe::Blob<float>* output, shared_ptr<INeuralNetworkResultList>& resultList)

        for (int i=0; i<singleSampleNeuronsCount; i++) {
            float val= output->cpu_data()[count];
            if (val > maxValue)
            {
                maxValue = val;
                maxIndex = i;
            }

            rawResultPtr.push_back(pair<int, float>(i, val));
            count++;
        }

I am especially unsure about the meaning of maxValue and output->cpu_data()[count]. Thanks and regards, HienXinh

tchernitski commented 6 years ago

Hi,

Regarding the hardcoded 23/24 values: the regression CNN returns a prediction normalized to the range 0...1, so to get an absolute value in pixels we multiply the prediction by the actual size (width/height) of the incoming sample.
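To make the scaling step concrete, here is a minimal sketch rather than the project's actual code. DenormalizeToPixels is a hypothetical helper; the reading of 23 as the sample height and 24 as the sample width is an assumption based on which coordinate each value multiplies in the quoted snippet; and the offset parameter only mirrors the "+ 2" seen there, whose purpose is not explained in this thread. std::lround stands in for OpenCV's cvRound so that the sketch has no OpenCV dependency; the two may differ on exact .5 ties.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of PayCards): maps a regression output
// normalized to [0, 1] onto a pixel coordinate.
//   normalized - network prediction in the range 0...1
//   extentPx   - size of the input sample along this axis (width or height)
//   offsetPx   - optional constant shift, mirroring the "+ 2" in the snippet
inline int DenormalizeToPixels(float normalized, float extentPx, int offsetPx = 0) {
    assert(extentPx > 0.0f);
    // Scale to pixels, round to the nearest integer, then apply the shift.
    return static_cast<int>(std::lround(normalized * extentPx)) + offsetPx;
}
```

Under these assumptions, a prediction of 0.5 on a 24-pixel axis with an offset of 2 gives DenormalizeToPixels(0.5f, 24.0f, 2) == 14.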

Regarding the max value: after prediction, the last (softmax) layer of the CNN contains the probability of each digit, so we pick the highest probability and its index. If the highest probability is found at index 3, the recognized digit is 3.

tchernitski commented 6 years ago

output->cpu_data() is how you access the Caffe framework's output data.
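The two answers above can be combined into one small sketch. It is illustrative only: DecodeDigits is a hypothetical function, and the plain float pointer stands in for the buffer returned by output->cpu_data(), which is assumed here to hold the scores of each sample back to back (classesPerSample values per sample). That layout matches how the quoted loop advances count, but it is an assumption about the network output, not something confirmed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of PayCards): for every sample in a flat
// score buffer, returns the index of the highest score and the score itself.
// With a 10-way softmax over digits, the index is the recognized digit.
std::vector<std::pair<int, float>> DecodeDigits(const float* scores,
                                                std::size_t sampleCount,
                                                std::size_t classesPerSample) {
    assert(scores != nullptr && classesPerSample > 0);
    std::vector<std::pair<int, float>> best;
    best.reserve(sampleCount);

    std::size_t count = 0;  // running position in the flat buffer
    for (std::size_t s = 0; s < sampleCount; ++s) {
        int maxIndex = 0;
        float maxValue = scores[count];  // start from the first class score
        for (std::size_t i = 0; i < classesPerSample; ++i, ++count) {
            const float val = scores[count];
            if (val > maxValue) {
                maxValue = val;
                maxIndex = static_cast<int>(i);
            }
        }
        best.emplace_back(maxIndex, maxValue);
    }
    return best;
}
```

For example, with three classes per sample and the buffer {0.1, 0.7, 0.2, 0.05, 0.15, 0.8}, the first sample decodes to index 1 and the second to index 2.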

dauxubk commented 6 years ago

Thank you very much for your support, Tchernitski.