dauxubk closed 6 years ago
The initial task was to recognize only 16-digit cards (split into 4 blocks), so we decided to stick to this layout to get maximum recognition performance. Any attempt to make something more universal resulted in a higher error rate. I believe it is now more efficient to use a detection network such as R-CNN or SSD to localize the digits, which would allow different layouts.
We didn't include the training stage in the source code. It was a very common procedure without any "bells and whistles". It's a very simple classifier network, and you can use the "default" SGD solver to train it.
Thanks for your reply,
I don't understand the formulas you used in the following function; please explain them as specifically as possible:
bool CNumberRecognizer::PreLocalize(Mat& numberWindow, Mat& matrix, vector<cv::Point>& points)
point.y = cvRound(data.at(0).second*23.0);
points[k].x = cvRound(data.at(0).second*24) + 2;
Why did you multiply by 23 and 24? Is there a specific reason for this?
And:
void CCaffeNeuralNetwork::ProcessResult(const caffe::Blob<float>* output, shared_ptr<INeuralNetworkResultList>& resultList)
for (int i=0; i<singleSampleNeuronsCount; i++) {
float val= output->cpu_data()[count];
if (val > maxValue)
{
maxValue = val;
maxIndex = i;
}
rawResultPtr.push_back(pair<int, float>(i, val));
count++;
}
I am especially interested in the meaning of maxValue and output->cpu_data()[count].
Thanks and regards, HienXinh
Hi,
Regarding the hardcoded 23/24 values: the regression CNN returns a prediction normalized to the range 0...1, so to get the absolute value in pixels we multiply the prediction by the actual size (width/height) of the incoming sample.
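To illustrate the scaling described above, here is a minimal sketch. The helper name ToPixels is hypothetical and not part of the project; it uses std::lround instead of OpenCV's cvRound so the snippet has no dependencies (the two can differ on exact .5 ties). The extra "+ 2" offset in the quoted x computation is not explained in this thread, so it is left out here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the project): converts a prediction
// normalized to [0, 1] into an absolute pixel coordinate by multiplying
// it by the sample size along that axis (width for x, height for y).
int ToPixels(float normalized, float sizeInPixels) {
    assert(sizeInPixels > 0.0f);
    // std::lround rounds to the nearest integer (ties away from zero).
    return static_cast<int>(std::lround(normalized * sizeInPixels));
}
```

For example, assuming a sample 24 pixels wide, a normalized prediction of 0.5 would map to pixel 12 via ToPixels(0.5f, 24.0f).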
Regarding the max value: after prediction, the last softmax layer of the CNN contains the probability of each digit, so we pick the highest probability and its index. If the highest probability is found at index 3, the recognized digit is 3.
output->cpu_data() is how you access the Caffe framework's output data.
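The selection of the most probable digit can be sketched on a plain float array, which is the kind of buffer cpu_data() gives access to. ArgMax is a hypothetical helper written for this illustration, not a function from the project or from Caffe.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not from the project): scans `count` scores and
// returns the (index, value) pair of the largest one. With a softmax
// output over 10 digit classes, the index is the recognized digit.
std::pair<int, float> ArgMax(const float* scores, int count) {
    assert(scores != nullptr && count > 0);
    int maxIndex = 0;
    float maxValue = scores[0];
    for (int i = 1; i < count; ++i) {
        if (scores[i] > maxValue) {
            maxValue = scores[i];
            maxIndex = i;
        }
    }
    return {maxIndex, maxValue};
}
```

With probabilities where index 3 holds the largest value, ArgMax returns index 3, which is read as the digit 3. The quoted ProcessResult loop does the same scan while also storing every (index, value) pair.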
Thank you very much for your support, Tchernitski.
Hi, I can see that you fixed the number of digits at 16, split into 4 groups. However, the letters (the cardholder's name) cannot be fixed like that, so why not apply the technique you use for reading letters to reading the digits as well? The reason I ask is that I want to read different kinds of cards, which are not always 16 digits in 4 groups. For example, I want to read 19-digit cards divided into 3 groups.
One more question: I did not see where you import the solver file or run the training phase. From the prototxt file, I successfully built a model that is exactly the same as yours, but I cannot find the code where you train the model. Am I misunderstanding something? Regards, HienXinh