CTBConsulting / Card.io-for-Capital-One-vertical-Cards

Code to Enhance Card.io so it can recognize Vertical Credit Card Formats from Capital One

Tesseract training data #24

Open · byjoh opened this issue 5 years ago

byjoh commented 5 years ago

The last item in your README is "To be added: -Libraries Compile Instructions -Tesseract training instructions". Do you happen to have the training instructions handy, or could you give a quick explanation or point me to where I can find this information elsewhere? I added a new card definition to ocre.cpp for Chase Sapphire cards, but I assume it still can't recognize them because I need to update the training set for these glyphs?

CTBConsulting commented 5 years ago

John, thanks for your interest in the source. Tesseract training is complex, and instructions are available from several sources online. Note that we greatly trimmed down the full Tesseract source to keep our resulting code small and fast.

I should add that some of the newer printed cards have their card numbers printed on top of the card logo. This reduces the contrast between the characters and the logo colors, and as a result card.io may be less able to recognize the characters in the camera image.

If you send me a sample picture of the card type you're working with, I'd be happy to give you a quick assessment of your chances of recognition success. Please redact parts of the card number.

Coley


(In reply to John Busby's comment of Nov 28, 2018, at 20:41.)

byjoh commented 5 years ago

Hi Coley - the intent is actually to expand the training data set and the defined cards to include a number of new and updated formats, not just the Sapphire cards. I wasn't sure whether you had developed a quick way of regenerating the set specifically for our purposes here, or whether it was just the general way of getting training data out of Tesseract. I was able to get it partly working with the new definition I added to ocre.cpp (the code needed some updates here and there, e.g. tessdata is copied to resourcePath, not bundlePath, during compilation), but the accuracy of the numbers is a bit off, given that the training data contains no examples of this card. Also, as you mentioned, the contrast between the card background and the numbers is pretty low to begin with.

CTBConsulting commented 5 years ago

John: Thanks for the update. As I'm sure you are aware, Tesseract training is a level of support we can't reasonably provide. That said, it seems you are making good progress. The original source was produced during a project for a major card provider to facilitate card.io (http://card.io/) recognition of their cards. We have since adapted it for two other financial institutions as part of paid projects. If you get stuck, we can discuss opening a formal project to help you.

In the meantime, please find attached a process document describing how to train Tesseract. It may be something you already understand. Focus on the digits "3", "8", "9", and "6"; if these are recognized reliably, most of the remaining digits will be too.

Regards

Coley Brown
President: CTB Consulting, Inc.
Co-Founder: VisionMine.com
Office: 410-275-9108
LinkedIn: http://www.linkedin.com/in/coleybrown

(In reply to John Busby's comment of Nov 30, 2018, at 12:03: https://github.com/CTBConsulting/Card.io-for-Capital-One-vertical-Cards/issues/24#issuecomment-443270012)