gali8 / Tesseract-OCR-iOS

Tesseract OCR iOS is a Framework for iOS7+, compiled also for armv7s and arm64.
http://www.nexor.it
MIT License

characterChoices returns empty elements #228

Open tedde opened 8 years ago

tedde commented 8 years ago

I am reading the MRZ of ID cards and passports. Most of the time the OCR is perfect, but sometimes I would like to iterate over the choices in order to fix errors. However, for some images choices are missing; as far as I've seen, it is always one full row. Why? Am I doing it wrong, or is it a bug?
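
For reference, iterating over the choices to correct a character might look roughly like the sketch below. It assumes that each element of `characterChoices` is an array of `G8RecognizedBlock` objects exposing `text` and `confidence`, which matches the `(81.30%) '8'` format of the logged output. `BestChoiceInSet` is a hypothetical helper, not part of the framework.

```objc
#import <Foundation/Foundation.h>
#import <TesseractOCR/TesseractOCR.h>

// Hypothetical helper (not part of the framework): returns the text of the
// highest-confidence choice whose character is in `allowed`, or nil if no
// choice qualifies (for example, when the choice list is empty).
static NSString *BestChoiceInSet(NSArray *choices, NSCharacterSet *allowed) {
    G8RecognizedBlock *best = nil;
    for (G8RecognizedBlock *choice in choices) {
        // Only single-character choices are considered here.
        if (choice.text.length != 1) continue;
        unichar c = [choice.text characterAtIndex:0];
        if (![allowed characterIsMember:c]) continue;
        if (best == nil || choice.confidence > best.confidence) {
            best = choice;
        }
    }
    return best.text;  // messaging nil returns nil
}
```

In a numeric MRZ field, passing `[NSCharacterSet characterSetWithCharactersInString:@"0123456789"]` as `allowed` would prefer `0` over `O` where both appear as choices. For a position with an empty choice list, such as the missing row described here, the helper returns nil and no correction is possible.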

In the example below, the first row of the image does not return any choices at all, as seen at the beginning of the output, even though the row itself is recognized, as seen at the bottom of the output.

Using the following code:

    G8RecognitionOperation *operation = [[G8RecognitionOperation alloc] initWithLanguage:@"eng"];
    operation.tesseract.engineMode = G8OCREngineModeTesseractOnly;
    operation.tesseract.pageSegmentationMode = G8PageSegmentationModeAutoOnly;
    // Restrict recognition to the characters that can appear in an MRZ.
    [operation.tesseract setVariableValue:@"ABCDEFGHIJKLMNOPQRSTUVWXYZ<0123456789" forKey:@"tessedit_char_whitelist"];
    [operation.tesseract setVariableValue:@"T" forKey:@"assume_fixed_pitch_char_segment"];
    operation.tesseract.image = bwImage;  // assumes bwImage is a UIImage *
    [operation.tesseract recognize];      // blocks until recognition finishes
    NSArray *a = operation.tesseract.characterChoices;
    NSLog(@"%@", a);
    NSLog(@"RECOGNISED= %@", [operation.tesseract recognizedText]);

Output

(
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
    ),
        (
        "(81.30%) '8'"
    ),
        (
        "(82.51%) '0'",
        "(75.10%) 'B'",
        "(71.87%) 'O'",
        "(71.62%) 'Q'",
        "(71.30%) 'C'",
        "(68.84%) 'G'"
    ),
        (
        "(89.18%) '1'"
    ),
        (
        "(85.36%) '0'",
        "(77.56%) 'O'"
    ),
        (
        "(86.12%) '1'"
    ),
        (
        "(81.99%) '0'",
        "(74.86%) 'O'",
        "(70.67%) 'Q'",
        "(68.59%) 'B'",
        "(68.47%) 'C'"
    ),
        (
        "(85.11%) '0'",
        "(76.91%) 'O'",
        "(71.51%) 'Q'"
    ),
        (
        "(94.15%) 'M'"
    ),
        (
        "(88.53%) '1'"
    ),
        (
        "(85.22%) '7'"
    ),
        (
        "(80.44%) '0'",
        "(76.15%) 'O'",
        "(69.74%) 'Q'",
        "(69.29%) 'C'",
        "(67.53%) 'B'"
    ),
        (
        "(88.68%) '2'"
    ),
        (
        "(85.94%) '0'",
        "(75.14%) 'B'",
        "(71.71%) 'O'"
    ),
        (
        "(76.29%) '9'"
    ),
        (
        "(89.28%) '1'"
    ),
        (
        "(94.65%) 'E'"
    ),
        (
        "(86.10%) 'S'",
        "(77.95%) '5'"
    ),
        (
        "(92.35%) 'T'"
    ),
        (
        "(81.21%) '<'"
    ),
        (
        "(76.13%) '<'"
    ),
        (
        "(83.40%) '<'"
    ),
        (
        "(85.28%) '<'"
    ),
        (
        "(85.74%) '<'"
    ),
        (
        "(83.62%) '<'"
    ),
        (
        "(83.62%) '<'"
    ),
        (
        "(81.84%) '<'"
    ),
        (
        "(80.28%) '<'"
    ),
        (
        "(82.61%) '<'"
    ),
        (
        "(85.72%) '<'"
    ),
        (
        "(91.66%) '2'"
    ),
        (
        "(82.86%) 'S'",
        "(79.72%) '5'"
    ),
        (
        "(87.99%) 'P'"
    ),
        (
        "(90.25%) 'E'",
        "(75.38%) 'B'"
    ),
        (
        "(73.48%) 'C'",
        "(63.71%) 'E'"
    ),
        (
        "(85.36%) 'I'"
    ),
        (
        "(92.14%) 'M'"
    ),
        (
        "(92.45%) 'E'"
    ),
        (
        "(93.64%) 'N'",
        "(79.42%) 'M'"
    ),
        (
        "(73.11%) '<'"
    ),
        (
        "(72.99%) '<'"
    ),
        (
        "(90.35%) 'A'"
    ),
        (
        "(86.72%) 'N'"
    ),
        (
        "(92.94%) 'D'"
    ),
        (
        "(85.07%) 'R'"
    ),
        (
        "(94.44%) 'E'"
    ),
        (
        "(88.69%) 'W'"
    ),
        (
        "(83.70%) '<'"
    ),
        (
        "(80.63%) '<'"
    ),
        (
        "(75.83%) '<'"
    ),
        (
        "(81.21%) '<'"
    ),
        (
        "(84.20%) '<'"
    ),
        (
        "(84.55%) '<'"
    ),
        (
        "(83.27%) '<'"
    ),
        (
        "(83.06%) '<'"
    ),
        (
        "(81.36%) '<'"
    ),
        (
        "(81.34%) '<'"
    ),
        (
        "(78.78%) '<'"
    ),
        (
        "(80.69%) '<'"
    ),
        (
        "(85.49%) '<'"
    ),
        (
        "(82.61%) '<'"
    )
)

IELVAEA99907431101080<88884<<<
8010100M1702091EST<<<<<<<<<<<2
SPECIMEN<<ANDREW<<<<<<<<<<<<<<
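
The log above contains 30 empty entries followed by 60 populated ones, which lines up with three recognized rows of 30 characters each. A minimal sketch for detecting this situation programmatically is shown below. It uses only Foundation and assumes that the choices array holds one entry per non-whitespace character of the recognized text, in reading order; that assumption fits this output but is not confirmed by the framework. `RowsWithoutChoices` is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the zero-based indexes of the text rows whose
// characters all have an empty choice list. Assumes one entry in `choices`
// per non-whitespace character of `text`, in reading order.
static NSIndexSet *RowsWithoutChoices(NSArray<NSArray *> *choices, NSString *text) {
    NSMutableIndexSet *emptyRows = [NSMutableIndexSet indexSet];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSArray<NSString *> *lines =
        [text componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
    NSUInteger offset = 0;  // index in `choices` of the current row's first character
    NSUInteger row = 0;
    for (NSString *line in lines) {
        // Number of characters on this line that should have a choice entry.
        NSString *compact = [[line componentsSeparatedByCharactersInSet:whitespace]
                                componentsJoinedByString:@""];
        NSUInteger count = compact.length;
        if (count == 0) continue;  // skip blank lines
        BOOL allEmpty = (offset < choices.count);
        for (NSUInteger i = offset; i < offset + count && i < choices.count; i++) {
            if (choices[i].count > 0) {
                allEmpty = NO;
                break;
            }
        }
        if (allEmpty) [emptyRows addIndex:row];
        offset += count;
        row++;
    }
    return emptyRows;
}
```

Called with `operation.tesseract.characterChoices` and `operation.tesseract.recognizedText`, this would be expected to flag row 0 for the output shown here, provided the one-entry-per-character assumption holds.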

tedde commented 8 years ago

@BamX It seems you wrote the characterChoices function. Can you shed any light on this?

uchitapal commented 7 years ago

How can I recognize special characters such as (.,@) using Tesseract? Please help and reply as soon as possible.

tedde commented 7 years ago

@uchitapal Your question is not related to the original question. However, to be able to recognise any character (normal or special), Tesseract must be trained for that character. The character must also be whitelisted. Keep in mind that I'm not a Tesseract expert :)
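
As a rough illustration of the whitelisting part, the sketch below reuses the `tessedit_char_whitelist` variable from the original post and extends it with `.`, `,` and `@`. Whether the `eng` language data actually recognises these characters is an assumption here, and `MakeTesseractWithSpecialCharacters` is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#import <TesseractOCR/TesseractOCR.h>

// Hypothetical helper: creates a recognizer whose whitelist includes a few
// special characters in addition to letters and digits.
static G8Tesseract *MakeTesseractWithSpecialCharacters(void) {
    G8Tesseract *tesseract = [[G8Tesseract alloc] initWithLanguage:@"eng"];
    // Characters outside this list are not returned. Characters inside it
    // still need to be covered by the language's trained data.
    NSString *whitelist =
        @"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.,@";
    [tesseract setVariableValue:whitelist forKey:@"tessedit_char_whitelist"];
    return tesseract;
}
```

If no whitelist is set at all, no such restriction applies, so a whitelist is only needed when the output should be limited to a known character set.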