charlesw / tesseract

A .Net wrapper for tesseract-ocr

PixToBitmapConverter inverts Format1bppIndexed images #289

Open: tjkolev opened this issue 8 years ago

tjkolev commented 8 years ago

Greetings,

This issue is similar to #72.

This is my input image: (image: michelangelo s_david_-_floyd-steinberg)

This is the output I get: (image: 99210a3e-dfe7-4529-8371-787017180cc3)

(Note that the attached image is a .png, because .bmp is not an accepted format here.)

This is my simplistic fix:

// inner loop in TransferData1(PixData pixData, BitmapData imgData)
for (int x = 0; x < width; x++)
{
    byte pixVal = (byte)PixData.GetDataByte(pixLine, x);
    // simplistic fix - invert the bits
    if (imgData.PixelFormat == PixelFormat.Format1bppIndexed)
    {
        pixVal = (byte) (pixVal ^ 0xff);
    }
    imgLine[x] = pixVal;
}

I did not investigate the root cause any further, and this fix is certainly suboptimal. It may adversely affect images that convert correctly with the current code.

Thanks. tjk:)

nguyenq commented 7 years ago

If inversion is really needed, Leptonica's native pixInvert function can be used.
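
For reference, the bit inversion discussed in this thread can be sketched in plain C#, independent of both the wrapper and Leptonica. This is only an illustration: `OneBppInvert` and `InvertPackedRow` are hypothetical names, not part of the wrapper's API, and the sketch assumes a row of packed 1bpp data held in a managed byte array (8 pixels per byte). It flips every bit, including any padding bits in the last byte of the row.

```csharp
using System;

public static class OneBppInvert
{
    // Hypothetical helper (not part of the Tesseract wrapper):
    // inverts a packed 1bpp row in place by XOR-ing each byte with 0xFF,
    // so every 0 bit becomes 1 and every 1 bit becomes 0.
    public static void InvertPackedRow(byte[] row)
    {
        if (row == null)
        {
            throw new ArgumentNullException(nameof(row));
        }

        for (int i = 0; i < row.Length; i++)
        {
            row[i] = (byte)(row[i] ^ 0xFF);
        }
    }
}
```

Calling `OneBppInvert.InvertPackedRow` on the bytes `0xF0, 0x00, 0xFF` leaves the array holding `0x0F, 0xFF, 0x00`. Whether this belongs in the converter at all depends on the root cause, which this thread leaves open.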