DanBloomberg / leptonica

Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The official github repository for Leptonica is: danbloomberg/leptonica. See leptonica.org for more documentation.

getImpliedFileFormat / pixWriteImpliedFormat and PNM issues #747

Closed panterlo closed 1 month ago

panterlo commented 1 month ago

If I use a PBM image exported from a PDF with pdfimages, named img-0000.pbm, then exiftool img-0000.pbm reports the following details:

MIME Type                       : image/x-portable-bitmap
Image Width                     : 3512
Image Height                    : 2480
Image Size                      : 3512x2480
Megapixels                      : 8.7
File Size                       : 1089 kB

For the source PDF, exiftool -extractEmbedded -all:all reports:

Embedded Image Width            : 3512
Embedded Image Height           : 2480
Embedded Image Color Space      : DeviceGray
Embedded Image Filter           : CCITTFaxDecode
File Type                       : (unsupported)

getImpliedFileFormat("img-0000.pbm") returns 0, i.e. IFF_UNKNOWN. findFileFormat("img-0000.pbm", out pformat) returns 11, i.e. IFF_PNM. pixWriteAutoFormat("test.pbm", pix) writes the image as IFF_TIFF_G4.

Is this intended, or what is going on? I am trying to run a few Leptonica functions and save the image in the same format as the input. I understand this can be avoided by passing IFF.IFF_PNM explicitly to pixWrite, which saves an identical copy of the file (if nothing has been done to the image).

DanBloomberg commented 1 month ago

Thank you for pointing this out. It was an omission from the extension_map array in writefile.c. It is fixed by adding the mapping from ".pbm" to IFF_PNM (the format integer for all the P*M images). I'll also add a mapping for ".pgm".

I will make the change today.
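
To illustrate the kind of lookup being described, here is a minimal sketch of an extension-to-format table. It is not the actual code in writefile.c: the struct, table, and function names are hypothetical, the table lists only the two extensions mentioned in this thread, and the only values taken from the discussion are IFF_UNKNOWN = 0 and IFF_PNM = 11.

```c
#include <stddef.h>
#include <string.h>

/* Format codes as quoted in this thread; all other formats are omitted. */
enum { SKETCH_IFF_UNKNOWN = 0, SKETCH_IFF_PNM = 11 };

/* Hypothetical table entry: a file extension and the format it implies. */
struct sketch_ext_entry {
    const char *ext;
    int         format;
};

/* Hypothetical table holding only the two mappings discussed above. */
static const struct sketch_ext_entry sketch_ext_table[] = {
    { ".pbm", SKETCH_IFF_PNM },
    { ".pgm", SKETCH_IFF_PNM },
};

/* Return the format implied by the filename's extension, or
 * SKETCH_IFF_UNKNOWN if the extension is absent or not in the table.
 * Matching is exact and case-sensitive in this sketch. */
int sketch_implied_format(const char *filename)
{
    const char *ext;
    size_t i;
    size_t n = sizeof(sketch_ext_table) / sizeof(sketch_ext_table[0]);

    if (filename == NULL)
        return SKETCH_IFF_UNKNOWN;

    /* The extension is taken to start at the last '.' in the name. */
    ext = strrchr(filename, '.');
    if (ext == NULL)
        return SKETCH_IFF_UNKNOWN;

    for (i = 0; i < n; i++) {
        if (strcmp(ext, sketch_ext_table[i].ext) == 0)
            return sketch_ext_table[i].format;
    }
    return SKETCH_IFF_UNKNOWN;
}
```

With a table like this, a name such as "img-0000.pbm" resolves to the PNM format code, while an extension missing from the table falls through to the unknown value, which is the behavior reported at the top of this issue.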

panterlo commented 1 month ago

Thanks, Dan!