iskunk opened this issue 5 years ago
If I were to have all the image files that are used for testing in uncompressed format, they would take up about 15 times the current disk space, which I've been careful to keep down to about 10 MB. The approx. 200 jpg, png and tiff compressed image files would balloon to an estimated 150 MB.
As I understand, bmp format still does RLE, and the image files would be compressed within the distribution tarball (or .zip file) in any event. Would that improve the situation much?
Another possibility: add a set of Make rules that use some external tool (e.g. ImageMagick's convert utility) to convert the image files into bmp/pnm, combined with a fallback in the code to load in such a format if the normal jpg/png/tiff support is unavailable.
It doesn't make much sense to me.
It's hard to imagine a user who wants to run the test suite, but doesn't have the standard image i/o libraries. I have deliberately limited as far as possible the dependencies of leptonica on external libraries, but virtually all users will have png, gz, jpeg and tiff. I consider webp, jp2k and gif optional -- webp and jp2k because they are not yet widely used, and gif because in every way it is inferior to png.
You can make leptonica without libraries, and the I/O gets appropriately stubbed out.
I implemented the original (most basic) bmp utility, which has no compression. It's there because people sometimes use it, like pnm, but there is nothing else to justify putting effort into it.
One could rely on gzip or bzip2 to compress the tarball, but when it is uncompressed, you have a big expansion of files in the directory. Not a nice thing to do to the users.
And then the entire test suite would need to be converted to use these uncompressed files.
-- Dan
The problem is not that those libraries are unavailable. (Although they could be, if you are building in a more unusual environment, like an embedded system.) The problem is that if you are preparing a build that deliberately does not use those libraries, then you don't have a way of validating that build. The image I/O is incidental to most of the tests, so in theory, having the images in a different format does not defeat their purpose.
(Of course, I could perform a separate build of the same source, with the external libraries enabled, and run the test suite using that. However, this doesn't actually validate the build that would be used, and could potentially let slip some unexpected error condition that results in part due to missing external libraries.)
How about the alternative option, then? I can provide a set of makefile rules that a user can invoke to automatically convert all the jpeg/png/tiff files into bmp/pnm. All that would be needed in the code, then, is that whenever an attempt to load e.g. foo.png fails, it falls back to attempting foo.bmp (or foo.pnm). This would leave the majority of users unaffected, while still allowing special-purpose builds to be run through the test suite.
It's too complicated. We support building using my hand-made makefiles, autoconf and cmake. How many libraries do you know that offer multiple ways to build?
You can build and run the tests using either the static makefiles or autoconf. cmake does not yet have the ability to run the tests after building (it's an old issue waiting implementation).
I can pretty well assure you that we're not going to complicate the github distro.
I'm not sure I understand where you see the complexity coming from. The use of multiple build systems would only suggest a different approach; it's not a showstopper. I will sketch out an idea of this below to illustrate.
(Note that I can currently build and run the tests, as shown above, but nearly all the tests fail as they cannot load the test images. As I mentioned previously, I could produce a separate build of the tests, with the external libraries, and run those---but then that build is not usable to me, due to various legal and release-engineering issues. At present, the situation is that a Leptonica build that does not have external libraries enabled is un-testable; there is no way to validate it.)
Here is one way I see this working:
A convert.sh shell script is added to the prog subdirectory. This script, when run inside its containing directory, calls some external tool (like ImageMagick's convert or some of the Netpbm utilities) to convert the various jpg, png and tiff files present into e.g. bmp format, under a different filename. (For example, foo.jpg results in a foo.jpg.bmp file.)
A new function is defined, something like pixReadWithFallback(), with the same signature as pixRead(). If it cannot load foo.jpg for whatever reason, it then attempts foo.jpg.bmp, and as a final measure foo.jpg.pnm.
This functionality could also be rolled into the existing pixRead(), with some mechanism to enable it at run-time, but I think you would likely prefer a separate/new function.
An additional convention may be used to make multi-page TIFF files available in fallback form; e.g. .0.bmp, .1.bmp, and so on.
Tests are edited to use this new function instead of pixRead().
A user in my situation would need only know to run the convert.sh script before invoking make check, which should then complete without issue.
The majority of users who enable the external libraries as a matter of course are unlikely to even notice the added support for this scenario.
Pretty clever!
I don't want to put a fallback function into all the tests. However, as a compromise, suppose we make a script that converts those images files to bmp, but with the same name. Leptonica functions that read image data use the information in the header, not the filename extension, so a bmp file with a .png extension will be read properly as a bmp file.
Then we don't need to change the tests. You just run the script before the test.
I hadn't noticed that quality of pixRead(). I don't like the idea of a file with extension X actually containing content Y, but if the tests are run in an out-of-source-tree build, then the files with the "wrong" extension can be created as temporary artifacts of the build. All that would be needed is the conversion script (which I can provide), and ensuring that the tests load files from the build tree in preference to the source tree.
I'll do some experimentation on this, and get back to you in a few days.
Okay, I've put the converted files into place, but have encountered an issue:
$ ./adaptmap_reg
////////////////////////////////////////////////
//////////////// adaptmap_reg ///////////////
////////////////////////////////////////////////
leptonica-1.79.0 : (null)
Time for gray adaptmap gen: 0.008
Error in pixWriteStreamPng: function not present
Error in pixWrite: pix not written to stream
Error in findFileFormatStream: truncated file
Error in fopenReadStream: file not found
[...]
Do the tests need to write out intermediate files using an external format?
Ah yes, the write side of the issue.
Do the tests need to write out intermediate files using an external format?
Yes, they are designed that way. The full set of tests generates about 340 MB of image files in generate mode, half in /tmp/lept/regout/ and half in /tmp/lept/golden/. Then in compare mode, 170 MB of image files are written to /tmp/lept/regout/.
For example, the function regTestWritePixAndCheck() is useful because of its simplicity (you don't need to specify the temp file names). It compares files byte-for-byte. The decision about format for the files is made for maximum compression. If uncompressed files were used, we'd generate 5 or 6 GB of data! There are 1000 invocations of this function alone in the regression tests.
If instead of writing to file, we wrote to memory, we would have no files available to look at when the test fails.
It's reasonable that more temporary space would be needed when uncompressed formats are the only ones available. Could we conditionalize the tests to not request png format when it is not compiled in?
You are persistent :-)
OK, I've done that. Need to handle png, jpeg and tiff_g4. Will push it out shortly.
Just trying to find a way to make this work :-)
Okay, I've pulled in your changes, and am seeing an increase in the number of tests passing. There are still a number of tests failing with "function not present":
$ grep -l 'function not present' *.log | fgrep -v test-suite.log | while read x; do echo $x:; grep 'function not present' $x | sort -u; echo; done
adaptmap_reg.log:
Error in pixWriteStreamJpeg: function not present
alphaops_reg.log:
Error in pixWriteStreamJpeg: function not present
Error in pixWriteStreamPng: function not present
baseline_reg.log:
Error in pixReadStreamPng: function not present
Error in pixWriteStreamPng: function not present
blend3_reg.log:
Error in pixWriteStreamJpeg: function not present
[...]
I'm getting better success with conversion to pnm, though in general there seem to be some issues with the way the image is encoded. One example is feyn.tif not working correctly when converted to what appears to be a semantically equivalent bmp file. I have to investigate this further to better describe the issue, but it seems like there may be some interesting corner cases here.
These aren't corner cases -- there are 211 invocations of pixWrite() in the *_reg.c regression tests.
I can solve most of them by putting those same pre-processor statements in pixWrite().
Done. Also included all variants of tiff compression supported by leptonica.
Thanks, I'm seeing fewer failures (94 pass, 36 fail). I looked through a few of the latter:
baseline_reg fails due to passing GPLOT_PNG to gplotSimple1()
blend2_reg fails because it cannot find /tmp/lept/regout/blend2.14.png
boxa2_reg fails due to boxaPlotSizes() passing GPLOT_PNG to gplotCreate()
bytea_reg fails due to convertPdf() -> pixGenerateCIData() -> pixGenerateJpegData(). Is it feasible to have an uncompressed PDF encoding, for completeness? Else these PDF-related tests could be skipped when jpeg/zlib/etc. is unavailable.
ccthin2_reg fails due to l_bootnum_gen3() calling zlibUncompress(). Probably another one to skip when zlib is unavailable.
coloring_reg fails due to "pixa" fonts not working. Another probable skip (unless prog/fonts/chars-*.tif could be substituted).
enhance_reg fails due to src/bmfdata.h containing TIFF data. Another skip.
When you print e.g. "Warning in pixWrite: png library missing; output bmp format", you may want to add a newline, because in the test output logs the following line of output starts immediately after "format".
Lastly, after seeing the new TIFF logic, may I suggest something like
#define L_FORMAT_IS_TIFF(f) ((f) >= IFF_TIFF && (f) <= IFF_TIFF_ZIP)
? Seems like that same OR-statement crops up a few times...
Yes, I forgot the newline, which should always end the message string in the L_* macros.
And thanks for the L_FORMAT_IS_TIFF macro.
Can one make a macro of macros; e.g.,
A macro can certainly call other macros; what did you have in mind?
Looking over src/imageio.h, specifically the IFF_* enums, I was thinking... you already have a couple of symbolic entries in there (IFF_UNKNOWN, IFF_DEFAULT). What if there were one like IFF_BEST, that would normally be equivalent to IFF_PNG, but can become IFF_BMP when necessary? Something like this could help abstract away a lot of the png references in the code.
This could potentially be taken further by combining it with a "symbolic" file extension like .best, which when seen on a filename, is automatically changed to .png or .bmp as appropriate.
I don't think it's necessary. At this point, the two high level image writing functions, pixWrite() and pixWriteMem(), are instrumented to fall back to bmp.
Just noticed that I should put the format changer in pixWriteStream(), not in pixWrite(), because pixWrite() calls pixWriteStream(). Will do this.
The balance is now 101 pass, 29 fail.
Here are some suggested changes to the gplot code so that it can write pnm, and fall back to same when png is unavailable. (Additional tweaks are needed to get the tests passing; this is just a start.)
Some of the tests are going to need a fragment like
#if !defined(HAVE_LIBPNG)
L_ERROR("This test requires libpng to run.\n");
exit(77);
#endif
That will cause the test to return a result of SKIP in the "make check" run.
101 out of 130 seems GEFGW.
Thank you for the gplot patch. It doesn't save the tests, because we're writing to files named *.png, and gplot adds an appropriate extension, .pnm when outputting in pnm. But your fragment at least allows the failures to happen silently. Send me a list of tests that fail, and I'll add the fragment.
I was amused to see that the gplot pnm output has a lousy font for text, and is 200x bigger than the png output.
Skipped the regression tests that use gplot (they all emit png). You should now be at 101 pass, 17 fail.
Just realized that I can modify the gplot routines to return an image (png or pnm), simplifying the procedure of generating it, and doing so without changing the gplotSimple1 (etc) interfaces.
Perhaps a weekend project.
It's certainly getting better! Here is an initial set of tests that can be skipped:
Test name Skip due to call to...
================ ==========================
boxa3_reg pixaGenerateFontFromString
boxa4_reg pixaGenerateFontFromString
ccthin2_reg l_bootnum_gen3
colorcontent_reg pixaGenerateFontFromString
coloring_reg pixaRead
colorize_reg pixaGenerateFontFromString
enhance_reg pixaGenerateFontFromString
genfonts_reg pixaGenerateFontFromString
insert_reg pixaWrite
mtiff_reg (multi-page TIFF functionality)
numa2_reg pixaGenerateFontFromString
ptra2_reg pixaWrite
quadtree_reg pixaGenerateFontFromString
rankhisto_reg pixaGenerateFontFromString
shear2_reg pixaRead
subpixel_reg pixaRead
writetext_reg pixaRead
Function Dependency
========================== ==========
l_bootnum_gen3 zlib
pixaGenerateFontFromString libtiff
pixaRead libpng
pixaWrite libpng
Separately: pngio_reg just needs the exit(77) bit.
This patch has a few further suggestions for the code:
gplot.c: Slight logic tweak so that output is still written to a .png file when a GPLOT_PNG -> GPLOT_PNM downgrade takes place. (This aside, having the gplot routines return an image object directly seems like it will be a better way to go. But for now, this should at least allow the gplot-using tests to be enabled.)
pdfio2.c: Implemented downgrade to L_FLATE_DECODE when zlib is available. This isn't a complete solution (there is no further fallback without zlib, not unless you implement an uncompressed PDF encoding), but zlib is usually available even when the other libs are not.
By the way, I ran into some trouble with the HAVE_LIB* conditionals in pdfio2.c when I noticed that the file was not #including config_auto.h. I would suggest adding that #include to every .c file that doesn't already have it. Even if a source file doesn't use HAVE_* et al., the config header could well define something like _XOPEN_SOURCE or function replacements that affect normal (non-preprocessor-oriented) code.
With the above change to pixGenerateCIData(), the pdfio1_reg test can be skipped when zlib is unavailable.
There are a few remaining tests needing attention, but let's get this low-hanging fruit out of the way first.
Re gplot font: I'm not conversant in Gnuplot, but you might find that tweaking the terminal type gives you better results. I assume getting the same pixels as png output should be possible.
Coincidentally, another collaborator is independently fixing some build warnings, and he has been replacing
I believe you said that we should use #if defined(xxx) in general. Is that correct?
If we put
in every source file, will that solve the problem and allow us to use
throughout?
I'd rather not do your gplot change with output_actual.
Instead my plan is to do this: in gplot, we can generate the pix directly with a single call, without having to specify a rootname and then generating the file name to read it back. And this can be done by changing only gplotSimpleXY1(), ...XY2() and ...XYN(), and adding new functions that call them: gplotSimplePix1(), gplotSimplePix2(), gplotSimplePixN().
Once we have these functions, the use (e.g., in the regression tests) will simply be through regTestWritePixAndCheck(). We will not need to know the names of the temporary files that gplot creates.
Working late today...
For HAVE_* symbols defined by Autoconf, yes: #if defined(HAVE_FOO) (or #ifdef HAVE_FOO) should be used, because HAVE_FOO will take one of the following forms in the config header:
#define HAVE_FOO 1
...or...
/* #undef HAVE_FOO */
Using #if HAVE_FOO in the latter case only works due to compiler leniency (it's a common flub), and with -Wundef it will print a warning.
(I suspect your collaborator is favoring #if due to what they saw in environ.h, interpreting that as the convention used by the project. The problem, of course, is that environ.h itself needs to change to follow the Autoconf convention.)
Adding config_auto.h to every source file isn't about resolving the #if/#ifdef issue, but about making all the HAVE_* symbols available in the first place. The problem I was having with pdfio2.c was that all my added code, which is surrounded by #if defined(HAVE_LIBZ), was not getting compiled in even though HAVE_LIBZ was correctly #defined to 1 in config_auto.h. (If I had incorrectly done #if HAVE_LIBZ, the result would not have been any different.)
Re gplot changes: Hiding the temporary file(s) sounds great. It's a bit more work, but ultimately a better approach.
Doesn't it make more sense to add those 3 lines to environ.h, which is included with allheaders.h in every source file?
I am the collaborator. @iskunk is right that #if defined(HAVE_FOO) or #ifdef HAVE_FOO should be used with the Autoconf macros, but that requires modifications in src/environ.h.
It is important that the definitions from config_auto.h are included in each source file which uses HAVE_FOO. Adding the three lines to src/environ.h and removing them everywhere else looks like a reasonable solution to me.
Ah, @stweil, didn't know it was you :-) With this issue, I'm hoping to lay the groundwork to be able to run Tesseract's test suite successfully with a Leptonica build that has image libraries disabled.
I would still recommend #including the config header individually in every .c source file.
Yes, you can save yourself some typing by putting it in a common project header, and many projects do this. However, one important consideration for the config header is that it must be #included before any system header, because in theory it could #define feature test macros (e.g. _XOPEN_SOURCE) that are only effective if defined prior to any system #includes.
It is possible to do this in a common project header, but then it is not as obvious that the config header is coming first. And over time, this makes it more likely that someone accidentally places a system #include somewhere that breaks that requirement. It is easier to ensure that the #include is being done correctly if it is the first non-comment bit of code in the source file.
One other detail I neglected to mention previously: GNU recommends that the config header be pulled in with #include <> rather than #include "". Here is what they have to say:
With the appropriate -I option, you can use '#include '. Actually, it's a good habit to use it, because in the rare case when the source directory contains another config.h, the build directory should be searched first.
Thank you for the explanation. I'll insert it as you recommend, in all source and program files.
I wonder whether the conditional around #include <config_auto.h> is really needed. I don't think so, because both Automake and cmake builds provide that file, and builds without it would not work correctly.
My handbuilt makefile doesn't generate the file. environ.h handles the critical defines (for external libraries and special functions).
I see. Thank you.
Tested with the latest changes. Will it be possible to re-enable the tests that use gplot? E.g. baseline_reg should now be able to pass (with minor changes).
blend2_reg currently fails because regTestWritePixAndCheck() writes blend2.14.bmp but the test program looks for blend2.14.png. Could that be addressed with something like format_actual?
I fixed blend2_reg to work with fallback to bmp. It's ugly but c'est la vie. To fix baseline2_reg, I need a new function gplotMakeOutputPix() that returns the pix of the plot. I will try to do this in the next few days, and push these changes.
Beyond that I'm unlikely to get around to doing much more for a few weeks, at least, although I can add some skips for you if any tests are still failing.
Did you push the changes to blend2_reg? (I'm not seeing any.) You didn't opt to have regTestWritePixAndCheck() just write bmp to a .png file? This isn't the only test that has this problem, after all...
I'll gladly try out the gplot-related updates once you have them.
There's still some work left. For now, we've been addressing the relatively trivial matter of tests expecting certain filenames. But once those are straightened out, there are a few tests that fail due to differences brought about by the test images being in bmp or pnm format---differences that should probably not be affecting the results, given that the format conversion presumably yields a semantically equivalent output.
I recall some tests complained about colormaps, and I think another one errored out when a 1-bpp image got converted to a 0-or-255 8-bpp file (due to format limitations). These failures should raise more interesting questions than what we've looked at so far.
No, not pushed yet. Likely today sometime. Thank you for offering to test, and I'll let you know when I do.
Still seeing a failure in blend2_reg, but it is a different, more interesting one now.
One thing I've found: pnm format can't do RGBA (unless you resort to a non-official variant), so I've switched over my converted files to bmp. And a lot more tests are breaking now, unfortunately.
Here's one interesting example: adaptnorm_reg. The lighttext.jpg file is a greyscale jpeg. If I convert it to a "truecolor" 24-bpp bmp file, then pixContrastNorm() errors out because the image is not 8 bpp. If I convert it to a straight grayscale file, however, then the bmp file has a colormap, and pixContrastNorm() errors out because of that.
So what should happen here? It doesn't seem like there can be an 8-bpp greyscale bmp file without a colormap. Perhaps there is a way to recognize when the colormap is actually just an artifact of a greyscale encoding?
Leptonica supports pam, which is an official extension of pnm. You can verify that reading and writing pnm works on an RGBA image:
pix1 = pixRead("books_logo.png");
pixDisplay(pix1, 100, 100);
pixWrite("/tmp/pix1.pnm", pix1, IFF_PNM);
pix2 = pixRead("/tmp/pix1.pnm");
pixDisplay(pix2, 450, 100);
Yes, bmp is a crummy format; it doesn't have 8 bpp gray without a cmap. I really dislike supporting these lousy legacy encodings. Another example is gif, which doesn't even support RGB.
Ah, I wasn't aware of pam. I've switched most of the files over to that now. (Should the library use IFF_PAM rather than IFF_PNM, then? That does appear to be a more versatile format.)
Some files (the four-color ones), I'm converting to bmp, as the tests seem to require a colormap.
However, I'm seeing this error crop up across many tests:
Error in pixcmapIsValid: invalid cmap n: 33 (nalloc = 16)
Does that look familiar? This instance is from loading weasel2.4c.png, converted to an equivalent bmp file.
I've never seen that error.
When I do this:
pix1 = pixRead("weasel2.4c.png");
pixWrite("/tmp/junk.bmp", pix1, IFF_BMP);
I get this output:
Warning in pixWriteMemBmp: 2 bpp files can't be read; converting to 8 bpp
and 'fileinfo /tmp/junk.bmp' gives, as expected,
===============================================
Reading the header:
input image format type: bmp
w = 82, h = 73, bps = 8, spp = 1, iscmap = 1
===============================================
Reading the full image:
input image format type: bmp
w = 82, h = 73, d = 8, spp = 1, wpl = 21
xres = 0, yres = 0
colormap exists and has color values:
Pixcmap: depth = 8 bpp; 4 colors
Color R-val G-val B-val Alpha
----------------------------------------
0 48 44 40 255
1 232 240 232 255
2 168 172 168 255
3 184 144 96 255
===============================================
Could you give this a try, and see if you get the same result? ("convert" is the ImageMagick utility.)
$ convert weasel2.4c.png -depth 2 -format bmp3 -compress None weasel2.4c.bmp
$ ./fileinfo weasel2.4c.bmp
Error in pixcmapIsValid: invalid cmap n: 33 (nalloc = 16)
Error in pixReadStream: invalid colormap
Error in pixRead: pix not read
Error in pixReadHeader: bmp: pix not read
Error in writeImageFileInfo: failure to read header of weasel2.4c.bmp
I ran the converter, and I believe it makes an invalid file. As stated in bmpio.c, 2 bpp is not a valid bmp file. Just checked with wikipedia:
In the original OS/2 DIB, the only four legal values of color depth were 1, 4, 8, and 24 bits per pixel (bpp).[4] Contemporary DIB Headers allow pixel formats with 1, 2, 4, 8, 16, 24 and 32 bits per pixel (bpp).[17] GDI+ also permits 64 bits per pixel.[18]
So it's a moving target. But there is no reason for leptonica to support anything beyond the original spec.
Running display on the image, it is clearly different in color from the original png. Same with xzgv. And xli doesn't support the 2-bit bmp.
Running 'identify -verbose' gives a 4-color, 16-entry colormap with 2-bit samples, which is not possible -- 2-bit samples can only support a 4-color colormap. So, again, it appears that even ImageMagick hasn't bothered to produce a valid implementation of 2 bpp bmp.
Leptonica does the right thing -- it converts to 8 bpp on writing a 2 bpp pix in bmp format. However, on reading I should abort 2 bpp bmp. Will fix this.
Yesterday I added an include of config_auto.h to all 288 prog files (!!). I now have all tests passing again (when libraries are available). Before adding it, about 25 were skipped, and one failed with autotools and 'make check'.
Hmm. I get the same error even if I drop the -depth 2:
$ convert weasel2.4c.png -format bmp3 -compress None weasel2.4c.bmp
$ ./fileinfo weasel2.4c.bmp
Error in pixcmapIsValid: invalid cmap n: 33 (nalloc = 16)
Error in pixReadStream: invalid colormap
Error in pixRead: pix not read
Error in pixReadHeader: bmp: pix not read
Error in writeImageFileInfo: failure to read header of weasel2.4c.bmp
identify -verbose shows it to have 8-bit channels, with Depth: 4/8-bit. Shouldn't that be a valid bmp file?
Yes, it should read that bmp file. The read was failing because I'd hardcoded the size of the infoheader, so it got the wrong number for the number of colors in the colormap. Made a fix that uses the actual size of the infoheader.
Thanks Dan. Current test summary on my end:
=======================
17 of 106 tests failed
(27 tests were not run)
=======================
I reviewed the remaining failures. Here are the ones that either need some filename-extension tweaking, or could (potentially) be turned into SKIPs due to missing libraries:
bytea_reg, pdfseg_reg: appear to hard-depend on jpg support (for PDF encoding)
compare_reg, italic_reg, pdfio1_reg: appear to hard-depend on zlib (for "flate" PDF encoding)
fpix1_reg, kernel_reg: are looking for a png file in /tmp/lept/ when only a pnm file is present
pageseg_reg: appears to hard-depend on tiff (for PDF encoding)
pdfio2_reg: appears to hard-depend on jpg, tiff, and zlib (all for PDF encoding)
psio_reg: appears to hard-depend on jpg (for Postscript encoding)
psioseg_reg: appears to hard-depend on tiff (for Postscript encoding)
I am building Leptonica for use in a software system where image I/O and compression are handled by other components. Therefore, I configure with
When I build in this manner, however, make check gives rather discouraging results. The adaptmap_reg.log file begins with
I think it would make more sense for the image files that are used in the library's test suite to be distributed in a format that the library can read without external dependencies, such as bmp or pnm. (Of course, this would not apply in instances where I/O involving formats that require external dependencies is the functionality being tested.)