Closed 36PopTarts closed 5 years ago
Thanks so much for the detailed explanation!
Your code is not the same as the command, and the command that you have is an ImageMagick 6 command rather than an ImageMagick 7 command. ImageMagick 6 accepts the options in any order, but ImageMagick 7 is stricter about the order, so the command is not that easy to translate to code.
Your primary problem is that you will need to specify the Density before you read the image:
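var readSettings = new MagickReadSettings()
{
    // Set the density before reading so the PDF is rasterized at this resolution.
    // RESOLUTION_DPI is assumed to be a Density value, e.g. new Density(300).
    Density = RESOLUTION_DPI
};
using (MagickImageCollection img = new MagickImageCollection(inputPageFilePath, readSettings))
{
}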
When you read the image like this it will be read at 300 DPI and result in an image with the resolution that you expect.
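The order matters because a PDF page is vector data measured in points (1/72 inch), so its pixel size is fixed at the moment the page is rasterized. A minimal sketch of that arithmetic follows, with no Magick.NET calls. The `PixelsAt` helper is a hypothetical name used only for illustration, and rounding to the nearest pixel is an assumption; the actual rasterizer may round differently.

```csharp
using System;

public static class PdfRasterSize
{
    // PDF page sizes are expressed in points: 72 points per inch.
    private const double PointsPerInch = 72.0;

    // Hypothetical helper: pixel length of a side that is `points` long
    // when the page is rasterized at `dpi` dots per inch.
    public static int PixelsAt(double points, double dpi)
    {
        return (int)Math.Round(points * dpi / PointsPerInch);
    }

    public static void Main()
    {
        // A US Letter MediaBox is 612 x 792 points.
        Console.WriteLine($"{PixelsAt(612, 72)}x{PixelsAt(792, 72)}");   // 612x792
        Console.WriteLine($"{PixelsAt(612, 300)}x{PixelsAt(792, 300)}"); // 2550x3300
    }
}
```

Under these assumptions a letter-size page comes out at 612x792 when read at 72 DPI and at 2550x3300 when read at 300 DPI, which is why resampling after the read cannot recover the missing detail.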
And if you want to compress the image with LZW compression you will need to do this:
image.Compression = CompressionMethod.LZW;
P.S. Don't use the ImageOptimizer here, it does nothing for TIFF images.
Thank you for clarifying! I was able to produce the exact same images (even the tiffinfo profiles match) using your suggestion. Only one small correction though: it would appear that the Compression property on the IMagickImage interface is read-only, and I found a forum post which states that I should set it through the image.Settings.Compression property instead. That worked and the image was LZW-compressed. I'm very glad that I don't have to use calls to shell command lines in the middle of a C# application anymore!
My bad, I keep forgetting that 😄. And happy to hear that you got it working.
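Putting the two corrections from this thread together, the conversion might look roughly like the sketch below. It is not the poster's actual code: the `PdfToTiff` class, the `Convert` method, its parameter names, and the 300 DPI default are all hypothetical. It assumes the Magick.NET package is referenced and, for PDF input, that Ghostscript is installed. The other options from the original command line (despeckle, flatten, alpha handling) are deliberately left out.

```csharp
using ImageMagick;

public static class PdfToTiff
{
    // Hypothetical helper: the names and the 300 DPI default are illustrative only.
    public static void Convert(string inputPath, string outputTiffPath, double dpi = 300)
    {
        var readSettings = new MagickReadSettings
        {
            // Density has to be set before the file is read,
            // so that the PDF is rasterized at this resolution.
            Density = new Density(dpi, dpi)
        };

        using (var pages = new MagickImageCollection(inputPath, readSettings))
        {
            foreach (var page in pages)
            {
                // The Compression property on the image is read-only,
                // so the write-time compression is set through Settings.
                page.Settings.Compression = CompressionMethod.LZW;
            }

            // Writing the collection stores all pages in one TIFF file.
            pages.Write(outputTiffPath);
        }
    }
}
```

A call such as `PdfToTiff.Convert("input.pdf", "output.tif")` would then replace the shell invocation. Running tiffinfo on the result is a quick way to confirm both the pixel dimensions and the compression scheme.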
Question
Hello, and thanks for writing this library. I'm currently working on a .NET Core service application which processes PDF documents with the Tesseract OCR engine. To do this, it first uses ImageMagick to prep the PDF to a TIF with good quality for OCR processing. I recently replaced another developer, and that developer seemed to know about ImageMagick but not this library, so he was calling ImageMagick through the shell in C# code and passing command line arguments as you normally do.
The command line worked well enough, but a bug was recently brought to my attention where documents would sometimes be rendered with mostly pure-black color after processing -- this was due to the alpha layer being removed on documents which, as best as I can guess, had a pure black layer with an alpha layer on top which "punched holes" around the text, like popping a cardboard cutout from a sheet. That bug is probably not within the scope of this issue; I determined that it seemed to work more consistently across documents which are uploaded to our system if I don't use -alpha Off on images which have more than 2 samples per pixel. But it is ultimately why I made the change to this library.
Anyway, I figured the best way for me to implement that is to make the switch to the Magick.NET library so that I can read the image attributes and decide easily within the application without double-processing an image. The only problem is, now the images do not come out with nearly the same resolution that they did before, even when I run the command line I posted above manually and compare the results to the service. When I run the command line, the resulting .TIF image always has a resolution of 2500x3300, which is plenty high enough quality for OCRing. When I run my code, the .TIF image comes out at 612x792, which is the native dimensions of the MediaBox container property in the PDF document, and the standard size for an image to be printed at letter size with a density of 72 DPI. That's not high enough because we're shooting for 300 DPI.
Here's what that command line looked like:
convert -depth 16 -density 300x300 -compress lzw -background white -colorspace RGB -despeckle -flatten -alpha Off "input.pdf" "output.tif"
And my code in .NET for converting and saving the image:
I commented out the compression call because I thought that was what was causing the loss in quality at first, but then I realized that the resolution was actually much lower by using tiffinfo. Right now the only thing that is different between the CLI and my code, as far as I can tell, is that I'm not using compression right now, until I can figure out why the resolution is still native.
Tiffinfo also says the density is the same on both versions of the image:
I'm still relatively new to image manipulation in general, although I have worked with PDFs a lot in the past (not for image manipulation however). Is there a resample or resize step I am missing here? I tried resampling to 300 DPI as well through image.Resample(new PointD(300, 300)), but the image came out looking just as terrible. I would post the image files themselves, but unfortunately, our firm does work for the education sector and I simply cannot post these documents. It doesn't help that there are only a limited set of documents which produce the issue I'm trying to fix. If this is not enough information, I can try to find a good sample image that does not have any personal information on it, but that will take time. Any help you could offer to get my .NET code outputting the same resolution images as the CLI tool would be appreciated.