PRImA-Research-Lab / prima-page-to-pdf

Java command line tool to convert PAGE XML files with layout and text content to PDF
Apache License 2.0

TIFF support #5

Open bertsky opened 3 years ago

bertsky commented 3 years ago

I have a TrueColor RGB TIFF (i.e. 3x16 bit depth), for which the latest PageToPdf.jar prebuilt release gives me:

java.lang.IllegalArgumentException: Bits per sample 16 is not supported.
    at com.itextpdf.text.pdf.codec.TiffImage.getTiffImageColor(TiffImage.java:376)
    at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:117)
    at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:315)
    at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:303)
    at com.itextpdf.text.Image.getInstance(Image.java:308)
    at com.itextpdf.text.Image.getInstance(Image.java:242)
    at com.itextpdf.text.Image.getInstance(Image.java:365)
    at org.primaresearch.pdf.PageToPdfConverter.addImage(PageToPdfConverter.java:446)
    at org.primaresearch.pdf.PageToPdfConverter.addPage(PageToPdfConverter.java:192)
    at org.primaresearch.pdf.PageToPdfConverter.convert(PageToPdfConverter.java:119)
    at org.primaresearch.pdf.CommandLineTool.main(CommandLineTool.java:162)
com.itextpdf.text.exceptions.IllegalPdfSyntaxException: Unbalanced save/restore state operators.
    at com.itextpdf.text.pdf.PdfContentByte.sanityCheck(PdfContentByte.java:3699)
    at com.itextpdf.text.pdf.PdfContentByte.reset(PdfContentByte.java:1535)
    at com.itextpdf.text.pdf.PdfContentByte.reset(PdfContentByte.java:1523)
    at com.itextpdf.text.pdf.PdfWriter.resetContent(PdfWriter.java:765)
    at com.itextpdf.text.pdf.PdfDocument.initPage(PdfDocument.java:1151)
    at com.itextpdf.text.pdf.PdfDocument.newPage(PdfDocument.java:1010)
    at com.itextpdf.text.pdf.PdfDocument.close(PdfDocument.java:865)
    at com.itextpdf.text.Document.close(Document.java:416)
    at org.primaresearch.pdf.PageToPdfConverter.convert(PageToPdfConverter.java:123)
    at org.primaresearch.pdf.CommandLineTool.main(CommandLineTool.java:162)

This is true for both OpenJDK 8 and 11. Would it be possible to integrate a TIFF library with more capabilities?

And more importantly: could you please ensure the tool at least exits with a non-zero status when a conversion fails, so I won't need to parse stdout for exception texts?

kba commented 3 years ago

prima-page-to-pdf uses iText 5, but AFAICT even the newer iText 7 does not support more than 8 bits per channel: https://github.com/itext/itext7/blob/9fadebfb114a7ee9ef9058a8b0c9deec71262210/io/src/main/java/com/itextpdf/io/image/TiffImageHelper.java#L349-L357
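
If iText itself cannot decode 16-bit samples, one conceivable workaround is to reduce the image to 8 bits per channel before it reaches iText. The following is only a minimal sketch, not code from this repository: `BitDepthReducer` and `to8BitRgb` are made-up names, and it assumes the TIFF has already been decoded into a `BufferedImage` by some other reader (for example the TIFF plugin that ships with `javax.imageio` from Java 9 on, which would not help on OpenJDK 8).

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of prima-page-to-pdf.
class BitDepthReducer {

    /**
     * Copies any BufferedImage into an 8-bit-per-channel RGB image.
     * getRGB() converts each pixel to the default 8-bit sRGB model,
     * so 16-bit samples are reduced on the way.
     */
    static BufferedImage to8BitRgb(BufferedImage src) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                dst.setRGB(x, y, src.getRGB(x, y));
            }
        }
        return dst;
    }
}
```

The result could then be handed to iText 5 through `Image.getInstance(java.awt.Image, java.awt.Color)` instead of the file-path overload. Whether the loss of bit depth is acceptable depends on the use case.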

chris1010010 commented 3 years ago

This can only be dealt with in itextpdf. Each addPage has a try/catch. I guess we could add a cmd argument to stop on exception
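
The second exception in the report, `Unbalanced save/restore state operators`, is most likely a follow-up of the first one: if loading the image throws after `saveState()` has been called, the matching `restoreState()` is never reached, and iText complains when the document is closed. A minimal sketch of the usual remedy, a `try`/`finally` block, is shown below. `Canvas` and `ImageStep` are made-up stand-ins for iText's `PdfContentByte` and the image-loading step, so that the example runs without the library.

```java
// Hypothetical stand-ins, not part of prima-page-to-pdf or iText.
class StateBalanceSketch {

    /** Counts open graphics states, like a very reduced PdfContentByte. */
    static class Canvas {
        int openStates = 0;
        void saveState() { openStates++; }
        void restoreState() { openStates--; }
    }

    /** Stand-in for loading and placing the page image. */
    interface ImageStep {
        void run() throws Exception;
    }

    static void addImageBalanced(Canvas cb, ImageStep step) throws Exception {
        cb.saveState();
        try {
            step.run();          // may throw, e.g. on an unsupported TIFF
        } finally {
            cb.restoreState();   // always runs, so save/restore stay paired
        }
    }
}
```

With this pattern the original exception still propagates to the caller, but the graphics state is left balanced.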

bertsky commented 3 years ago

This can only be dealt with in itextpdf. Each addPage has a try/catch. I guess we could add a cmd argument to stop on exception

/**
* Adds the document page image to the current PDF page (spanning the whole page).
*/
private void addImage(String filepath, PdfWriter writer, Document doc, Page page) 
             throws MalformedURLException, IOException, DocumentException {
    PdfContentByte cb = writer.getDirectContentUnder();
    cb.saveState();
    Image img = Image.getInstance(filepath);
    img.setAbsolutePosition(0f, 0f);
    img.scaleToFit(page.getLayout().getWidth(), page.getLayout().getHeight());
    cb.addImage(img);
    cb.restoreState();
}

I don't understand that TBH, but IMHO the program should stop with a non-zero exit status by default if anything goes wrong (on any page). Perhaps some --keep-going option (for trying all pages regardless) can be useful under certain circumstances, but the normal expectation is that the tool converts everything, rather than converting only some pages (or none) and still returning a success status.
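
The behaviour proposed here could be sketched roughly as follows. This is not the tool's actual code: `ExitStatusSketch`, `convertAll`, `PageTask` and the `keepGoing` flag (standing in for a possible `--keep-going` option) are hypothetical names used only for illustration.

```java
import java.util.List;

// Hypothetical sketch, not part of prima-page-to-pdf.
class ExitStatusSketch {

    /** Stand-in for the conversion of a single page. */
    interface PageTask {
        void run() throws Exception;
    }

    /**
     * Runs all page tasks and returns a process exit status:
     * 0 if every page succeeded, 1 if at least one page failed.
     * Without keepGoing, processing stops at the first failure.
     */
    static int convertAll(List<PageTask> pages, boolean keepGoing) {
        int failures = 0;
        for (PageTask page : pages) {
            try {
                page.run();
            } catch (Exception e) {
                failures++;
                System.err.println("Page conversion failed: " + e);
                if (!keepGoing) {
                    break;
                }
            }
        }
        return failures == 0 ? 0 : 1;
    }
}
```

A `main` method would then end with `System.exit(ExitStatusSketch.convertAll(pages, keepGoing))`, so that a calling script can check the exit status instead of scanning the output for exception text.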