Have a look at PdfConverter

ylussaud commented 5 years ago

If it works fine, it would be nice to generate a docx or a pdf according to the file extension of the output document.
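A minimal sketch of picking the output format from the file extension, as suggested above. The `OutputFormat` enum and its `fromFileName` method are hypothetical names used for illustration only; they are not part of M2Doc or POI.

```java
import java.util.Locale;

/** Hypothetical helper: output format chosen from the output file name. */
enum OutputFormat {
    DOCX, PDF;

    /**
     * Returns PDF when the name ends with ".pdf" (case-insensitive),
     * and falls back to DOCX for any other name.
     */
    static OutputFormat fromFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".pdf") ? PDF : DOCX;
    }
}
```

The generation code could then save the document as usual for `DOCX` and hand it to the converter only for `PDF`. Defaulting to `DOCX` keeps the current behaviour for every existing output name.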

ylussaud commented 5 years ago

It is not part of the POI project and needs new dependencies:

groupId fr.opensagres.xdocreport, artifactId fr.opensagres.poi.xwpf.converter.core, version 2.0.2

ylussaud commented 5 years ago

The converter uses iText, which is LGPL; that could be another problem.

ejuliot commented 3 years ago

POI already has built-in support for DOCX to PDF conversion. Look at https://stackoverflow.com/questions/43363624/converting-docx-into-pdf-in-java (org.apache.poi.xwpf.converter.pdf.PdfConverter)

ylussaud commented 3 years ago

As stated above, PdfConverter is not part of Apache POI but of fr.opensagres.poi.xwpf.converter.core, which supports Apache POI 4.0.1. M2Doc is using Apache POI 4.1.0 and will move to later versions.

ylussaud commented 2 months ago

The LGPL licence is not an issue: there is LGPL code in the Orbit update site. At the moment both M2Doc and fr.opensagres.poi.xwpf.converter.pdf 2.0.0 depend on POI 5.2.3, so I was able to test the PDF conversion.

There are the following issues:

when a table is present it sometimes throws an NPE:

fr.opensagres.poi.xwpf.converter.core.XWPFConverterException: java.lang.NullPointerException: Cannot invoke "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTblGrid.getGridColList()" because "grid" is null
at fr.opensagres.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:71)
at fr.opensagres.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:39)
at fr.opensagres.poi.xwpf.converter.core.AbstractXWPFConverter.convert(AbstractXWPFConverter.java:42)

To solve this issue we need to add a CTTblGrid to the created XWPFTable. This implies knowing the width of each column. A width has been added to the MCell (see #472), but I'm not sure we will be able to compute a width when importing from HTML. I'm opening issue #525 for this.
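One possible fallback when some column widths are unknown (for instance after an HTML import) is to split the remaining table width evenly among them. The sketch below only shows that arithmetic. The `GridWidths` class, the `resolve` method, the use of `null` for an unknown width, and the assumption that all widths are in the same unit (grid columns in a DOCX use twentieths of a point) are all hypothetical. It does not show the POI calls that would write the resulting values into the table grid.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: fills in missing column widths for a table grid. */
final class GridWidths {

    private GridWidths() {
    }

    /**
     * Returns one width per column. Known widths are kept as-is; each
     * null entry receives an equal share of what is left of tableWidth.
     */
    static List<Integer> resolve(List<Integer> columnWidths, int tableWidth) {
        int known = 0;
        int missing = 0;
        for (Integer width : columnWidths) {
            if (width == null) {
                missing++;
            } else {
                known += width;
            }
        }
        // Never hand out a negative share if known widths exceed the table width.
        int remaining = Math.max(0, tableWidth - known);
        int share = missing == 0 ? 0 : remaining / missing;

        List<Integer> result = new ArrayList<>(columnWidths.size());
        for (Integer width : columnWidths) {
            result.add(width != null ? width : share);
        }
        return result;
    }
}
```

Each resolved value would then become the width of one grid column, so the converter no longer finds a null grid.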

with asImage test :

java.lang.StackOverflowError
at java.base/java.lang.StringBuffer.<init>(StringBuffer.java:133)
at com.lowagie.text.pdf.BidiLine.createArrayOfPdfChunks(Unknown Source)
at com.lowagie.text.pdf.BidiLine.createArrayOfPdfChunks(Unknown Source)
at com.lowagie.text.pdf.BidiLine.processLine(Unknown Source)
at com.lowagie.text.pdf.ColumnText.go(Unknown Source)
at com.lowagie.text.pdf.ColumnText.goComposite(Unknown Source)
at com.lowagie.text.pdf.ColumnText.go(Unknown Source)
at com.lowagie.text.pdf.ColumnText.go(Unknown Source)
at com.lowagie.text.pdf.PdfPRow.writeCells(Unknown Source)
at com.lowagie.text.pdf.PdfPTable.writeSelectedRows(Unknown Source)
at com.lowagie.text.pdf.PdfPTable.writeSelectedRows(Unknown Source)
at com.lowagie.text.pdf.PdfPTable.writeSelectedRows(Unknown Source)
at com.lowagie.text.pdf.ColumnText.goComposite(Unknown Source)
at com.lowagie.text.pdf.ColumnText.go(Unknown Source)
at com.lowagie.text.pdf.ColumnText.go(Unknown Source)
at com.lowagie.text.pdf.PdfDocument.addPTable(Unknown Source)
at com.lowagie.text.pdf.PdfDocument.add(Unknown Source)
at com.lowagie.text.Document.add(Unknown Source)
at fr.opensagres.xdocreport.itext.extension.ExtendedDocument.add(ExtendedDocument.java:114)
at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.flushTable(StylableDocument.java:374)
at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.pageBreak(StylableDocument.java:141)
at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.columnBreak(StylableDocument.java:120)
at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.simulateText(StylableDocument.java:230)
at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.pageBreak(StylableDocument.java:160)
at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.columnBreak(StylableDocument.java:120)

some differences between the word document and the pdf document:
- some bullets from bullet list are missing (HTML ul test)
- ...

Overall, the output PDF is pretty close to the Word document if it doesn't use MTable.

ObeoNetwork / M2Doc

Have a look at PdfConverter #357