StimVinsh / xdocreport

Automatically exported from code.google.com/p/xdocreport

Conversion ODT->PDF ignores text sections and tabulators #69

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
When I create text sections in an ODT document, they are ignored when
converting to PDF. It looks like StyleEngineForIText has no implementation for
this kind of styling. It is currently impossible to divide a page into sections
and make some text float at a desired position on the page. Tabulators are not
working either.

Original issue reported on code.google.com by rces...@gmail.com on 19 Jan 2012 at 2:15

GoogleCodeExporter commented 8 years ago
Hi,

Could you please attach a simple ODT that shows your problem? If you want to
contribute a fix, you are welcome!

Regards Angelo 

Original comment by angelo.z...@gmail.com on 19 Jan 2012 at 11:58

GoogleCodeExporter commented 8 years ago
Issue assigned to Leszek, but for the moment we don't know what the problem is.
We are waiting for a sample ODT file.

Thanks.

Original comment by angelo.z...@gmail.com on 20 Jan 2012 at 1:37

GoogleCodeExporter commented 8 years ago
Hi,
here it is. The attached file is our company invoice template. This is the very
same document that caused the issues. I have only deleted some company info
like names, numbers etc.

Try to convert the ODT file through iText and you will get an almost
unformatted PDF.

There are also some issues with character encoding (the document is written in
Czech), but that may be solved by using the CP1250 encoding.

Regards Richard

Original comment by rces...@gmail.com on 20 Jan 2012 at 2:53

GoogleCodeExporter commented 8 years ago
Hi Richard,

Thanks a lot for the attached ODT. We will look into the problem and how to
fix it (if you want to contribute, don't hesitate!).

For your encoding problem, Leszek has fixed this by customizing the iText
encoding:

-------------------------------------
PDFViaITextOptions options = 
PDFViaITextOptions.create().fontEncoding("windows-1250");
-------------------------------------

Please see http://code.google.com/p/xdocreport/wiki/ODFDOMConverter for details.
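
To see why the encoding matters, here is a small sketch using only the JDK (it is independent of XDocReport, and the class and method names are made up for illustration). It checks whether a Czech string can be encoded in a given charset, assuming the JVM ships the windows-1250 charset, as common JDKs do.

```java
import java.nio.charset.Charset;

final class CzechEncodingCheck {

    /** Returns true if every character of text can be encoded in the named charset. */
    static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // "Odběratel: IČ, DIČ", written with escapes so the source file encoding does not matter.
        String sample = "Odb\u011Bratel: I\u010C, DI\u010C";
        System.out.println("windows-1250: " + canEncode(sample, "windows-1250"));
        System.out.println("ISO-8859-1:   " + canEncode(sample, "ISO-8859-1"));
    }
}
```

Characters such as "ě" and "Č" have no code point in ISO-8859-1, which is why a Central European encoding is needed for the invoice template.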

Regards Angelo

Original comment by angelo.z...@gmail.com on 20 Jan 2012 at 3:11

GoogleCodeExporter commented 8 years ago
Hi Angelo,
thanks for pointing this out. I had already found this solution in another
issue here on the website. Anyway, I have found the missing function for parsing
tabulators. Trying to fix it right now ...

Original comment by rces...@gmail.com on 20 Jan 2012 at 3:21

GoogleCodeExporter commented 8 years ago
Ok, 

I have added a new JUnit TestCase
http://code.google.com/p/xdocreport/source/browse/thirdparties-extension/org.odftoolkit.odfdom.converter/src/test/java/org/odftoolkit/odfdom/converter/AbstractODFDOMConverterTest.java
which is used to generate HTML and PDF from the attached "faktura.odt", which
I have renamed to TestTextSectionsAndTabulators.odt.

I see the problem. 

@Leszek: if you can try to fix the problem, that would be really cool. If you
don't have time, tell me and I will try to study the problem.

@Richard: if you have any pointers or pieces of code, any contribution is
welcome!

Regards Angelo

Original comment by angelo.z...@gmail.com on 20 Jan 2012 at 4:35

GoogleCodeExporter commented 8 years ago
Hi,
I looked at the XML of the unzipped example.

It seems that the odt->pdf converter does not handle style constructions like:
<style:style style:name="Sect1" style:family="section">
  <style:section-properties text:dont-balance-text-columns="false" style:editable="false">
    <style:columns fo:column-count="2" fo:column-gap="0cm">
      <style:column style:rel-width="32767*" fo:start-indent="0cm" fo:end-indent="0cm" />
      <style:column style:rel-width="32768*" fo:start-indent="0cm" fo:end-indent="0cm" />
    </style:columns>
  </style:section-properties>
</style:style>

so it requires at least 3 new *Stylable classes and proper handling in 
StyleFactory
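
As an illustration of what such style classes would have to compute, the sketch below turns the style:rel-width values into absolute column widths. It is not the converter's code: the class name, the method names and the 17 cm text area are assumptions made for the example.

```java
final class RelativeColumnWidths {

    /** Parses an ODF relative width such as "32767*" into its integer weight. */
    static int parseRelWidth(String relWidth) {
        String value = relWidth.trim();
        if (!value.endsWith("*")) {
            throw new IllegalArgumentException("Not a relative width: " + relWidth);
        }
        return Integer.parseInt(value.substring(0, value.length() - 1));
    }

    /** Splits totalWidth among the columns in proportion to their weights. */
    static float[] toAbsoluteWidths(float totalWidth, String... relWidths) {
        int[] weights = new int[relWidths.length];
        long sum = 0;
        for (int i = 0; i < relWidths.length; i++) {
            weights[i] = parseRelWidth(relWidths[i]);
            sum += weights[i];
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("Relative widths must add up to a positive value");
        }
        float[] widths = new float[relWidths.length];
        for (int i = 0; i < widths.length; i++) {
            widths[i] = totalWidth * weights[i] / sum;
        }
        return widths;
    }

    public static void main(String[] args) {
        // The two columns of style "Sect1", laid out on an assumed 17 cm wide text area.
        float[] widths = toAbsoluteWidths(17f, "32767*", "32768*");
        System.out.println(widths[0] + " cm, " + widths[1] + " cm");
    }
}
```

With the weights 32767 and 32768 the two columns come out practically equal, which matches the two-column layout seen in the editor.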

Then the text is rendered with a construction like this:
<text:section text:style-name="Sect1" text:name="Sekce1">
  <text:p text:style-name="P10">Dodavatel:</text:p> 
  <text:p text:style-name="P10" /> 
  <text:p text:style-name="P13">Odběratel:</text:p> 
  <text:p text:style-name="P10">${invoice_customer_name}</text:p> 
  <text:p text:style-name="P10">${invoice_customer_address}</text:p> 
  <text:p text:style-name="P10">${invoice_customer_city}</text:p> 
  <text:p text:style-name="P10">IČ: ${invoice_customer_ic}</text:p> 
  <text:p text:style-name="P10">DIČ: ${invoice_customer_dic}</text:p> 
  </text:section>
This text is automatically split into two columns. The converter does not
handle text:section, so again a new class which translates it to iText must be
introduced. For now I have no idea how to do it - maybe an invisible table???

So it can probably be fixed, but it will require consulting the ODF spec (what
these constructions mean), finding out how to translate them to iText, and
quite a lot of coding. It will take time.

Richard, if you don't want to wait, I suggest you use invisible tables to
format the text. That will work properly. You can also set the font encoding to
windows-1250 and Czech letters will be rendered properly.

Btw: I wonder how to make these text sections in the OpenOffice editor? I
wasn't even aware that such things are possible!

Regards
Leszek

Original comment by lesz...@safe-mail.net on 20 Jan 2012 at 7:46

GoogleCodeExporter commented 8 years ago
Hi Leszek,

Thanks for your investigation. I think your idea of creating an iText table is
a nice solution. I would handle it the same way as a table.

I would modify StyleEngineForIText to read the style:section-properties and
style:columns information:

-------------------------------------------------------
    @Override
    public void visit(TextSectionElement ele) {
        // Compute the section style here
    }
------------------------------------------------------

After that, I would modify ElementVisitorForIText to create an iText table
like this:
-----------------------------
    @Override
    public void visit(TextSectionElement ele) {
        // getColumnWidths is a hypothetical helper: look up the column widths
        // from the styles computed by StyleEngineForIText.
        float[] columnWidth = getColumnWidths(ele);
        StylableTable table = document.createTable(currentContainer,
                columnWidth);
        // Visit the child elements of the section.
        super.visit(ele);
        try {
            table.setTotalWidth(columnWidth);
        } catch (DocumentException e) {
            // Do nothing
        }
        applyStyles(ele, table);
        addITextContainer(ele, table);
    }
----------------------

It's just an idea. Hope that will help you.

Regards Angelo

Original comment by angelo.z...@gmail.com on 21 Jan 2012 at 9:04

GoogleCodeExporter commented 8 years ago
Thanks Angelo. Your changes are a good start.

Original comment by lesz...@safe-mail.net on 22 Jan 2012 at 10:35

GoogleCodeExporter commented 8 years ago
Hello

I'm working on issue 69: the ODF->PDF converter ignores sections in the
converted document.
I've committed new classes StyleSectionProperties, StyleColumnsProperties and
StyleColumnProperties, which hold the styling information.
They are filled in the modified StyleEngineForIText. That was the easy part.

Now comes the harder part. A <text:section> element may contain sub-elements
like text paragraphs, tables, images etc.
The problem is that these elements should automatically be balanced across
columns.
For example, suppose a section has three columns and 7 lines of text. In that
case 3 lines should be put in the first column, 3 in the second and 1 in the
third.
This may be further complicated if a text paragraph has a
break-before="column" attribute or the section elements have different heights.
This algorithm seems rather complicated. Do you have some hints regarding this 
problem?
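
For the simplest case only (lines of equal height, no column breaks), the distribution described above can be sketched as follows. The class and method names are hypothetical and this is not the converter's code.

```java
final class EvenColumnBalance {

    /**
     * Distributes lineCount lines over columnCount columns, filling from left to
     * right with at most ceil(lineCount / columnCount) lines per column.
     */
    static int[] balance(int lineCount, int columnCount) {
        if (lineCount < 0 || columnCount <= 0) {
            throw new IllegalArgumentException("lineCount must be >= 0 and columnCount > 0");
        }
        int perColumn = (lineCount + columnCount - 1) / columnCount; // ceiling division
        int[] result = new int[columnCount];
        int remaining = lineCount;
        for (int i = 0; i < columnCount; i++) {
            result[i] = Math.min(perColumn, remaining);
            remaining -= result[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // 7 lines in 3 columns, the example from the text.
        System.out.println(java.util.Arrays.toString(balance(7, 3)));
    }
}
```

For 7 lines and 3 columns this gives 3, 3 and 1 lines. Column breaks and elements of different heights are not covered by this sketch.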

Btw. the class StylableSection already exists. It is not used for section
(text:section) handling, though, but indirectly for the text:h element.
Maybe it should be renamed?

Regards
Leszek

Original comment by angelo.z...@gmail.com on 27 Jan 2012 at 8:51

GoogleCodeExporter commented 8 years ago
Hi Leszek,

Thanks for creating the style structures. To manage text:section I would do it
like this:

1) Create StylableTable when TextSectionElement is visited

When a TextSectionElement is visited, a StylableTable is created and the
visitor stores currentTextSection, so that it knows whether the element
currently visited (e.g. text:p) belongs to a text:section.

----------------------------------------
    private int nbColumns=0;
    private TextSectionElement currentTextSection;

    @Override
    public void visit(TextSectionElement ele) {
        float[] columnWidth = new float[] {100, 100};
        nbColumns = 0;
        StylableTable table = document.createTable(currentContainer,
                columnWidth.length);
        currentTextSection = ele;
        try {
            table.setTotalWidth(columnWidth);
        } catch (DocumentException e) {
            // Do nothing
        }
        applyStyles(ele, table);
        addITextContainer(ele, table);
        currentTextSection = null;
    }
----------------------------------------

2) Generate or get the current TableCell when text:p (or another element?) is 
visited

----------------------------------------
@Override
    public void visit(TextPElement ele) {
        addTextSectionIfNeeded(ele);
....
----------------------------------------

addTextSectionIfNeeded would look like this, but it must be improved:

-------------------------
private void addTextSectionIfNeeded(TextPElement ele) {
        if (currentTextSection != null && !currentTextSection.equals(ele) && nbColumns < 2) {
            nbColumns++;

            StylableTableCell tableCell = document
                    .createTableCell(currentContainer);

            // table:number-columns-spanned
            Integer colSpan = 1;
            if (colSpan != null) {
                tableCell.setColspan(colSpan);
            }
            // table:number-rows-spanned
            Integer rowSpan = 1;
            if (rowSpan != null) {
                tableCell.setRowspan(rowSpan);

            }
            // Apply styles coming from table-row
            if (currentRowStyle != null) {
                tableCell.applyStyles(currentRowStyle);
            }
            // Apply styles coming from table-cell
            //applyStyles(ele, tableCell);
            //addITextContainer(ele, tableCell);
            currentContainer=tableCell;

        }

    }
-------------------------

It's just an idea. Hope it will help you.

For StylableSection, I prefer to keep this name because it extends iText
Section. If you need a name for the new class, could you use
StylableTextSection?

Regards Angelo

Original comment by angelo.z...@gmail.com on 27 Jan 2012 at 10:23

GoogleCodeExporter commented 8 years ago
Hi Angelo

I'm afraid your function will not help much. It seems to add the first two
paragraphs to two separate cells. The problem is that you don't know in advance
into which cell to put the sub-elements.

The algorithm for section layout looks more or less like this:
1 Gather ALL elements of a section (paragraphs, images, tables...)

2 Balance them between columns to get the lowest possible height, but:
2a - a paragraph may be split between columns somewhere in the middle!!!
2b - if the section is bigger than a single page, then text fills the left
column, then the right column; on the next page the situation repeats - text
starts with the left column
2c - column breaks complicate the situation even more as they alter the flow.

So for now I simply don't know an iText equivalent for this. A table won't
help much because you have to put an element into a concrete cell and it will
not change its position as required. I looked at the ColumnText class; maybe it
can be used, but it requires a lot of coding around it. The problem is much
more complicated than it looks at first.
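
To make step 2 more concrete, here is a simplified sketch of balancing on a single page. It is an assumption-laden model, not iText code and not the converter's algorithm: every element is an unsplittable block with a known height (so point 2a is ignored), a block may carry a column break before it (2c), and the smallest column height that still fits is found by binary search. All names are hypothetical.

```java
import java.util.List;

final class SectionBalancer {

    /** A layout unit: its height and whether a column break precedes it. */
    static final class Block {
        final int height;
        final boolean breakBefore;

        Block(int height, boolean breakBefore) {
            this.height = height;
            this.breakBefore = breakBefore;
        }
    }

    /** Columns used when blocks are placed in order into columns of the given height. */
    static int columnsNeeded(List<Block> blocks, int columnHeight) {
        int columns = 1;
        int used = 0;
        for (Block block : blocks) {
            boolean forcedBreak = block.breakBefore && used > 0;
            if (forcedBreak || used + block.height > columnHeight) {
                columns++;   // move on to the next column
                used = 0;
            }
            used += block.height;
        }
        return columns;
    }

    /** Smallest column height for which all blocks fit into columnCount columns. */
    static int minimalHeight(List<Block> blocks, int columnCount) {
        int lo = 0; // no column can be lower than the tallest block
        int hi = 0; // the sum of all heights is always high enough
        for (Block block : blocks) {
            lo = Math.max(lo, block.height);
            hi += block.height;
        }
        if (columnCount < 1 || columnsNeeded(blocks, hi) > columnCount) {
            throw new IllegalArgumentException("More column breaks than available columns");
        }
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (columnsNeeded(blocks, mid) <= columnCount) {
                hi = mid;      // fits, try a lower column
            } else {
                lo = mid + 1;  // does not fit, the column must be higher
            }
        }
        return lo;
    }
}
```

Seven blocks of height 1 in three columns give a minimal height of 3, i.e. the 3/3/1 split. Splitting paragraphs in the middle and continuing on the next page (2a, 2b) would need real text measurement on top of this.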

Original comment by abe...@gmail.com on 27 Jan 2012 at 1:26

GoogleCodeExporter commented 8 years ago
Hi Leszek,

I'm sorry that my quick code doesn't help you; it seems that it's more
complicated than I thought :-( I'm sorry Leszek, I cannot help you for the
moment, I'm very busy with the text styling topic and ODT macro.

Good luck!

Regards Angelo

Original comment by angelo.z...@gmail.com on 27 Jan 2012 at 1:42

GoogleCodeExporter commented 8 years ago
While working on this issue I encountered a problem with master page handling.
You can see this problem in the updated TestLandscapeFormat.odt.

Headers and footers are independent for every master page.
If you set headers or footers in a document, they are properly converted for
the default master page.
When the master page changes, headers and footers should change accordingly.
But they either remain the same as in the default master page, or they revert
back to the default headers and footers after one page.

The reason is that in ExtendedHeaderFooter there is a master page stack. On
every page it pops the current master page and uses it. So after one page it
reverts back to the default. This is incorrect.
I think a master page should be set and stay active until another master page
is activated. No master page stack is necessary, only the current master page.

Angelo, could you explain the reasons behind this master page stack?
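
The proposed behaviour can be sketched with a single field instead of a stack. This is only an illustration with hypothetical names (master pages are plain strings here); it is not the ExtendedHeaderFooter code.

```java
final class CurrentMasterPage {

    private String current;

    CurrentMasterPage(String defaultMasterPage) {
        this.current = defaultMasterPage;
    }

    /** Called when the document switches to another master page. */
    void activate(String masterPage) {
        this.current = masterPage;
    }

    /** Called at the start of every page; returns the master page whose header and footer apply. */
    String onStartPage() {
        return current; // nothing is popped, so the value survives across pages
    }

    public static void main(String[] args) {
        CurrentMasterPage tracker = new CurrentMasterPage("Standard");
        tracker.activate("Landscape");
        System.out.println(tracker.onStartPage()); // first landscape page
        System.out.println(tracker.onStartPage()); // second landscape page, still "Landscape"
    }
}
```

Because nothing is removed on each page, the header and footer stay those of the activated master page until activate is called again.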

Original comment by abe...@gmail.com on 7 Mar 2012 at 1:35

GoogleCodeExporter commented 8 years ago
Hi Leszek,

I don't remember why I did that. The only test covering it is
TestLandscapeFormat.odt. So if you modify the code and TestLandscapeFormat.odt
still works, that's really cool.

So don't hesitate to modify ExtendedHeaderFooter if you need to.

Regards Angelo

Original comment by angelo.z...@gmail.com on 7 Mar 2012 at 1:38

GoogleCodeExporter commented 8 years ago
Again a surprise.

I've almost implemented column layout with content balancing (for page and
section). It is simulated by a table.
But StylableChapter and StylableSection cannot be added to a table (an
exception is thrown). This is an iText limitation. So some of the unit tests
fail.

I'm going to replace StylableChapter and StylableSection with StylableHeading.
StylableHeading would be a StylableParagraph descendant with numbering (e.g.
1.2.1). Numbering would be handled by ITextVisitor.
Angelo - what do you think about it? Is it ok for you?
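
The numbering part could work roughly like the sketch below: one counter per outline level, where a new heading increments its own level and resets the deeper ones. The class name and API are assumptions for illustration, not the actual StylableHeading or ITextVisitor code.

```java
final class HeadingNumbering {

    private final int[] counters;

    HeadingNumbering(int maxLevel) {
        this.counters = new int[maxLevel];
    }

    /** Registers a heading at the given 1-based outline level and returns its number, e.g. "1.2.1". */
    String next(int level) {
        if (level < 1 || level > counters.length) {
            throw new IllegalArgumentException("Level out of range: " + level);
        }
        counters[level - 1]++;
        for (int i = level; i < counters.length; i++) {
            counters[i] = 0; // a new heading restarts the numbering of all deeper levels
        }
        StringBuilder number = new StringBuilder();
        for (int i = 0; i < level; i++) {
            if (i > 0) {
                number.append('.');
            }
            number.append(counters[i]);
        }
        return number.toString();
    }
}
```

Calling next with the levels 1, 2, 2, 3 in that order yields "1", "1.1", "1.2" and "1.2.1". A document that skips a level (for example level 1 followed directly by level 3) would get a 0 in the number, so a real implementation would have to decide how to treat that case.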

Original comment by abe...@gmail.com on 15 Mar 2012 at 7:33

GoogleCodeExporter commented 8 years ago
> I've almost implemented column layout with content balancing (for page and
> section). It is simulated by table.

Cool!

> I'm going to replace StylableChapter and StylableSection by StylableHeading

I believe I used StylableChapter and StylableSection to get a PDF summary. Do
you think it's possible to have a new option to generate either
StylableChapter/StylableSection (like today) or StylableHeading (as you
suggest)? If the code to support both would be awful, use StylableHeading.

Regards Angelo

Original comment by angelo.z...@gmail.com on 15 Mar 2012 at 8:31

GoogleCodeExporter commented 8 years ago
Angelo
What do you mean by "PDF summary"?
As I understand StylableChapter and StylableSection, they create numbering but
nothing special besides that. Or am I missing something?
Regards
Leszek

Original comment by abe...@gmail.com on 16 Mar 2012 at 8:10

GoogleCodeExporter commented 8 years ago
Hi,

I have attached a screenshot of ODTBig.odt.pdf. On the left you can see a
summary with hyperlinks to navigate to each title. This PDF summary is generated
with iText Chapter.

Regards Angelo

Original comment by angelo.z...@gmail.com on 16 Mar 2012 at 8:31

GoogleCodeExporter commented 8 years ago
It's not really possible to support both StylableChapter/StylableSection and
StylableHeading.
But it is possible to simulate the Chapter/Section behaviour. They add a
PdfOutline in the background, and that makes the summary. I've not implemented
it yet (a working header was the priority), but maybe sometime in the future.
Regards
Leszek

Original comment by abe...@gmail.com on 19 Mar 2012 at 9:03

GoogleCodeExporter commented 8 years ago
Hi Leszek,

Ok, I understand that it's not possible to support both
StylableChapter/StylableSection and StylableHeading. Change the code as you
wish. If you can support the PDF summary (with PdfOutline), that would be very
cool.

Many thanks.

Regards Angelo 

Original comment by angelo.z...@gmail.com on 19 Mar 2012 at 9:06

GoogleCodeExporter commented 8 years ago
Finally, the changes for this issue are committed.

First of all section handling is implemented - sections may have columns of 
equal or different width. Their content may be balanced or not. Sections can 
handle column break or page break.

Additional improvements:
- main document may have columns too, text fills one column until it is full 
and then goes to new column (or a new page). Main document also handles column 
or page break
- fix for master page handling (when page changes from portrait to landscape or 
the opposite way)
- replace StylableChapter/StylableSection by StylableHeading
- fix for paragraph line height
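
The unbalanced flow of the main document (fill one column, then the next, then a new page) can be pictured with this small sketch. It assumes lines of equal height and a fixed number of lines per column, and the names are hypothetical; the committed code works on real iText elements instead.

```java
final class SequentialColumnFill {

    /** Returns {page, column}, both 0-based, for the given 0-based line index. */
    static int[] position(int lineIndex, int linesPerColumn, int columnsPerPage) {
        if (lineIndex < 0 || linesPerColumn <= 0 || columnsPerPage <= 0) {
            throw new IllegalArgumentException("Invalid layout parameters");
        }
        int columnIndex = lineIndex / linesPerColumn; // running column number across all pages
        return new int[] { columnIndex / columnsPerPage, columnIndex % columnsPerPage };
    }

    public static void main(String[] args) {
        // Two columns of 40 lines each: line 40 starts the second column, line 80 a new page.
        int[] p = position(80, 40, 2);
        System.out.println("page " + p[0] + ", column " + p[1]);
    }
}
```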

The example document TestTextSectionsAndTabulators is converted correctly. I've
also added a more complicated example, TestTextSectionsAndBreaks, with nested
sections and column/page breaks.

Compared to the earlier fixes I've made, this one was really difficult, so I'm
glad it's done.

Regards
Leszek

Original comment by abe...@gmail.com on 20 Mar 2012 at 3:46

GoogleCodeExporter commented 8 years ago
Hi Leszek,

Very good job!!! 

Many thanks.

Regards Angelo

Original comment by angelo.z...@gmail.com on 20 Mar 2012 at 3:53