dagwieers / asciidoc-odf

ODF backend for AsciiDoc

Implement ODF v1.2 backend #7

Closed dagwieers closed 13 years ago

dagwieers commented 13 years ago

Given the fact that ODF v1.2 has now been approved, and apparently has been implemented for years now in every ODF-supporting application (except Microsoft Office 2010) according to:

http://listarchives.libreoffice.org/global/users/msg11764.html

There are four options to consider:

However I don't know if an odt10 backend makes any sense (ever), and odt11 is only useful for Microsoft Office 2010 which is of limited use. If we don't have any need, and no interested resources we might simply stick with odt12 only.

elextr commented 13 years ago

Hi Dag,

Well 1.2 isn't actually a standard yet, it just went to ballot last month see http://lists.oasis-open.org/archives/tc-announce/201109/msg00005.html.

At the stage we are at I don't think it makes much difference :)

Except I was looking for table:title which is added in 1.2.

So we might as well use 1.2 and as far as I'm concerned who cares about old versions or MS.

Cheers Lex

dagwieers commented 13 years ago

Well 1.2 isn't actually a standard yet, it just went to ballot last month see http://lists.oasis-open.org/archives/tc-announce/201109/msg00005.html.

At the stage we are at I don't think it makes much difference :)

Well, v1.2 was just voted on Friday and now is an OASIS standard. It hasn't been approved by ISO yet, but that's probably going to be a formality (although it might require some errata).

Except I was looking for table:title which is added in 1.2.

My LibreOffice was already doing 1.2 and everything except Microsoft Office is already implementing the v1.2 draft. I couldn't find a definitive list of what has changed in v1.2, but in v1.2 a lot of the namespaces still refer to v1.0 and v1.1 anyway.

So we might as well use 1.2 and as far as I'm concerned who cares about old versions or MS.

Indeed :)

elextr commented 13 years ago

[...]

My LibreOffice was already doing 1.2 and everything except Microsoft Office is already implementing the v1.2 draft. I couldn't find a definitive list of what has changed in v1.2, but in v1.2 a lot of the namespaces still refer to v1.0 and v1.1 anyway.

See http://docs.oasis-open.org/office/v1.2/cs01/OpenDocument-v1.2-cs01-part1.html#__RefHeading__1420418_253892949 Most of it doesn't affect us, except for table as I said.

So we might as well use 1.2 and as far as I'm concerned who cares about old versions or MS.

Indeed :)

Agreed then.

Anyway, let's get something working, then worry about any other versions. We are both using the same LibreOffice (3.4), so that's OK.

As you said previously, the next thing is to define the style names/classes. I think they should match the CSS ones as much as possible so it's easier to manage multi-backend style consistency. I'll try to post a list soon.

Cheers Lex

elextr commented 13 years ago

As threatened, my first go at defining what classes are used in xhtml.

Sadly it still uses some ids and tags for style control.

Maybe you can fill in the style names to be used in ODT.

Cheers Lex

Asciidoc Styles

XHTML Classes as of 8.6.6

[cols="1,1,5"]
|=========
^e| XHTML Class ^e| ODT style ^e| Use
| mathblock | | Asciimath or Latexmath blocks
| content | | Included elements, images, math, listings, literal blocks, sidebars, open blocks, quote blocks, verse blocks, example blocks, admonition blocks
| title | | Titles such as captions etc
| image | | Images
| imageblock | | Images in block context
| footnote | | Footnotes
| footnoteref | | References to footnotes
| comment | | For comments when included
| ulist | | Bulleted list
| olist | | Numbered list
| dlist | | Labelled list body & glossary
| hdlist1 | | Heading for Labelled list item (default strong)
| hdlist | | Horizontal labeled list body
| hdlist2 | | Horizontal labeled list body?
| compact | | Margins for text (lists and paragraph)
| qlist | | q and a list
| colist | | callout list
| paragraph | | Default text paragraph
| listingblock | | Listing blocks
| literalblock | | Literal blocks
| sidebarblock | | sidebars
| openblock | | open blocks
| quoteblock | | quote blocks
| attribution | | citations
| verseblock | | verse blocks
| exampleblock | | example blocks
| admonitionblock | | admonition text
| table | | table text
| header | | table header row
| tableblock | | the whole table
| float | | floating title
| sectionbody | | body of a section
| sect1 | | section level 1 wraps sectionbody
| sect2 | | section level 2 wraps sectionbody
| sect3 | | section level 3 wraps sectionbody
| sect4 | | section level 4 wraps sectionbody
| {doctype} | | wraps whole body, book, article, manpage
|=========

Xhtml ids and tags used to control style by CSS

[cols="1,1,3"]
|==========
^e| Tag or Id ^e| ODT style ^e| Use
| h1-6 | | Title font, colours margins, borders
| thead #toctitle #revnumber #revdate #revmark | | Title font
| #author | | Title font and colour size and bold
| #footer | | Title font, size, border, padding
| #footer-text | | float padding
| #footer-badges | | float, padding
| #preamble | | margins
| hr | | Rule
| p | | margins colours
| ul ol li | | margins colours list styles, qualifier on some list classes
| dl dt dd | | margins colours, qualifier on some list classes
| body | | margins
| a | | links
| em | | emphasis
| strong | | strong
| td | | margins, verse space
| tfoot | | weight
| #footnotes | | margins
| tt | | monospace font
| thead | | font
|==========

Specific markups (mostly quotes and table styles) that are also styles

|===========
| emphasis
| strong
| monospaced
| superscript
| subscript
|===========

These don't include any deprecated things.

dagwieers commented 13 years ago

Given that LibreOffice changes the style-names, there might be a case for using the LibreOffice style-names instead. Otherwise saving and merging your styles from LibreOffice may fail to work. I need to look into this deeper to understand what is happening there.

Also I found two additional issues that I cannot explain. Even though the paper dimensions match A4, LibreOffice on opening the flat ODT makes it Letter format. When using the flat ODT through UNO (for non-interactive conversion) it opens it as a text file and not as an ODF file (resulting in plain XML content). I guess all three issues require a discussion on the LibreOffice forums.

PS I will attend the LibreOffice conference next week in Paris and have a presentation about unoconv. I might talk a bit about asciidoc-odf too :-)

elextr commented 13 years ago

On 3 October 2011 20:45, Dag Wieërs reply@reply.github.com wrote:

Given that LibreOffice changes the style-names,

Looking at a file created from asciidoc, then loaded and saved from lo:

It appears that lo generates style names from the display names. Note that the style Paragraph_20_Padding is made from the display name Paragraph Padding (20 is the hexadecimal ASCII code for space) that you specified for the paragraph_padding style.

Kind of makes sense since the lo users can create styles but can only specify display names, not actual XML compliant names.
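The name mangling described above can be sketched in a few lines of Python. This is only an illustration of the observed behaviour (a space becoming `_20_`), not LibreOffice's actual implementation. The function name `encode_style_name` is hypothetical, and the rule that only ASCII letters, digits and dashes pass through unchanged is an assumption.

```python
def encode_style_name(display_name: str) -> str:
    """Turn a display name into an XML-safe style name (illustrative only)."""
    parts = []
    for ch in display_name:
        # Assumption: ASCII letters, digits and dashes are kept as they are.
        if (ch.isascii() and ch.isalnum()) or ch == "-":
            parts.append(ch)
        else:
            # Everything else becomes _<hex code>_, e.g. space -> _20_
            parts.append("_%x_" % ord(ch))
    return "".join(parts)


print(encode_style_name("Paragraph Padding"))  # prints Paragraph_20_Padding
```

A name made only of letters, digits and dashes comes back unchanged under this rule, so it would survive a load/save round trip without being renamed.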

there might be a case for using the LibreOffice style-names instead.

I don't think so. The point of this backend is to generate lo output formatted for asciidoc, not to follow the lo styles, so it's better if the styles we use are different. We could probably use slightly more user-friendly names than some in the xhtml, but they should be valid XML names, and we should not specify separate display names, so that lo does not change them.

Otherwise saving and merging your styles from LibreOffice may fail to work. I need to look into this deeper to understand what is happening there.

Should work, I can create a style called say blah in lo and it shows up fine in the output and applies to paragraphs in asciidoc output when I copy it. But interactive template management in lo is crap, maybe pass that on to them next week.

Also I found two additional issues that I cannot explain. Even though the paper dimensions match A4, LibreOffice on opening the flat ODT makes it Letter format. When using the flat ODT through UNO (for non-interactive conversion) it opens it as a text-file and not and ODF file (resulting in plain XML content). I guess all three issues require a discussion on the LibreOffice forums.

That will be because nowhere in the file are you mentioning a4 (at least using the version I got yesterday, you may have improved it since then) so it is going to default. For me it makes it a4 even though it isn't mentioned, so why is your default paper size letter? :)

Dunno about the UNO. On the other hand I wouldn't worry about it for now; let's get output documents working properly first.

PS I will attend the LibreOffice conference next week in Paris and have a presentation about unoconv. I might talk a bit about asciidoc-odf too :-)

No pressure then...

Cheers Lex

elextr commented 13 years ago

[...]

Also I found two additional issues that I cannot explain. Even though the paper dimensions match A4, LibreOffice on opening the flat ODT makes it Letter format. When using the flat ODT through UNO (for non-interactive conversion) it opens it as a text-file and not and ODF file (resulting in plain XML content). I guess all three issues require a discussion on the LibreOffice forums.

Dunno about the UNO. On the other hand I wouldn't worry about it for now; let's get output documents working properly first.

Just tried the uno module in python, not understanding anything I just copied the incantations from the docs, changing the pathnames.

loadComponentFromURL("file:///home/lex/file_from_asciidoc.odt","_blank",0,()) loaded the document properly, ie as a formatted file not raw xml.

And can write a nice pdf of it.

This is good because it means that the a2x script can run things without external help, so long as the uno module is available, and AFAICT it's part of the lo/oo distribution.
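For reference, a minimal sketch of what such a script might look like. It is not the code that was actually run here. It assumes an office instance is already listening for UNO connections on localhost port 2002 (the port is an arbitrary choice), and the helper names `to_file_url` and `convert_to_pdf` are hypothetical.

```python
from pathlib import Path


def to_file_url(path: str) -> str:
    """Convert a filesystem path to the file:// URL form UNO expects."""
    return Path(path).resolve().as_uri()


def convert_to_pdf(src: str, dst: str, port: int = 2002) -> None:
    """Load src through a running office instance and store it as PDF."""
    # Imported lazily so this module still loads where pyuno is unavailable.
    import uno
    from com.sun.star.beans import PropertyValue

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    # Assumption: the office process accepts socket connections on this port.
    ctx = resolver.resolve(
        "uno:socket,host=localhost,port=%d;urp;StarOffice.ComponentContext" % port)
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)

    doc = desktop.loadComponentFromURL(to_file_url(src), "_blank", 0, ())
    try:
        pdf_filter = PropertyValue()
        pdf_filter.Name = "FilterName"
        pdf_filter.Value = "writer_pdf_Export"
        doc.storeToURL(to_file_url(dst), (pdf_filter,))
    finally:
        doc.close(True)
```

Something like `convert_to_pdf("file_from_asciidoc.odt", "file_from_asciidoc.pdf")` would then produce the PDF, provided the python interpreter in use can import the `uno` module.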

Cheers Lex

elextr commented 13 years ago

[...]

And can write a nice pdf of it.

PS the PDF was a4 too :)

This is good because it means that the a2x script can run things without external help, so long as the uno module is available, and AFAICT it's part of the lo/oo distribution.

Cheers Lex

dagwieers commented 13 years ago

Hmmm, I need to investigate what happens on my side then. Will keep you posted.

I am surprised I didn't understand that the Foo_20_Bar labels were simply derived from the display-name. This is perfect, because if we do not provide display-names ourselves, LibreOffice should be using the style-names and not the display-names. I will test this later. Personally I don't mind that much what names are visible from LibreOffice as long as it is functional/clear.

BTW ODF syntax has some understanding of style-classes but I have no idea how that is used and whether we can use that in a similar way as CSS. Still needs to be investigated. Also the parent-style relation is something we can take advantage of, however we need to be careful that it still offers the flexibility people may need (even when the default asciidoc style doesn't need it). This is going to be a big balancing exercise between functionality and simplicity...

dagwieers commented 13 years ago

WRT the uno distribution: beware that there is an issue using the distribution python together with the shipped pyuno. If LibreOffice is shipped by the distribution and has been compiled against the same python version as the distribution, then it should work. However, LibreOffice often requires the latest python, and some users install the latest LibreOffice from the vendor site (as I do). In that case the system python fails to work with the LibreOffice pyuno: you have to set a bunch of environment variables and run the LibreOffice python instead. unoconv has a few of these workarounds to offer a tool that works in various situations where this is not guaranteed, and it supports Mac and Windows too. It's not impossible to adapt a2x for this, but expect troubles :-)

dagwieers commented 13 years ago

Removing all display-names causes style-names like e.g. sect_default to suddenly have display-name sect_2f_default. So we have to be careful about using underscores in style-names. Dashes are not being replaced!

elextr commented 13 years ago

A very preliminary proof of concept for the a2x extension is in the zipped directory. I added a para to README to describe.

It seems to work for me.

Cheers Lex

elextr commented 13 years ago

The styles seem to be approaching the asciidoc defaults well.

Cheers Lex

elextr commented 13 years ago

On 4 October 2011 14:54, Lex Trotman elextr@gmail.com wrote:

A very preliminary proof of concept for the a2x extension is in the zipped directory.  I added a para to README to describe.

It seems to work for me.

This has now been replaced with a system that is integrated into a version of a2x that has been modified to take backend plugins; see the README.

Now we need a default template (.ott) document that matches the styles. This can also be a document (.odt), but it must be a zipped document not a flat one.
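A quick way to tell a zipped template or document from a flat one is to check whether the file is a zip archive with an ODF `mimetype` entry. The following is a minimal sketch, not part of a2x; the function name `is_zipped_odf` is hypothetical.

```python
import zipfile


def is_zipped_odf(path: str) -> bool:
    """Return True if path is a zip package whose mimetype entry names an ODF type."""
    if not zipfile.is_zipfile(path):
        return False  # a flat ODF file is plain XML, not a zip archive
    with zipfile.ZipFile(path) as package:
        if "mimetype" not in package.namelist():
            return False
        mimetype = package.read("mimetype").decode("ascii", "replace").strip()
    # Covers both documents (...text) and templates (...text-template).
    return mimetype.startswith("application/vnd.oasis.opendocument.")
```

A check like this could reject a flat file early with a clear message instead of failing later when the styles are read.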

Cheers Lex


dagwieers commented 13 years ago

That looks great. A plugin-architecture is definitely needed to move the ODF specific logic into its own files.

BTW the A4/Letter (and page-style) issues I have identified are related to the footnote implementation. I am tearing out my hair over this one :-( The ODF specification does not allow defining the paper format by name; it follows from the dimensions in the page-style. However, that is not handled correctly on my system :-/ Eventually I found that if the page-layout is part of the automatic-styles, it works correctly; if it is part of the styles, it fails the same way it does when using Footers. Seems a genuine bug to me.
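To illustrate the arrangement that worked, here is a minimal Python sketch that builds an `office:automatic-styles` element containing a `style:page-layout` with A4 dimensions (21cm x 29.7cm). It is not the backend's actual stylesheet; the layout name `pm1` and the helper names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs; they still carry the 1.0 suffix in v1.2 documents.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def qname(prefix: str, local: str) -> str:
    """Build an ElementTree qualified name such as {uri}page-layout."""
    return "{%s}%s" % (NS[prefix], local)


def a4_automatic_styles(layout_name: str = "pm1") -> ET.Element:
    """Return office:automatic-styles holding one A4 portrait page-layout."""
    auto = ET.Element(qname("office", "automatic-styles"))
    layout = ET.SubElement(auto, qname("style", "page-layout"),
                           {qname("style", "name"): layout_name})
    # The paper format is expressed only through these dimensions.
    ET.SubElement(layout, qname("style", "page-layout-properties"), {
        qname("fo", "page-width"): "21cm",
        qname("fo", "page-height"): "29.7cm",
        qname("style", "print-orientation"): "portrait",
    })
    return auto


print(ET.tostring(a4_automatic_styles(), encoding="unicode"))
```

A master page would still need to refer to the layout by name through its `style:page-layout-name` attribute for the dimensions to take effect.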

dagwieers commented 13 years ago

I am going to close this ticket, because we now have an ODF v1.2 stylesheet. If ODF v1.1 or ODF v1.0 is going to be requested, we can open a new issue for this.

BTW we should avoid having OT discussions on the issues :-) I know the email integration is nice, but it is also in danger of being abused as a mailing list :-D