a collection of small improvements to help resolve some of the manually intensive stuff CT has mentioned:
build.xml and bash script updates so there are no more hardcoded absolute paths - scripts will fail unless a "work dir" is specified on the command line, and everything is done relative to it.
flat vs hierarchical html cleaning now done in distinct dirs so they don't collide (ultimately we should just pick one and remove the other)
updated pandoc usage:
run pandoc on both flat and hierarchical HTML (for now)
we now get document title and attributes at top
removed code that was adding <h1> to HTML body (after breadcrumb) since it's no longer needed w/ proper document titling
switch to better header formatting (prefix '='xN instead of various underlinings for each level)
CWIKI TOC macro output detected better than before & <meta> tag added when found
custom pandoc template showing how we can write data from tags as asciidoc document attributes
<pre> tags with language hints from CWIKI are massaged during HTML scraping so the data is preserved all the way to the asciidoc output
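
for illustration, a minimal sketch of the "work dir" requirement described above. The function name resolve_work_dir and the error messages are hypothetical and not taken from the actual scripts; this only shows the general idea of failing fast when no work dir is given and resolving everything else relative to it.

```shell
#!/usr/bin/env bash

# Hypothetical helper (name and messages are illustrative assumptions).
# Prints the absolute path of the work dir, or fails with a non-zero status.
resolve_work_dir() {
  # Fail if no work dir argument was supplied.
  if [ "$#" -lt 1 ] || [ -z "$1" ]; then
    echo "usage: $0 <work-dir>" >&2
    return 1
  fi
  # Fail if the supplied path is not an existing directory.
  if [ ! -d "$1" ]; then
    echo "work dir does not exist: $1" >&2
    return 1
  fi
  # Resolve to an absolute path in a subshell so the caller's cwd is untouched.
  (cd "$1" && pwd)
}
```

a script could then start with something like WORK_DIR="$(resolve_work_dir "$1")" || exit 1 and build every other path from "$WORK_DIR" instead of a hardcoded absolute path.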