Open PhilterPaper opened 1 year ago
Thanks Phil, I think if I had to pick one then this would be it:
<pre>, <nobr>, <br>, <center>* formatting control
I should probably list here what's in the current (3.025) release for Markdown (md1) and HTML (html) support, so you don't have to go searching through the code:
- <i> and <em> tags (Markdown _, *) as italic font style
- <b> and <strong> tags (Markdown **) as bold font weight
- <p> tag (Markdown empty line) as a paragraph
- <font face="font-family" color="color" size="font-size"> as selecting face, color and size
- <span> needs style= attribute with CSS to do anything useful
- <ul> tag (Markdown -) unordered (bulleted) list. type to override marker supported
- <ol> tag (Markdown 1.) ordered (numbered) list. start and type supported.
- <li> tag list item. value to override ordered list counter, and type to override marker type supported
- <a href="URL"> tag (Markdown []()) anchor/link, web page URL or this document target #p[-x-y[-z]]
- <h1> through <h6> tags (Markdown # through ######) headings
- <hr width="length" size="length"> tag (Markdown ---) horizontal rule. currently no align property (left only)
- <s>, <strike>, <del> tags (Markdown ~~) text line-through
- <u>, <ins> tags text underline
- <blockquote> tag (Markdown >) indented both sides block of smaller text

Numbered (decimal and hexadecimal) entities are supported, as well as named entities (e.g., &mdash;). Both lists get a "gutter" (for the marker) of marker_width points wide, so it's consistent over the call.
Note that the default CSS applies to Markdown, unless you give a style => entry to the column() call to revise the CSS.
In HTML, you can define <style> tags, but caution: these are pulled out into a global style block (cumulative and global, as though they had all been given in the <head>), applied after the CSS property defaults are defined and then any column() global style => 'CSS list' has been applied.
CSS selectors are very primitive: a simple tag name (including body), such as ol; a class name such as .error; or an ID such as #myID. There are no hierarchies or combinations supported (e.g., nothing like p.abstract or li > p). The (decreasing) order of precedence follows a browser's: in a style= attribute, as a tag attribute (which may have a different name from the CSS's), an ID, a class, or a tag name. Comments /* and */ are NOT currently supported in CSS.
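To make that precedence order concrete, here is a rough sketch in Python (not PDF::Builder's actual Perl code) of how properties for one element could be merged, lowest precedence first. The function names, the `attr_to_css` mapping and the dictionary layout are all made up for illustration.

```python
def parse_declarations(style):
    """Parse 'name: value; name: value' into a dict (no comment support)."""
    props = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep and name.strip():
            props[name.strip()] = value.strip()
    return props


def resolve_properties(tag, attrs, stylesheet, defaults=None, attr_to_css=None):
    """Merge CSS properties for one element; later updates win.

    stylesheet maps a simple selector ('ol', '.error', '#myID') to a dict
    of properties. attr_to_css is a hypothetical map from a tag attribute
    name to the CSS property it sets (e.g. 'size' -> 'font-size').
    """
    props = dict(defaults or {})
    # Lowest precedence: tag-name selector
    props.update(stylesheet.get(tag, {}))
    # Then class selector(s)
    for class_name in attrs.get("class", "").split():
        props.update(stylesheet.get("." + class_name, {}))
    # Then ID selector
    if "id" in attrs:
        props.update(stylesheet.get("#" + attrs["id"], {}))
    # Then tag attributes, which may be named differently from the CSS property
    for attr_name, css_name in (attr_to_css or {}).items():
        if attr_name in attrs:
            props[css_name] = attrs[attr_name]
    # Highest precedence: inline style= attribute
    props.update(parse_declarations(attrs.get("style", "")))
    return props


stylesheet = {
    "ol": {"color": "black", "font-size": "12"},
    ".error": {"color": "red"},
    "#myID": {"color": "blue"},
}
resolved = resolve_properties("ol", {"class": "error", "id": "myID"}, stylesheet)
```

Here the ID rule overrides the class rule, which overrides the tag rule, while font-size falls through from the tag rule because nothing more specific sets it.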
There are a number of global settings either required or available for tuning the behavior of column(). In the parameter list you can set
<font size>
. Initially 12.2 * <font size>
.[ <font size>, 0 ]
.<style>
tags, Initially ''.

That's all I can think of at the moment. Remember that the Markdown converter (Text::Markdown) may produce HTML that this system cannot yet handle. And of course, there's plenty of HTML and CSS that cannot now (and may never) be handled, but you can certainly request support.
My intent is to keep this list synchronized with Docs.pm (POD) and the Content::Text POD.
Some more ideas I'm kicking around...
- pod2html or the equivalent CPAN library to convert the POD in a Perl file to HTML, and then process it in the usual way. Problems are that links within the HTML will be to HTML files, and need to be converted to PDF file destinations, and a user may want to add navigation links (see docs/buildDoc.pl) before conversion. So, it may be better to convert your POD to HTML externally, and just treat the resulting HTML like any other.
- #id label) to see if it's ".pdf".
- #id URL. This will require resolving an id to a page and x/y, which means at least two passes, at least if the page number needs to be part of the link text. This should be considered as part of a more general TOC/index/cross reference/footnote/index system. Some Markdown flavors create a rather long and clumsy id for each heading -- I'm not sure if there's a way to specify an id= someplace in a Markdown document like you can in HTML. I'm not sure I even want to think about cross-document targets (might mean generating all documents in one go!).

Markup could use some (non-standard) HTML tags to
I doubt that PDF::Builder's column markup will ever be able to handle a full HTML input, à la "Prince", nor Javascript, but if it can handle a large subset of HTML, we could end up with a decent general-purpose HTML-to-PDF converter. Then, with additional extensions to HTML, we can do a decent book layout. Some actions might be simplified by allowing Perl routines to be defined, rather than defining complicated tags with many options. Also flag any unsupported HTML tag or CSS that it encounters, to alert users that something wasn't processed.
Looking through the troff manual, I get the idea for the following CSS:
Example:
<h5>This is a level 5 heading</h5>
<p>This is the paragraph it pertains to...</p>
would produce
This is a level 5 heading: This is the paragraph it pertains to...
Assuming the CSS for <h5> includes _heading_suffix: ': ' _suppress_nl, and is bold at the same font-size. Note that an extra space will be automatically added.
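The run-in heading behavior can be sketched roughly as follows. This is Python for illustration only, not PDF::Builder code. The property names _heading_suffix and _suppress_nl come from the proposal above. The block list, the `render_blocks` name, and the whitespace collapsing (so the automatic extra space does not double up after a suffix that already ends in a space) are assumptions.

```python
def render_blocks(blocks, css):
    """blocks: list of (tag, text) pairs; css: dict of tag -> properties.

    Returns the output lines, joining a block to the following one when
    its CSS carries the proposed _suppress_nl flag.
    """
    lines = []
    pending = ""  # run-in text waiting to be prefixed to the next block
    for tag, text in blocks:
        props = css.get(tag, {})
        text = pending + text + props.get("_heading_suffix", "")
        if props.get("_suppress_nl"):
            # Suppress the line end; the extra space is added automatically
            pending = text + " "
        else:
            # Assumed: runs of whitespace collapse as in ordinary HTML text
            lines.append(" ".join(text.split()))
            pending = ""
    if pending:
        lines.append(" ".join(pending.split()))
    return lines


css = {"h5": {"_heading_suffix": ": ", "_suppress_nl": True}}
blocks = [
    ("h5", "This is a level 5 heading"),
    ("p", "This is the paragraph it pertains to..."),
]
run_in = render_blocks(blocks, css)
```

With the <h5> CSS shown, the two blocks come out as a single line. With an empty CSS dict they stay on separate lines.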
A more general _content_before and _content_after, or even CSS content with some form of before and after (similar function to the ::before and ::after pseudo-elements in CSS) might be better. This could be used with marker before and after (e.g., put parentheses around an ordered list counter) and the marker-specific _marker-before and _marker-after could be phased out (deprecated). However, this would require separate handling for the marker and list item text, so it may be better to keep explicit "marker" versions, unless there is a clean way to group "marker" CSS together and keep it separate from what's applied to the list item text. Caution: do NOT inherit any sort of _content_before and _content_after, as you want to strictly control what it applies to.
Eventually, heading text/level and any explicit id's will be collected for generating a Table of Contents, cross-references, etc.
In the same manner as specifying that the next line-end be suppressed, we could specify the suppression of any paragraph indentation. This way, a section heading (not run-in in this case) could cause the first paragraph in the section to not be indented, rather than requiring explicit markup for this.
After further perusal of the troff manual, I see a number of items that could be useful in typesetting. While troff is a very powerful system, including facilities for tables, graphs, equations, picture drawing, etc., I don't see it being used very much "in the wild". Therefore, I do not plan at this time to implement troff input (nor its close cousin, man page input, although that is somewhat more widely used). There does not seem to be any Perl module to translate troff into something else, such as HTML, although groff with HTML output (grohtml) may be a viable alternative. If some party would find troff input very useful for PDF::Builder, they are welcome to sponsor such work, or build a good troff-to-HTML translator package that could be used here (possibly with SVG output for pic, eqn, grap, etc.). Note that SVG support (as well as eqn-to-SVG) is already in plan for PDF::Builder.
That said, here are some additional features for column(), many inspired by troff, that could be implemented:

- <block> HTML extension, or some manner of "subcolumn" within the column() system? Implementing a <div> might be even better, with a flag to indicate whether it can be split across pages (default "no").
- column() input may be applied in short segments, requiring the author to keep track of the starting number for each type (unless we keep global counters outside of column()).
- column(): "Slide Mountain is ".elevation(4218)." tall.", which converts and rounds/formats appropriately.
- white-space is set appropriately (to honor spacing), tabs can be handled. The author will want to be able to specify the tab stops from the left edge (min x) of the column. If using constant width text, character count will work, but for proportional fonts, absolute distance measures (in, cm, mm, pc (Pica, 1/6 in), pt (Big or PostScript point, 1/72 in), pp (Printer's Point, < 1/72 in, although there is no worldwide standard size), el (Elite, 1/12 in)) and more (em, en, ex) should be usable. Of course, such units should be usable wherever distances are given. Note that tabbing is often better done with a table, if the intent is to create subcolumns of text. For tables, add an "alignment" setting so that items within a cell can be aligned on, e.g., the decimal point. See CSS tab-size, and default tabstops of every 8 characters.
- CSS "white-space" with settings:
- <br>
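As a rough illustration of the tab-stop units mentioned in the list above, the following Python sketch (not PDF::Builder code) converts a length string to PostScript points and finds the next tab stop from the left edge of the column. The function names are hypothetical. The 72.27-per-inch value for "pp" and the 0.5 em value for "ex" are assumptions, since neither has a single agreed size.

```python
import math
import re

# Points (1/72 in) per unit. "pp" uses 72.27 printer's points per inch,
# which is an assumption: there is no worldwide standard size.
POINTS_PER_UNIT = {
    "pt": 1.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
    "pc": 12.0,          # Pica, 1/6 in
    "el": 6.0,           # Elite, 1/12 in
    "pp": 72.0 / 72.27,
}


def to_points(length, font_size=12.0):
    """Convert a string such as '1.5in' or '2em' to points."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([a-z]+)\s*", length.lower())
    if match is None:
        raise ValueError(f"cannot parse length: {length!r}")
    value, unit = float(match.group(1)), match.group(2)
    if unit == "em":
        return value * font_size
    if unit == "en":
        return value * font_size / 2.0
    if unit == "ex":
        # Assumption: x-height taken as half the font size; real fonts differ
        return value * font_size * 0.5
    if unit not in POINTS_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * POINTS_PER_UNIT[unit]


def next_tab_stop(x, stops, default_interval):
    """Return the first explicit stop to the right of x (points from the
    column's left edge), falling back to evenly spaced default stops."""
    for stop in sorted(stops):
        if stop > x:
            return stop
    return (math.floor(x / default_interval) + 1) * default_interval


stops = [to_points("1in"), to_points("2in")]
```

For constant-width text, the "every 8 characters" default would correspond to a default_interval of 8 times the character width.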
A library to consider for further use: Text::Markup. Note that it does not natively translate many other formats, but appears to be a front-end for a lot of other translator packages. If nothing else, it is a pointer to useful other packages.
Here are the formats currently supported:
The only major formats missing from the list are troff and the closely related nroff/man, and of course, LaTeX. As I said earlier, groff may be suitable to convert the former to HTML. I don't see any point in converting (La)TeX to HTML, as excellent PDF output already exists for this family of markup (there are also good HTML output converters, should you want to embed some LaTeX-source documentation within a larger PDF). Pod conversion makes use of Pod::Simple::XHTML, which is already recommended for use in building HTML documentation for PDF::Builder (called by docs/buildDoc.pl).
At this point, I can't see it being worthwhile to directly support any of these, or Text::Markup itself, as few would make good document inputs. You might want to embed the documentation for some program (as Pod, man, etc.), so there's certainly nothing to prevent you from running a converter (e.g., Text::Markup) externally and then dealing with the resulting HTML, which column() can import. Don't forget to do something about internal links, which may be expecting to link to HTML documents, if those target documents have also been converted to PDF.
In addition to <eqn> (display and inline), consider a form of those tailored for chemistry (<chem> and <dchem>). This would use the MathJax markup after modifying the user input source to use Roman/normal weight for text. It would need to be an improvement over explicitly using super- and sub-scripts to be worth the effort.
The 3.025 release, containing column(), is far from the end of the job! I don't know how quickly I'll be able to get to the things on this list, but I'd like to do them all. PRs to take care of individual items would be appreciated, of course (and sponsorship of new features would be even better!). Most of these items can be implemented in any order, so don't be afraid to request priority for some things, or to help out.

Items flagged with a star * are not official HTML or CSS, but would be extensions.
Perhaps in the 3.027 release?
(I had hoped to get at least some of these in 3.026, but they didn't make it due to fixing other stuff. Hopefully 3.027.)
- <P> would fail).
- <img>, including SVG, support
- <pre>, <nobr>, <br>, <code> `)
- &NBSP; like space, but force NOBREAK.
- <cite>, <q>, <kbd>, <samp>, <var> font control.
- <eqn>* equation setting (via MathJax) and data plotting (via GnuPlot)
- <dl>, <dt>, and <dd>.
- &SHY;. See also full Knuth-Liang below.
- <hr> align, <sup> and <sub>.
- <center>* formatting control.
- <big>*, <bigger>*, <smaller>*, <small> font sizing.

Possibly...

- <base>, <wbr>. Note that <abbr> is dynamic in HTML (on mouseover), so this might not be feasible with PDF.
- <abstract>*, <article>, <aside>, <section>, etc. as predefined page areas?

Extensions to HTML and CSS...
- <sl>* simple list (like <ul>, but no markers).
- <sc>* (Small caps) and CSS font-variant: small-caps preprocess: around runs of lowercase put <span style="font-size: 80%; expand: 110%"> and fold to UPPER CASE. This would be after @mytext creation, inserting a series of <span> tags.
- <pc>* (Petite caps) and CSS font-variant: petite-caps* like <sc>, but 1ex font-size, expand 120%.
- <dc>* (Drop caps). Besides the giant single (usually) letter, we need to indent multiple lines.
- <ovl>* overline (similar to underline, for completeness) using CSS text-decoration: overline.
- <k>* kern text (shift left or right) with CSS _kern*, or general positioning, e.g., to form a logo such as (La)TeX through character positioning. What to do at the HTML level? x +/- % of font size, y +/- % of font size. To do effects such as notations like <sup>4</sup><sub>2</sub>He, perhaps <hkeep align="right"><sup>4</sup><sub>2</sub></hkeep>He notation, or a more general purpose <vstack> tag? Remember to keep "He" (in this example) with it as an unsplittable unit. Possibly, use of <eqn> will do the job instead.
- <vfrac>* vulgar fraction m/n, using <sup>, <sub>, and kern <vfrac num="1" denom="2">. Some of these things may overlap too much with <eqn> processing (see below) to be worth doing separately.
- </sc> implied after the earlier of X words or the end of a line (with a complete, unhyphenated last word). Something like <sc eol="end">In The Beginning</sc> was inspired by baseball. If a complete "Beginning" cannot be fit on the line, Small Caps would end at "The".
- <endc>* force early end of column here (at this y, while still filling line), e.g., to prevent a widow. Optional conditional (e.g., less than 1" of vertical space left in column). By default, forbid hyphenation, since this is at the end of a column.
- <vkeep>* material to keep together vertically, such as headings and paragraph text.

Release 3.028 or later?
- <center> and CSS text-align: center may need this. Since a given line might be a composite of several sub-lines, this might best be done by writing out the full line, and then backpatching x displacements.
- <lang> tag to mark a section with a different language with different hyphenation rules (might integrate with bidirectional flag).
- <lang>* define language of a span of text, for hyphenation or audio purposes. Possibly also as a new attribute for other tags such as <span>. Need to see what HTML and CSS already define.
- <hyp>*, <nohyp>* control hyphenation in a word (and remember rules when see this word again) -- basically, give dictionary entry for splitting up some word, overruling whatever Knuth-Liang would do.
- <bdi> and <bdo> support. This may require a wrapper around HarfBuzz::Shaper, unless its owner is willing to extend it.
- <nolig>* and </nolig>* forbid ligatures in this range of text.
- <lig gid='nnn'>* and </lig>* replace character(s) in child text by a ligature, if available. Highly font-dependent.
- <alt gid='nnn'>* and </alt>* replace character(s) by alternate glyph such as a swash or alternate presentation form. Highly font-dependent.

This list is not set in stone, and there are no guarantees that I'll ever get to all of them. As I said, help with this would be much appreciated. Also, further features that you would find useful could be suggested.
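For anyone wanting to experiment with the <sc> small-caps preprocessing idea from the extensions list (wrap runs of lowercase in a reduced-size <span> and fold them to upper case), a minimal sketch might look like the following. It is Python for illustration, not PDF::Builder code. The `small_caps` name is hypothetical. It assumes plain ASCII text with no markup inside it, and "expand" is the non-standard property named in that list item.

```python
import re


def small_caps(text, size="80%", expand="110%"):
    """Wrap each run of ASCII lowercase letters in a reduced-size <span>
    and fold it to upper case; other characters pass through untouched."""

    def wrap(match):
        return (
            f'<span style="font-size: {size}; expand: {expand}">'
            f"{match.group(0).upper()}</span>"
        )

    # Assumption: ASCII only; accented lowercase would need a wider pattern
    return re.sub(r"[a-z]+", wrap, text)


converted = small_caps("In The")
```

The petite-caps variant would be the same transformation with different size and expand values.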