alan-if / alan-docs

Alan IF Documentation Project
https://git.io/alan-docs

PDF Syntax Highlighting #17

Closed tajmone closed 5 years ago

tajmone commented 6 years ago

Task list and progress status of customization of syntax highlighting in PDF documents.

NOTE: asciidoctor-fopub uses Apache™ FOP to convert from DocBook to PDF, and XSLTHL to syntax-highlight source code (for more info, see the XSLTHL Wiki).

tajmone commented 6 years ago

Ciao @thoni56,

I've finally managed to work my way through the mysteries of XSL stylesheets, and succeeded in creating different styles for Alan code examples, BNF rules and other verbatim blocks. I've also managed to add a colored background and a border with rounded corners (this wasn't enabled in the original templates).

Alan Examples Theme

For testing purposes, I've used the Monokai color scheme — both because I like it and because I had a reusable ready-made template for it.

I've actually implemented a full color scheme in the XSL stylesheet, based on Base16 variables, so changing the colors is now very easy. Otherwise, the attributes that control source code styling are scattered all over the place; now they are centralized in a block of variables:

<!-- =============================================== -->
<!-- Monokai Base16 Color Scheme, by Wimer Hazenberg -->
<!-- =============================================== -->
  <xsl:param name="Monokai.base00">#272822</xsl:param><!-- Rangoon Green ( almost black ) -->
  <xsl:param name="Monokai.base01">#383830</xsl:param><!-- Armadillo ( almost black ) -->

<!-- [...] -->

<!-- ================================== -->
<!-- Syntax Highlighting Theme for Alan -->
<!-- ================================== -->
  <xsl:param name="AlanHL.background" select="$Monokai.base00"></xsl:param>
  <xsl:param name="AlanHL.normal"     select="$Monokai.base05"></xsl:param>
  <xsl:param name="AlanHL.quotedId"   select="$AlanHL.normal"></xsl:param><!-- TEST WITH base12 -->
  <xsl:param name="AlanHL.keyword"    select="$Monokai.base08"></xsl:param>
  <xsl:param name="AlanHL.comment"    select="$Monokai.base04"></xsl:param>

<!-- [...] -->
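
As a rough illustration of how one of these centralized parameters might be consumed, here is a minimal sketch of a keyword template for the FO output. It assumes the `xslthl` namespace and the `mode="xslthl"` convention of the stock DocBook XSL highlighting layer. The placeholder value given to `AlanHL.keyword` is hypothetical and only there to keep the sketch self-contained; the real stylesheet selects a Base16 variable as shown above, and its actual templates may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:xslthl="http://xslthl.sf.net"
    exclude-result-prefixes="xslthl">

  <!-- Placeholder value (hypothetical); the project derives this from a Base16 param -->
  <xsl:param name="AlanHL.keyword">#000000</xsl:param>

  <!-- Color every token that XSLTHL tagged as a keyword using the theme param -->
  <xsl:template match="xslthl:keyword" mode="xslthl">
    <fo:inline color="{$AlanHL.keyword}">
      <xsl:apply-templates mode="xslthl"/>
    </fo:inline>
  </xsl:template>

</xsl:stylesheet>
```

With this arrangement, switching theme only means changing what the `AlanHL.*` parameters select; the templates themselves stay untouched.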

Choosing a Color Scheme

Personally, I love the Monokai scheme, but I know that not everyone likes dark schemes. Anyhow, changing the scheme is not a problem at all, and we can test various schemes easily with the new Base16 system.

I think that if it has to be a light scheme, it shouldn't be a strong yellow like the current one, because the syntax highlighting colors need to contrast with the background and with each other (dark schemes provide the best contrast, IMO).

Now it's a matter of choosing the final color scheme to adopt.

The ideal way would be to actually test the color schemes on Alan syntax using the Highlight tool, which now ships with the Alan syntax and all the Base16 schemes:

Alternatively, here is a link to live previews of color schemes on Highlight.js (no Alan syntax though):

We could refer to the names of those schemes to communicate and discuss choices.

About the New Alan Highlighter

I had to create a new syntax for Alan in XSLTHL. It works pretty well: I have stress-tested it against edge cases and it holds up (e.g., no false positives for keywords inside quoted identifiers).

I'm going to add to the project some test files for syntax highlighting (now PDF, later on HTML too), for both previewing and testing.

The new syntax is almost identical to the Highlight syntax, except that it doesn't capture special dollar symbols in strings (XSLTHL doesn't support styling escape sequences or interpolation inside strings).

Also, I didn't add highlighting of predefined classes this time, because I realized that there is a conflict between the actor, container and location classes and their keyword counterparts (i.e., the actor and location pseudo-attributes, as in Current Actor/Location, and the container property). The highlighter would be unable to tell when these are classes and when they are not; in fact, I should remove them from the Highlight syntax too!

I now understand why the keywords list in the Manual doesn't include some predefined classes: it doesn't actually contain ANY classes, since actor, location and container are listed there only by virtue of being either pseudo-attributes or properties.

Is this correct, or am I misunderstanding the above point?

thoni56 commented 6 years ago

About the classes, yes, I think of them not as keywords, but as just predefined classes. So in my mind they should have the same treatment as other, author-defined, classes (and identifiers).

Actually we should probably change the "syntax" from Current Actor to Current actor, semantically stating that following the Current keyword should be a class identifier, but that only the predefined actor and location are allowed.

Concerning colour scheme I prefer light ones, but since we want the section to be visually different from the white "paper" background that limits the choice. I've often used Solarized Light, but tweaked the background to a more grayish colour.

I don't think we want a predominant colour, like the blue-ish one in Lakeside Light; I'm leaning more towards GitHub, Foundation or Magula. I'd like the code blocks to be identifiable, but not intrusive (which I think dark themes mostly are), and the syntax colouring to be clearly visible but not like a Christmas tree.

(I went for Highlight.js showcase since I felt the update on Base16 was sluggish, particularly for some schemes. Maybe a performance issue with the highlighter?)

tajmone commented 6 years ago

Predefined Classes

Actually we should probably change the "syntax" from Current Actor to Current actor, semantically stating that following the Current keyword should be a class identifier, but that only the predefined actor and location are allowed.

Indeed, this would be less confusing. The Manual does mention pseudo-attributes, but for practical purposes these technical distinctions aren't really important for the end user, and we should stick to the natural English side of Alan, instead of the "technically correct" programming semantics.

About the classes, yes, I think of them not as keywords, but as just predefined classes. So in my mind they should have the same treatment as other, author defined, classes (and identifiers).

This creates some confusion right now because I've added to the highlighter's keywords list all the keywords found in the Manual, and these include actor and location but not thing and object.

So, in these cases some predefined classes will be highlighted as keywords (shown here in all caps) while others will not (lowercase):

Every man IsA ACTOR
Every room IsA LOCATION
Every magic IsA thing
Every toy IsA object

The XSLTHL highlighter doesn't use a stack; it takes a flat approach to tokens, so it's not possible to highlight selectively according to context. This would be possible with Rouge, but we can use Rouge only for HTML, so we would end up with different syntax highlighting in PDF and HTML, which is not good.

So, we either remove actor and location from the keywords list or we also add thing and object. With the former solution actor and location would never be highlighted as keywords, even when they are pseudo-attributes.

Current actor     --> 'actor' not a keyword!
Current location  --> 'location' not a keyword!

But then this would allow us to add all the predefined classes to a separate group of syntax elements, and we could style them differently if we wanted (e.g. in bold, just to remind the reader that they are native classes).

The latter solution doesn't make much sense because classes shouldn't be colored like keywords in my opinion.
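
To make the two options more concrete, here is a hypothetical excerpt of what an XSLTHL definition for Alan might look like with actor and location taken out of the keywords list and moved to a group of their own. The keyword lists are abridged and illustrative, `<ignoreCase/>` is included on the assumption that the highlighter should match Alan identifiers case-insensitively, and the `<style>` child on the second group is an assumption about how XSLTHL lets a highlighter emit a differently named element; the style name `predefined` is made up for this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical excerpt of an XSLTHL definition for Alan (lists abridged) -->
<highlighters>

  <!-- Language keywords: 'actor' and 'location' are deliberately left out -->
  <highlighter type="keywords">
    <keyword>every</keyword>
    <keyword>isa</keyword>
    <keyword>current</keyword>
    <ignoreCase/>
  </highlighter>

  <!-- Predefined classes as a separate group.
       ASSUMPTION: the style child changes the element emitted for these
       tokens, so the stylesheet can match and style them on their own. -->
  <highlighter type="keywords">
    <keyword>actor</keyword>
    <keyword>location</keyword>
    <keyword>thing</keyword>
    <keyword>object</keyword>
    <ignoreCase/>
    <style>predefined</style>
  </highlighter>

</highlighters>
```

Because the tokenizer is flat, either group matches the word wherever it appears, so `Current actor` and `IsA actor` would still be styled identically; the split only decides which style that is.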

Literals: String?

I am still confused about the literals though. Should String be treated as a class or a keyword in highlighting?


Color Scheme

Concerning colour scheme I prefer light ones, but since we want the section to be visually different from the white "paper" background that limits the choice.

Since I've added a thin border around the code, even very light bg colors should work fine, and a slightly darker border will frame the code and separate it visually from the page.
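
For reference, a very light background with a slightly darker thin border could be sketched along these lines. This assumes the stock DocBook XSL names (`shade.verbatim` and the `shade.verbatim.style` attribute set); the asciidoctor-fopub templates may route verbatim styling through different names, and the color and size values are placeholders, not the ones used in the project.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Turn on shading of verbatim blocks (stock DocBook XSL parameter) -->
  <xsl:param name="shade.verbatim" select="1"/>

  <!-- Placeholder values: near-white fill framed by a slightly darker border -->
  <xsl:attribute-set name="shade.verbatim.style">
    <xsl:attribute name="background-color">#fafafa</xsl:attribute>
    <xsl:attribute name="border">0.5pt solid #d0d0d0</xsl:attribute>
    <xsl:attribute name="padding">4pt</xsl:attribute>
  </xsl:attribute-set>

</xsl:stylesheet>
```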

Transparent BG?

... Besides, there is no golden rule saying that code needs a background color or border. Since it uses a monospace font and custom text coloring, we could actually do without any bg color or border at all. It might even look better, especially when there is a page break in the middle of a code block, which in the PDF slices the content abruptly (although in DocBook this can be controlled and fixed).

Should we try transparent background, and just focus on foreground colors? After all, we only need a few contrasting colors here:

All the schemes you mentioned (GitHub, Foundation, Magula) could be tested with a transparent (i.e. white) background. Some schemes were designed to work well on white out of the box (e.g. Google).

I'll do some tests and post some screenshots here.

Highlight Base16 Problems

... I felt the update on Base16 was sluggish, particularly for some schemes. Maybe a performance issue with the highlighter?

Could you please expand on this point? I'm the contributor of the Base16 update in Highlight, so if there are problems with it I'd like to fix it.

thoni56 commented 6 years ago

Predefined classes

My suggestion is to remove actor and location from the keyword list. As a direct consequence of this decision, string should also be removed. They are all just classes, although special, predefined, ones.

They could be highlighted differently but if I read you correctly you think that is not a good idea, and I tend to agree.

Transparent background

Let's try transparent background, then.

Base16

By sluggish, I mean that when I changed the scheme it first rendered everything green (or something) and then after a second or two it drew the correct colouring. This made it impossible to skip through the schemes with any speed.

I tried it again just now, and the problem is gone. Browser issue maybe...

tajmone commented 6 years ago

Other Blocks Coloring

In choosing the color scheme for Alan code we should keep in mind that there are other blocks that require a colored background, and each of them should have a different color to differentiate them and avoid confusion.

These are all the colored (aka "shaded" in XSL terminology) blocks:

This being the context of colored blocks, I think that having Alan code without border and bg color could actually be a good idea, since it would stand out precisely because it needs neither (i.e., it would make it "special" in this respect).

Keep in mind that Alan code blocks are also padded, and this, together with the monospaced font and custom colors, should make it clear that it's code. The only problem might be visually tracking the indentation of the code, but in practice it's probably not really an issue.

thoni56 commented 6 years ago

Sounds ok. Until I see it in real life ;-)

tajmone commented 6 years ago

... when I changed the scheme it first rendered everything green (or something) and then after a second or two it drew the correct colouring. This made it impossible to skip through the schemes with any speed.

I tried it again just now, and the problem is gone. Browser issue maybe...

Probably a cache issue then. While I was creating and testing the Base16 schemes in the Highlight GUI I didn't experience any problems, nor did I after the schemes were included in the next Highlight release (which created a separate list for the Base16 schemes and placed them in a subfolder).

If you tested via Highlight CLI, the problem might have been due to the subfoldering of Base16 schemes. Who knows ...

tajmone commented 6 years ago

Next Steps

Probably what I should do now is also create the color schemes for the other blocks, so we can compare Alan code with the general context:

I'll have a go at them this afternoon then!

tajmone commented 6 years ago

I've created a commit in a test branch so I could share a preview of the PDF documents using no bg color and border for Alan code:

The above links will always point to the latest PDF produced on the test branch, so even if we start tweaking colors they will always work.

I haven't actually followed any of the schemes you pointed out, because I noticed they rely on some bg color, so I just worked out a quick temporary palette that would fit a white bg. The current colors are provisional and can be discussed and improved.

What I wanted to test here is whether presenting code without a border or background color looks nice or not.

I think it looks good, but I have mixed feelings about it (keep in mind that I'm a strong supporter of always using dark schemes for code, because they are less stressful to the eye).

Definitely, when a code block gets interrupted by a page-break it looks better without border or background.

Also, I have the impression that without a box around it these examples seem to flow in with the discourse a bit more (ie, the box creates a big contrast with the body text, without box the code seems more attached to the text, so to speak).

In any case, I doubt that the lack of a boxing frame makes it difficult to distinguish between code and text: monospace font, syntax highlighting and different font sizes make it clear which is which.

What's your opinion? Keep going down the no-border, no-bg road, or revert to using a color scheme and just find the right one?

thoni56 commented 6 years ago

I'm very much for the code blending in with the text flow, like you say, but it should still be easily identifiable as code. This, I think, leads us to a transparent or very light background.

I agree that a border is probably a bad idea, since I presume it will generate two boxes if broken by page break.

An ever so light grey might work, maybe.

The current colours and styling definitely do work, possibly as the result of a very strong red keyword colour. I think I'd prefer a slightly less strong one. I don't want the keywords to be eye-magnets; ideally you should be able to read the code as easily as the text around it, while it still clearly identifies itself as code.

Yes, I realize that having no parts of code "stand in front" is one way to describe it.

Also you should be able to squint with your eyes and then only see the text in quotes ;-)

tajmone commented 6 years ago

Ok, I've tried both the Foundation and GitHub schemes, as found on the Highlight.js website. I avoided Magula because it looked too dark.

Bear in mind that with the new variables-based system changing schemes in the XSL stylesheet is very easy, and I actually just commented out the previous schemes, so restoring them is a matter of a few clicks. Once we settle on a specific scheme I'll just delete the older ones from the source.

The PDF links above are now updated to the GitHub scheme. I tried Foundation but it looked a bit too dark; anyhow, below are screenshots of both.

Here's a screenshot of Foundation:

Foundation Scheme

And here's a screenshot of GitHub:

GitHub Scheme

Also note that in GitHub I've removed the bold style from keywords, and I think it looks nicer (and I would keep it that way for any other scheme too).

I prefer the GitHub scheme — by the way, I think that this is actually the old color scheme used by GitHub, the newer one is a bit brighter in colors.

thoni56 commented 6 years ago

I agree that the GitHub theme is better, and I think it is quite good. In the full manual PDF I feel it flows very nicely.