essepuntato / rash

Research Articles in Simplified HTML (RASH) Framework includes a markup language defined as a subset of HTML+RDF for writing scientific articles, and related tools to convert it into different formats, to extract data from it, etc.
https://w3id.org/people/essepuntato/papers/rash-peerj2016.html
ISC License

Clarification about RASH in HTML5 and XHTML5 #20

Closed essepuntato closed 9 years ago

essepuntato commented 9 years ago

In the documentation, a clear explanation of how to use RASH either in pure HTML5 or in XHTML5 should be added.

npdoty commented 9 years ago

I was surprised to find hard requirements to use non-semantic elements with required classnames rather than the HTML5 equivalent semantic element. For example, as I read the documentation, I can't use <section> as defined in HTML5, I have to use <div class="section">? HTML5 includes <figure> and <figcaption>. I think RASH should note that HTML5 authors can use these semantic tags instead, and just define them as having the same behavior as a document that uses div and span with the special classnames.

npdoty commented 9 years ago

Also, I'd like to be able to use <code> rather than, or in addition to, <span class="code">.

essepuntato commented 9 years ago

Hi Nick,

I see your point about using the "semantic" tags (e.g., section) instead of the generic structures described in the documentation (e.g., div with class "section"), and about our suggestion to use the class "code" instead of the code element.

Basically, this was a deliberate design choice. In RASH we wanted to keep everything as simple as possible, in order to reduce the cognitive effort of a user who writes RASH documents without a WYSIWYG editor. The idea was that having fewer elements to remember would simplify learning and writing RASH.

The "code" case is particularly relevant, because here we wanted a way to define a particular behaviour in the same way in different contexts (i.e., in inline elements and in block elements). Of course, if we used the full HTML(5) approach, I would have to use different tags for defining code. In particular:

Inline code definition: <p>This text contains a <i><code>call to a function in italics</code></i> as an inline element.</p>

Block code definition: <pre><code>This is a full block of code</code></pre>

As you can see, to have both situations I should

Thus, in order to keep the same "semantics" without adding further elements to the language, and to keep things simple and similar in both cases (inline and block), we thought that applying the same "class" to existing RASH elements would be easier to use and remember. For instance, the RASH translation of the aforementioned HTML code is:

<p>This text contains a <i class="code">call to a function in italics</i> as an inline element.</p>
<p class="code">This is a full block of code</p>

For some of the other tags you proposed, e.g., figure and figcaption, we followed a similar rationale. Since all the "floating" structures of academic articles (e.g., tables, figures, formulas) usually share similar structural elements, we decided on a generic approach that always uses divs instead of the related semantic elements. Of course, for figures one could use the figure and figure caption elements. However, this would require additional effort from the RASH user, i.e., more elements to remember for creating basically similar structures. In that case, one would have to learn that figure boxes are modelled with the aforementioned dedicated elements, while other structures (e.g., formula boxes, table boxes) have to use the generic approach (or even other elements, if they exist).

Another point is to allow an easy path for future extensions. Suppose that in the future we want to introduce (and we will, indeed) the "listing box", i.e., a box containing a piece of code accompanied by a caption. Handling it with the generic approach is easy, e.g.,

<div class="listing"> <p class="code">my source code</p> <p class="caption">my caption</p> </div>
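As a rough illustration of why the generic approach extends easily, the sketch below builds any such box with a single Python helper, where only the class name changes between box kinds. The helper name `make_box` is hypothetical and is not part of RASH or its tools; the class names `listing`, `code` and `caption` are the ones used in the example above.

```python
from html import escape


def make_box(kind: str, body_html: str, caption: str) -> str:
    """Wrap already-formed body markup and a caption in a generic box.

    `kind` becomes the class of the outer div, e.g. "listing".
    `make_box` is a hypothetical helper, not a RASH API.
    """
    return (
        f'<div class="{escape(kind, quote=True)}"> '
        f"{body_html} "
        f'<p class="caption">{escape(caption)}</p> '
        "</div>"
    )


listing = make_box("listing", '<p class="code">my source code</p>', "my caption")
print(listing)
```

Under this assumption, a new box kind would need a new class name and nothing else.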

Of course, I also think that it would be worth thinking carefully about extending RASH with other, more semantic elements, i.e., em and strong, and even section (if no other similar high-level structure, like chapters, is added in the future). But I still think that "keep the language simple while preserving the needed expressiveness" is a good rule to follow in RASH development.

@npdoty, @sideshowbarker: I really would like to have your opinions on the above points.

npdoty commented 9 years ago

Thanks so much for the detailed explanation!

I think trying to maintain simplicity for authors is an awesome goal. I'm not sure that moving names into class attributes over tags does that much to accomplish that, though. I still have to remember "code", I just have to remember that it's a classname rather than the HTML tag and which elements it's allowed to modify. For those people who know HTML in particular, I think it's easier to just use existing semantic tags where HTML has the tags already. Also, this allows an author to write HTML that will look reasonably accurate even with a default stylesheet, or when using a stylesheet without RASH.

I use ReSpec (a similar project, in which authors write in simplified HTML and embedded JavaScript adds features or cleans things up) for some writing. In that case, I can use <section> as an easy thing to write in my HTML and the ReSpec JavaScript can infer and adjust headings inside of each section, and create a table of contents. As an author, it seems simpler to just nest <section> tags rather than divs with special classnames (though ReSpec does also use some special classnames, as you are).
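To make the nesting idea concrete, here is a minimal Python sketch that derives an outline (nesting depth plus heading text) from nested section elements, which is the information a table of contents or a heading-level adjustment would need. It is not ReSpec's implementation; the names `OutlineBuilder` and `outline` are hypothetical, and the sketch assumes each heading sits directly inside its section.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineBuilder(HTMLParser):
    """Collect [depth, title] pairs from nested <section> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0           # current <section> nesting depth
        self.in_heading = False  # True while inside an h1..h6 element
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.depth += 1
        elif tag in HEADINGS:
            self.in_heading = True
            self.entries.append([self.depth, ""])

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth -= 1
        elif tag in HEADINGS:
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.entries[-1][1] += data


def outline(html):
    """Return a list of (depth, title) tuples in document order."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return [(depth, title.strip()) for depth, title in builder.entries]


doc = (
    "<section><h2>Intro</h2>"
    "<section><h2>Scope</h2></section></section>"
    "<section><h2>Method</h2></section>"
)
for depth, title in outline(doc):
    print("  " * (depth - 1) + title)
```

The author writes the same heading element everywhere, and the depth recorded here is what a tool could use to pick the final heading level.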

The other thing to consider is pre-processors and the like. I tend to do my academic writing in Markdown and have it converted to HTML with Pandoc. Pandoc is pretty good about handling semantic HTML tags, but it would be much more burdensome to add in special class names. I would notice that around figure, for example. However, I can see that with all the possible variations around images and figures, this may be harder to rely on existing tags.

So to summarize, for me as an author, using as much of existing HTML as we can is more natural. Relying on HTML rather than special classnames can also improve interoperability, when I try to change my makefiles etc. to output RASH.
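One way to bridge the two positions in a build pipeline would be a small post-processing step that rewrites semantic tags into the class-based form. The Python sketch below uses only the standard library; the mapping covers just the two cases mentioned in this thread (section and code), and the names `SEMANTIC_TO_CLASS`, `SemanticToClass` and `to_class_based` are hypothetical, not part of RASH or Pandoc.

```python
from html import escape
from html.parser import HTMLParser

# Assumed mapping, taken only from the examples in this thread:
# semantic tag -> (generic tag, class name).
SEMANTIC_TO_CLASS = {
    "section": ("div", "section"),
    "code": ("span", "code"),
}


def render_start(tag, attrs):
    """Serialize a start tag, re-escaping attribute values."""
    parts = [tag]
    for name, value in attrs:
        parts.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
    return "<" + " ".join(parts) + ">"


class SemanticToClass(HTMLParser):
    """Rewrite mapped semantic tags; pass everything else through.

    Comments and doctype declarations are dropped in this sketch.
    """

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC_TO_CLASS:
            tag, cls = SEMANTIC_TO_CLASS[tag]
            # Append to an existing class attribute, or add a new one.
            existing = dict(attrs).get("class")
            attrs = [(n, v) for n, v in attrs if n != "class"]
            attrs.append(("class", f"{existing} {cls}" if existing else cls))
        self.out.append(render_start(tag, attrs))

    def handle_endtag(self, tag):
        tag = SEMANTIC_TO_CLASS.get(tag, (tag, None))[0]
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def to_class_based(html):
    parser = SemanticToClass()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(to_class_based("<section><p>Call <code>f()</code>.</p></section>"))
```

A block-level case such as pre with a nested code element would need its own rule, which this sketch deliberately leaves out.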

fvitali commented 9 years ago

Dear all:

My own 2 cents.

I hold rather the opposite point of view in this regard. If RASH ends up being "a somewhat simplified version of HTML according to the peculiar needs of a specific community of authors identified by the use of a specific set of tools", then its interest wanes for me.

My view of RASH is that it should have a principled way of reducing the tag set of HTML according to general and objective justifications that go beyond the simple "I use this tag a lot". Nick may use a lot, my friend Francesca uses a lot, we will find somewhere people that use a lot, each one of us has documents that use a specific subset of HTML a lot. That is no way to go about simplifying a language.

My understanding of the principles behind the simplification of a language is that there are two questions to answer for every choice: what is the STRUCTURAL NATURE, and what is the PURPOSE of each document fragment we are considering. HTML has a mixed history of confusion between these two questions, so that we have because technical manuals were frequent when HTML was invented, but we do not have because recipes were not.

I believe that RASH says (or should say): every different structure has its own tag, every different purpose has its own class. Thus, whether a fragment is a block, an inline, a table cell or a container is part of its structural nature, and deserves its own tag (<p>, <span>, <td>, or <div>). On the other hand, whether a fragment is a piece of code, a citation, a keyboard sample or the ingredient of a recipe is a purpose with no intrinsic structural characterization, and therefore needs no tag.

From this point of view, I don't particularly care whether we end up with <section> or <div> for containers, as long as we only choose ONE of them. Either of them, but one only. As for and , I believe we only need one (most probably, ). As for and , I am fine with just and , but I would be open, considering their specific frequency and success, to make an exception for and . I don't see a similar justification for and .

To summarize: I do NOT believe that "I use this tag a lot, let's keep it" should be an acceptable line of reasoning. Nature/purpose seems to me a better cutting line.

Ciao

Fabio

On 28 Aug 2015, at 07:47, Nick Doty notifications@github.com wrote:


Fabio Vitali
Dept. of Informatics, Univ. of Bologna, ITALY
phone: +39 051 2094872
e-mail: fabio@cs.unibo.it
http://vitali.web.cs.unibo.it/

The sage and the fool go to their graves alike in this respect: both believe the sage to be a fool. Where, then, may wisdom be found?
Qi, "Neither Yes nor No", The codeless code