PreTeXtBook / pretext

PreTeXt: an authoring and publishing system for scholarly documents
https://pretextbook.org

Add an <s> tag for sentences #498

Closed j-loreaux closed 2 years ago

j-loreaux commented 7 years ago

Currently, extraneous whitespace within a paragraph <p> tag is vehemently discouraged because it can cause unwanted behavior for certain output formats (e.g., unintended line or paragraph breaks, perhaps other errors). This means putting an entire paragraph on one line of the source file. Since version control software is line-based, this means it is impossible to commit a change to a single sentence without it looking like a change to an entire paragraph. Of course, the details can be seen with features like --word-diff in Git, but it does make it impossible to stage changes to different sentences within the same paragraph separately.
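To make the staging problem concrete, here is a minimal sketch using Python's standard `difflib`. The three-sentence paragraph is made up for illustration, and `change_regions` is a hypothetical helper, not part of any PreTeXt or Git tooling. It counts the contiguous runs of changed lines a line-based comparison would see when two different sentences are edited.

```python
from difflib import SequenceMatcher


def change_regions(before: str, after: str) -> int:
    """Count contiguous runs of changed lines, as a line-based diff sees them."""
    matcher = SequenceMatcher(
        a=before.splitlines(), b=after.splitlines(), autojunk=False
    )
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


# Hypothetical paragraph: three sentences, the first and last get edited.
original = [
    "Every group has an identity.",
    "Inverses are unique.",
    "Not every group is abelian.",
]
revised = [
    "Every group has an identity element.",
    "Inverses are unique.",
    "Not every group is commutative.",
]

# Whole paragraph on one source line versus one sentence per line.
one_line = change_regions(" ".join(original), " ".join(revised))
per_sentence = change_regions("\n".join(original), "\n".join(revised))
print(one_line, per_sentence)
```

With the paragraph on one line, both edits collapse into a single changed region. With one sentence per line, the untouched middle sentence separates them, which is roughly what lets a line-based tool stage them independently.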

Therefore, I propose an <s> tag for sentences within <p>. This would make the atomic unit of prose within MathBook XML the sentence rather than the paragraph, thereby mimicking human language (any sentence can be added or removed without affecting the syntactic validity of language). This moves the "whitespace within <p>" issue to "whitespace within <s>", which is preferable both because it allows for staging changes separately and also because this is a narrowing of location of the problem.
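As a rough illustration of the proposal, the sketch below parses a hypothetical `<p>` whose sentences are wrapped in `<s>` and written one per source line. The `<s>` element is only proposed here and is not part of the schema, and `paragraph_text` is an invented helper, not how the actual stylesheets work. It assumes whitespace is normalized inside each `<s>` and that sentences are joined with a single space.

```python
import xml.etree.ElementTree as ET

# Hypothetical source: <s> is the proposed sentence tag, one per line.
SOURCE = """<p>
  <s>Every group has an identity.</s>
  <s>Inverses
     are unique.</s>
</p>"""


def paragraph_text(p_xml: str) -> str:
    """Rebuild paragraph text, ignoring whitespace between <s> elements."""
    p = ET.fromstring(p_xml)
    # Collapse runs of whitespace inside each sentence, then join sentences.
    sentences = (" ".join("".join(s.itertext()).split()) for s in p.iter("s"))
    return " ".join(sentences)


print(paragraph_text(SOURCE))
```

Under these assumptions the line breaks between sentences never reach the output, so only whitespace inside an `<s>` would still need care.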

In the Google group discussion on the subject, it was also mentioned that the creation of <s> tags may help with issues of period handling in LaTeX.

davidfarmer commented 7 years ago

It is perfectly fine to have an <s> tag for sentences, and a tag for words. But those do not really solve the problem. For example, am I allowed to start a new line after a comma?

What is needed is a very slightly more flexible workflow from the human-written source to the final output.

I am partway through a writeup of my ideas, which I will post to the newsgroup.


rbeezer commented 7 years ago

@j-loreaux Please see recent posts on the Google group. Not a complete solution, but an improvement. Could you see if processing with -stringparam whitespace flexible on the xsltproc command-line addresses some of your requests?

davidfarmer commented 7 years ago

I thought the newly allowed use of white space in a paragraph was going to take care of the particular use cases that motivated this issue. Has it not?

I like the idea of having an s tag for sentences, but please explain the immediate problem it is solving.

davidfarmer commented 2 years ago

White space handled better now.