TEIC / Stylesheets

TEI XSL Stylesheets
231 stars 124 forks source link

update html output declarations to modern XHTML5 #487 #499

Closed HelenaSabel closed 3 years ago

HelenaSabel commented 3 years ago

My first attempt to address issue #487 (and my first contribution to the Stylesheets!)

In html/html_param.xsl I added the new parameters to be consistent with how the stylesheets were calling the previous parameters related to the the old doctype declaration.

Besides updating the instances of <xsl:output> and <xsl:result-document> that created HTML documents, I modified this line because otherwise the UTF-8 encoding was being declared twice.

sydb commented 3 years ago

Woo-hoo! Two minor concerns, one big problem:

  1. (Minor) Should we be setting @omit-xml-declaration to "yes"? Seems to me we would, in general, want them. But I am not much of an HTML and browser person, so there may be gotchas I don’t know about; and perhaps it is another ticket, eh?
  2. (Minor) Not that it matters, but is there anything wrong with declaring the character set encoding twice?
  3. (Major) As the output HTML files are changed, this breaks not only the tests in Test/ (kind of expected, they are very fragile) but also those in Test2/ (not surprising, because although they are not nearly as fragile, these tests are still comparing serializations, not trees).
hcayless commented 3 years ago

For modern HTML, I would say omitting the XML Declaration is correct. The character encoding thing is sloppy, but probably doesn't break anything. But we should fix it, because it's sloppy. As Syd says the PR should be updated to fix the expected results.

HelenaSabel commented 3 years ago
1. (Minor) Should we be setting `@omit-xml-declaration` to `"yes"`? Seems to me we would, in general, want them. But I am not much of an HTML and browser person, so there may be gotchas I don’t know about; and perhaps it is another ticket, eh?

To my understanding, the XML declaration is not needed unless the encoding is not UTF-8. I am partial to omitting it, but I need to find more solid arguments than the first one that came to mind (because it was related to having the XML declaration in files with a .php extension, which is not the point of this). It seems more practical to address the presence or absence of the declaration in this ticket, but if it ends up needing a long discussion it might be better to divide it in two tickets?

2. (Minor) Not that it matters, but is there anything wrong with declaring the character set encoding twice?

So the issue was that that <xsl:when> was creating a second <meta> element with just the encoding attribute: any browser would render it without problems but I think the repetition is invalid.

3. (**Major**) As the output HTML files are changed, this breaks not only the tests in Test/ (kind of expected, they are very fragile) but also those in Test2/ (not surprising, because although they are not nearly as fragile, these tests are still comparing serializations, not trees).

For my personal testing, I manually encoded the expected changes (like adding the HTML doctype declaration) but I didn’t know how to delve into a more sustainable testing process...

martindholmes commented 3 years ago

I'm pretty sure the HTML validator will complain if the character encoding is declared twice. I think Saxon 10 provides one character encoding declaration, so if we're using a recent Saxon, we shouldn't have to add anything extra.

HelenaSabel commented 3 years ago

I finally finished with the testing so this PR is almost ready to be reviewed. I say almost because I had some doubts about the merging of the HMTL and HTML5 routines, therefore some work might still be needed.

At this point, there are no HTML5-specific stylesheets. Any reference to html5.xsl was replaced with html.xsl and of course html.xsl and the html folder were updated accordingly. What I haven’t done, though, was to delete the html5 folders. For instance, the html5 folders exist in different profiles, but the to.xsl file they contain points to html.xsl instead of html5.xsl (see example). I am guessing that the most elegant solution would have been to delete the HTML5 references and have the html5 commands to just be aliases of the html ones, but I am not sure I can do that on my own.

Regarding the merge in terms of the XSLT transformation itself, I had a doubt about the parameter divOffset. This parameter being equal to 1 was present in html5.xsl and thus I added it to html.xsl (ec6205b). One of the changes you will see now when comparing the new expected results is that all the headings will be consistently one hierarchical step higher. I just wanted to make sure that that was the wanted behavior.

Since there indentation is quite different, the expected results are a bit hard to compare with the previous version. These are the main changes you will find:

And I think that covers it!

martindholmes commented 3 years ago

@HelenaSabel This is great work. It'll be an easy thing to tweak the symlinks in the /bin folder to point the html5 one to the html one, which will then take over, I think.

The only thing I'm not sure about is dropping the xml:lang attribute. Since we don't know whether end-users will choose to serve their documents as XHTML or not, I see no problem in leaving the xml:lang attribute alongside the lang attribute, as the W3C document says; that's what I tend to do in my projects.

JanelleJenstad commented 3 years ago

I don’t have an opinion on xml:lang but wanted to second Martin’s accolades. Thanks for all this work!

Janelle Jenstad, Associate Professor, Department of English, University of Victoriahttps://journals.uvic.ca/index.php/scene/ Director, Map of Early Modern Londonhttps://mapoflondon.uvic.ca/; Director, Linked Early Modern Drama Onlinehttps://lemdo.uvic.ca/ Co-Coordinating Editor, Digital Renaissance Editionshttps://digitalrenaissance.uvic.ca/ and the new Internet Shakespeare Editions Co-Applicant, The Endings Projecthttps://endings.uvic.ca/index.html Email: @.**@.>, @.**@.>, or @.**@.>

From: Martin Holmes @.> Sent: August 3, 2021 1:44 PM To: TEIC/Stylesheets @.> Cc: Subscribed @.***> Subject: Re: [TEIC/Stylesheets] update html output declarations to modern XHTML5 #487 (#499)

Notice: This message was sent from outside the University of Victoria email system. Please be cautious with links and sensitive information.

@HelenaSabelhttps://github.com/HelenaSabel This is great work. It'll be an easy thing to tweak the symlinks in the /bin folder to point the html5 one to the html one, which will then take over, I think.

The only thing I'm not sure about is dropping the xml:lang attribute. Since we don't know whether end-users will choose to serve their documents as XHTML or not, I see no problem in leaving the xml:lang attribute alongside the lang attribute, as the W3C document says; that's what I tend to do in my projects.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/TEIC/Stylesheets/pull/499#issuecomment-892152820, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABNLCOCB2K5JWND3FP5RQVLT3BIJRANCNFSM42OKJT2Q. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email.

HelenaSabel commented 3 years ago

@HelenaSabel This is great work. It'll be an easy thing to tweak the symlinks in the /bin folder to point the html5 one to the html one, which will then take over, I think.

Thank you @martindholmes! I thought it could be more complicated than that. I could tackle the XSLT dependencies, but I was afraid of some other type of dependencies related to the other programing languages that work here.

The only thing I'm not sure about is dropping the xml:lang attribute. Since we don't know whether end-users will choose to serve their documents as XHTML or not, I see no problem in leaving the xml:lang attribute alongside the lang attribute, as the W3C document says; that's what I tend to do in my projects.

Thank you for this comment, because I was doubtful myself (in fact, I was adding both at some point a48b53e). I changed my mind at re-reading the recommendation because the presence of both was only mentioned when serving XHTML 1. But you are absolutely right, if the pages are served as XML, @lang does not hurt but @xml:lang would be crucial.

I will fix these details tomorrow (first thing in the morning) and put the PR ready for review.

HelenaSabel commented 3 years ago

I did the symlink in the bin folder. And the ll command shows that the symlink has the proper target, but the stylesheets in the html5 folders inside profiles are still being fired. I see three possible approaches:

  1. Keep the XSLT files as they are in folders like profiles/default/html5 or profiles/tei/html5. These stylesheets are calling for the html.xsl one (because html5.xsl no longer exists) so there is no inconsistency.
  2. Create a symlink in all these folders. For instance, profiles/default/html5/to.xsl would be a symlink to profiles/default/html/to.xsl
  3. Update transformtei to add some kind of condition for the html5 command.

Option 1 is already implemented and I could do option 2. Sadly, I cannot do option 3.

martindholmes commented 3 years ago

@HelenaSabel I'm surprised that the symlink from teitohtml5 to teitohtml fails, but I think a simpler approach might be just to make the profiles/tei/html5 folder a symlink to profiles/tei/html, so the old html5 files no longer exist. Then we could deprecate the teitohtml5 symlink and the html5 folder, and in a year or two, just remove them. What do others think?

HelenaSabel commented 3 years ago

Converting to draft to work on the symlinks

sydb commented 3 years ago

Because we are in refrigeration most, if not all, of us would say this PR is technically ineligible to be merged in for the upcoming release. (It is a code change, not a mere wording change.) For that reason, I am not even going to recommend it be merged now, let alone do so. But if the release techs (other than HSB) agree to it (i.e., if they do not mind taking on the risk), I think they could reasonably merge it before forking the release.

HelenaSabel commented 3 years ago

I think it might be better to wait. I will add your suggestions to the PR, @sydb, and I also need to run Test2 (even if I don't find any errors, it is likely that I will need to update the expected results of that test)

sydb commented 3 years ago

Fine with me to wait. And yes, there are Test2/ expected result errors.

sydb commented 3 years ago

I ran the same tests I ran previously, and also successfully built the Guidelines against this set of stylesheets.