TEIC / Stylesheets

TEI XSL Stylesheets

surname cannot come first #325

Open Dominique-M opened 6 years ago

Dominique-M commented 6 years ago

In certain cultures the surname comes first, and TEI allows surname and forename in that order under persName. However, the HTML5 transformation (7.44 as a Debian package) seems to put the forename first anyway. I feel the choice should rest with the author, not the stylesheets. Or am I missing something? See http://www.d-meeus.be/biblio/Tsuzuki1967.html and http://www.d-meeus.be/biblio/biblio.xml. (In the meantime, as a workaround, I ‘commented out’ surname and forename and wrote the full name directly as the content of persName.)

jamescummings commented 6 years ago

I think this is happening because of an arbitrary(?) decision to process surname first by default in this template:

https://github.com/TEIC/Stylesheets/blob/292f2829880b5d0345447191ee3d0d448fc8b745/common/common_core.xsl#L258-L285

Although you said HTML5 transformation, this just uses the underlying HTML transformation, which itself uses templates held in common across multiple transformations.

Local Project Workaround: When you call the HTML5 transformation, use your own XSLT which calls it but overrides the template match for tei:editor|tei:author to put forename first (i.e. the exact same template but with 'surname' changed to 'forename'). This might be a quick workaround for you, depending on whether you feel comfortable doing that. The benefit is that you can return to using the markup rather than commenting it out and putting the name as content of persName, which means it shows up with anything else you do with forenames and surnames.
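
A minimal sketch of such a wrapper, under stated assumptions: the `href` is a placeholder for wherever your copy of the Stylesheets lives, and the match pattern assumes the reordering comes from a default-mode template for tei:editor|tei:author. Rather than copying the stock template with the element names swapped, this variant simply processes the children in the order they were encoded. It has not been checked against the actual template in common_core.xsl; if that template sits in a named mode or emits wrapper markup, the override would need to mirror that, and other templates in the Stylesheets may reorder names too.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- Placeholder path (assumption): point this at your local html5 stylesheet. -->
  <xsl:import href="path/to/Stylesheets/html5/html5.xsl"/>

  <!-- Templates in an importing stylesheet take precedence over imported
       templates, so this replaces the stock handling for the nodes it matches. -->
  <xsl:template match="tei:editor | tei:author">
    <!-- Process child nodes, including text nodes, in document order. -->
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

You would then run your transformation with this wrapper instead of the stock html5 stylesheet.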

Proposed Solution: If Council agrees this is a good idea (not privileging surname in this way), then we should put in a parameter (in common_param.xsl) either to mark whether surname should come first or not (which I still think should be the default) or, if we wanted to be really clever about it, to pass the name of the element which should come first. The former is obviously much easier than the latter. If going for the former, maybe defaultSurnameFirst=true should be the default. If someone sets this to false, then the child elements should just be processed in the document-provided order. This allows seamless backwards compatibility and total freedom, from one author entry to the next, about the order.
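
As a rough illustration of the simpler option, the sketch below emulates the parameter in an importing stylesheet. The parameter name `defaultSurnameFirst` comes from the proposal above; everything else is an assumption: the `href` is a placeholder, the match pattern is a guess at which template does the reordering, and in the real Stylesheets the parameter would be declared in common_param.xsl and tested inside the existing template rather than in a wrapper.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- Placeholder path (assumption): point this at your local html5 stylesheet. -->
  <xsl:import href="path/to/Stylesheets/html5/html5.xsl"/>

  <!-- Proposed switch; 'true' keeps the current behaviour as the default. -->
  <xsl:param name="defaultSurnameFirst" select="'true'"/>

  <xsl:template match="tei:editor | tei:author">
    <xsl:choose>
      <xsl:when test="$defaultSurnameFirst = 'true'">
        <!-- Backwards compatible: hand over to the imported (stock) template. -->
        <xsl:apply-imports/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Otherwise process children in the document-provided order. -->
        <xsl:apply-templates/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Existing users who never set the parameter would see no change, while passing defaultSurnameFirst=false as a stylesheet parameter would switch to document order.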

Dominique-M commented 6 years ago

I am an amateur writing mostly “born digital” TEI, but if TEI is used to encode an existing text, why on earth should the stylesheets decide on such a thing, possibly against the author (and the encoder following the author)? TEI is used all over the world. But even in English it doesn’t make sense: it would make it difficult to encode that Mao Zedong wrote about Norman Bethune (https://en.wikipedia.org/wiki/Norman_Bethune#Legacy). There should be no default on that matter, but I can understand the need to keep backwards compatibility. Even if this only concerns authors and editors, the order should be free.

duncdrum commented 6 years ago

I agree with @Dominique-M here. Under no circumstances should the stylesheet change the order of fore-/surname from how encoders used them.

@jamescummings I like the proposed solution, but the default should be defaultSurnameFirst=false. This is based on many instances in bibliographies and even library catalogs that do not properly cite or sort by actual surname, and go by sequence alone, e.g. "Mao, Zedong", "Zedong Mao", … which are simply wrong.

lb42 commented 6 years ago

Phrases like "under no circumstances" are always a bit provocative. Imagine a situation where I am merging two or three different listBibls in which the order of encoding varies (for whatever reason). I might well want the stylesheet which renders them to impose a single coherent order, without having to go through the tedious business of retagging the data. The order is an aspect of the formatting, not the encoding.

duncdrum commented 6 years ago

@lb42 well, maybe my phrasing was a bit too harsh (I blame the ⚽️), but the point stands. For use in citation styles, indexes, concordances, listBibls etc. there is no single correct order for the stylesheets to impose. So the stylesheet should trust creators of the listBibls to have picked the right one in the first place. I have no objection to providing a convenience function for users to override this, but it shouldn't be on by default. Why should editors have to go through the tedious process of fixing the stylesheet or their listBibls because it magically turns

<persName>
  <surname>Mao</surname>
  <forename>Zedong</forename>
</persName>
<persName>
  <forename>Duncan</forename>
  <surname>Paterson</surname>
</persName>

into Zedong Mao & Duncan Paterson?

Dominique-M commented 6 years ago

The question is: what are the stylesheets for? A. I understand that the main use is to transform TEI encoding to some other XML dialect, like HTML or an ODF text document. There is no reason, in that operation, to impose a particular bibliographic style against the choice of the encoder. B. Of course, XSLT may be used to normalise encoding automagically instead of tedious manual work, for example to normalise all references into some particular bibliographic style. In this I agree with Lou. After all, we encode to allow such treatments. That is precisely the meaning of digital in Digital Humanities. But this is a different treatment from converting the XML to another dialect. The example given by Lou seems to me too specialised for the general stylesheets. I would call my A. and B. tentative. I may be wrong. I am an amateur TEI user, not a digital humanist. My real point is the first sentence only (The question is…).
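
To make B. concrete, here is a hedged sketch of a separate normalising pass, kept apart from the TEI-to-HTML conversion. It is an identity transform with one extra rule that rewrites each persName so that surname elements precede forename elements. Which order to impose is a project decision, and surname-first is chosen here only for illustration. Note that this sketch discards any text nodes (punctuation, whitespace) that stood between the name parts and puts a single space there instead.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Identity rule: copy everything through unchanged by default. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Normalise names that carry both parts: surname(s), forename(s),
       then any other child elements, separated by single spaces. -->
  <xsl:template match="tei:persName[tei:surname and tei:forename]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="tei:surname, tei:forename,
                            * except (tei:surname | tei:forename)">
        <xsl:if test="position() gt 1">
          <xsl:text> </xsl:text>
        </xsl:if>
        <xsl:apply-templates select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The output is still TEI, so the general stylesheets can then render it in document order without knowing anything about bibliographic style.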

tuurma commented 6 years ago

To reproduce, run the TEI P5 XHTML transformation scenario on

<TEI xmlns="http://www.tei-c.org/ns/1.0">
   <teiHeader>
      <fileDesc>
         <titleStmt>
            <title>Title</title>
         </titleStmt>
         <publicationStmt>
            <p>Publication Information</p>
         </publicationStmt>
         <sourceDesc>
            <p>Information about the source</p>
         </sourceDesc>
      </fileDesc>
   </teiHeader>
   <text>
      <body>
         <p>
            <persName>
               <surname>surname</surname>
               <forename>forname</forename>
            </persName>
         </p>
      </body>
   </text>
</TEI>

The expected result would be surname then forename; the current result is reversed.

martindholmes commented 5 years ago

Lines 1244 through 1265 of common_core.xsl have templates that cause this behaviour. Commenting out those templates solves the problem for @tuurma's example, but there are other places such as the lines @jamescummings points at which do similar things. The Stylesheets Coop thinks this behaviour should be changed; names should be output by default in the order they're encoded, including intervening text nodes (punctuation etc.). A parameter could be added to prevent these templates from firing unless it's explicitly set.

sydb commented 5 years ago

The Stylesheets group, discussing this more, thinks that the order of events should be:

  1. Remove all the special processing of <persName> and its children; thus one's output will reflect whatever is in the input, period.
  2. Post about this change here on the ticket and to TEI-L and in the Stylesheets change log.
  3. IF someone is upset by this change, THEN ask that person to post the complaint with their use case as a ticket, and at that time consider what, if any, other behaviors should be supported (likely via a parameter).

sydb commented 5 years ago

Assigning myself to do step (1), above.

sydb commented 5 years ago

I tried step 1 by

  1. Creating simple test input file.
  2. Generating output HTML5.
  3. Deleting the templates we (well, @martindholmes) found as the source of the problem during the Stylesheets group conference call. They have been replaced with a comment giving the commit number before the deletion for easy reversal.
  4. Generating output HTML5.

The test file and the two output HTML5 files are here: test_persName_for_325.tar.gz

HOWEVER, doing this causes Test/test23 to differ slightly in the HTML (which I think we won't care about), and even more in the TeX, which I am hoping someone else knows or will figure out if we care about or not (I can't really run TeX on this laptop, and may not be able to get to this until Jan 04) test23.tar.gz

AND much worse, it causes Test/test27 to essentially fail. The loss of whitespace in the middle of a name is really a problem. test27.tar.gz

Thoughts?

(P.S., I have not committed my changes yet, but will put them in a different branch if someone reminds me how to do that.)

martindholmes commented 5 years ago

If whitespace is lost during processing, then the output isn't in fact reflecting what's in the input, which is weird.
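
One possible explanation, not confirmed anywhere in this thread: if whitespace-only text nodes inside persName are stripped from the input tree before any template runs (for instance by an xsl:strip-space declaration somewhere in the stylesheets), then the line break between surname and forename is already gone, and even pure document-order output would run the two parts together. A small sketch for testing that hypothesis, with a placeholder `href` and no claim that this is the actual cause of the test27 difference:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- Placeholder path (assumption): point this at your local html5 stylesheet. -->
  <xsl:import href="path/to/Stylesheets/html5/html5.xsl"/>

  <!-- Keep whitespace-only text nodes that are direct children of persName.
       A declaration in the importing stylesheet has higher import precedence
       than a conflicting xsl:strip-space in an imported one. -->
  <xsl:preserve-space elements="tei:persName"/>

</xsl:stylesheet>
```

If the space between the name parts comes back with this wrapper, whitespace stripping is the likely culprit; if not, the loss happens elsewhere.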