Juris-M / citeproc-js

A JavaScript implementation of the Citation Style Language (CSL) https://citeproc-js.readthedocs.io

Deprecated HTML element used #23

Closed dsifford closed 7 years ago

dsifford commented 7 years ago

Hi Frank,

I'm just now realizing that citeproc is still using the <i> element to indicate italic, or emphasized, text. This has been partially deprecated in HTML5. It's recommended to now use the <em> element for that. Technically speaking, it should really be the <cite> element, but because the purpose of the element is to apply italicized formatting for the wrapped text, I think <em> would be the best bet moving forward.

Here's what MDN says...

In earlier versions of the HTML specification, the <i> tag was merely a presentational element used to display text in italics, much like the <b> tag was used to display text in bold letters. This is no longer true, as these tags now define semantics rather than typographic appearance. The <i> tag should represent a range of text with a different semantic meaning whose typical typographic representation is italicized. This means a browser will typically still display its contents in italic type, but is, by definition, no longer required to.

Use this element only when there is not a more appropriate semantic element. For example:

  • Use <em> to indicate emphasis or stress.
  • Use <strong> to indicate importance.
  • Use <mark> to indicate relevance.
  • Use <cite> to mark the name of a work, such as a book, play, or song.
  • Use <dfn> to mark the defining instance of a term.

KarlHegbloom commented 7 years ago

Then since citeproc and CSL are for /defining/ the final presentation form, the HTML it outputs ought to be specific about the font style. It actually should not use <em>, nor <i>, but <span class="semantic info here" style="font-modifier: specific not affecting point-size;">. That semantic info carried by the class could help JavaScript programs that can use it for various purposes. They could then parse the case name or the authors' names, or differentiate the page number the thing begins on from the page the citation is referencing within the article, independently of the actual CSL style in use.
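
To make the idea concrete, here is a minimal sketch of what such output could look like. It is not citeproc-js code: the helper names (escapeHtml, styledSpan) and the class name "csl-title" are invented for illustration, and font-style is used as one real CSS property that changes the face without touching the point size.

```javascript
// Hypothetical helpers, not part of citeproc-js.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap text in a span that carries a semantic class plus an explicit,
// size-neutral inline style. The class name is an assumption.
function styledSpan(text, semanticClass, style) {
  return (
    '<span class="' + escapeHtml(semanticClass) +
    '" style="' + escapeHtml(style) + '">' +
    escapeHtml(text) + "</span>"
  );
}

console.log(styledSpan("Marbury v. Madison", "csl-title", "font-style: italic;"));
// prints: <span class="csl-title" style="font-style: italic;">Marbury v. Madison</span>
```

A client script could then select on the class to find the case name regardless of which CSL style produced the citation.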

dsifford commented 7 years ago

@KarlHegbloom Agree with your thoughts! Though, I'd imagine a change of that size would have to bubble back to the CSL people to decide on since it would be breaking.

Edit: Misread the second part of your suggestion where you also suggest in-lining the style attribute.

My suggestion is purely one that can act as a temporary drop-in replacement for <i> for the time being.

For the record: @KarlHegbloom's suggestion is the best-case scenario. Expanding on his ideas: perhaps we could even go as far as embedding rich microdata as defined by schema.org (https://schema.org/docs/gs.html).

KarlHegbloom commented 7 years ago

In the add citation dialog's prefix and suffix box, and in abbreves, you can use simple html, like and . It ought to also have for small caps... but those must be translated to the correct output form, for html as well as RTF or the LaTeX like output format I've defined for the zotero-texmacs-integration I'm developing.

Hmmm... I wonder if a "semantic" rather than "presentational" format markup there would sometimes make sense, for when the thing might be used for more than one CSL style, like for the result of abbreves... is actually one of them I use. I think that markup ought to be stripped out of the strings for sorting☆, but left in for translation to the output format. In TeXmacs and in LaTeX, typesetting American English, the amount of space following a "." is different for one after an initial in an abbreviation than for one at the end of a sentence. So a citation with an abbreviation that ends with a "." doesn't look right unless it's wrapped in \abbr{...}.

☆ Because I think the presence of the in the results from the abbreves stage could affect the sort-ordering, I think it should be stripped out for sorting, but left in and translated for output. In case it does affect sort-ordering, I have a regex transformer on the client side that turns, e.g., "Sup. Ct.Sup. Ct.X-X-X" into "", so "Sup. Ct.Sup.Ct.X-X-XSup. Ct." will hopefully preserve the normal sorting order, while sending the abbr wrapped thing to typeset.

I also have a regex transformer that strips prefixes like, e.g., "08#@" or "04UC.78B.07.115#@" (each starts with a numeral, ends with "#@", and may contain numerals, letters, periods, or dashes). This makes it possible, e.g., for me to force "01#@Sup. Ct." to sort ahead of "02App. Ct.", which I needed for a categorized bibliography, which uses the jurisdiction as part of the sorting-plus-grouping string.
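
A rough sketch of that prefix-stripping rule, written from the description above rather than copied from the actual client code (which may differ). The names SORT_PREFIX and stripSortPrefix are made up here, and the text after the second prefix is invented for the example.

```javascript
// Leading sort key: one numeral, then any run of numerals, letters,
// periods, or dashes, terminated by "#@".
const SORT_PREFIX = /^[0-9][0-9A-Za-z.\-]*#@/;

// Remove the sort key, if present, before the string is typeset.
function stripSortPrefix(text) {
  return text.replace(SORT_PREFIX, "");
}

console.log(stripSortPrefix("01#@Sup. Ct."));                // prints: Sup. Ct.
console.log(stripSortPrefix("04UC.78B.07.115#@Some Title")); // prints: Some Title
console.log(stripSortPrefix("App. Ct."));                    // prints: App. Ct.
```

Strings without a "#@" terminator are returned unchanged, so the transformer is safe to run over every field.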

Another regex transformation picks out pseudo entries added to the document's bibliography via the editBibliography integration dialog, transforming them from bibliographic items into subheadings for categories in the bibliography. In the reference manager (I use Juris-M for its legal writing support), the title of those items is like "000000000@#\ztbibSubHeading{!Text of Subheading}". The exclamation point keeps the "{" from being escaped, which it otherwise would be so that you can use "{" in normal text without TeXmacs or LaTeX reading it as a special character. The "000000000@#" makes that title sort to the top, and the title is the innermost sorting string in the categorizing sort macro in the CSL style I use. So for a bibliographic subheading entry, all of the other elements of the category must be the same as for the items that should appear under it: type, jurisdiction, etc. Sometimes the author may need to be set to "000000" to make it sort to the top of the category.
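
For illustration only, detecting such a pseudo entry could look roughly like this. The pattern is inferred from the single example title above, so the real transformer in zotero-texmacs-integration may well be stricter or looser; SUBHEADING and parseSubHeading are hypothetical names.

```javascript
// A run of zeros, "@#", the \ztbibSubHeading macro, then "{!" ... "}".
const SUBHEADING = /^0+@#\\ztbibSubHeading\{!(.*)\}$/;

// Return the subheading text, or null when the title is an ordinary item.
function parseSubHeading(title) {
  const match = SUBHEADING.exec(title);
  return match ? match[1] : null;
}

// In a JavaScript string literal the backslash must be doubled.
console.log(parseSubHeading("000000000@#\\ztbibSubHeading{!Text of Subheading}"));
// prints: Text of Subheading
console.log(parseSubHeading("An Ordinary Title"));
// prints: null
```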

https://github.com/KarlHegbloom/zotero-texmacs-integration

KarlHegbloom commented 7 years ago

Yes to microdata.

KarlHegbloom commented 7 years ago

I'm saying that the markup returned by citeproc should not change the font size. That must be left for the citeproc client side to determine. For example, when the citation appears in text it may be a different font size than when it appears inside a footnote.

I'm fairly sure that the numbers given to the client of the word processor integration for the Document_setBibliographyStyle (iirc) assume a 12pt font. But the client side can scale those according to the actual font in use. See my TeXmacs extension code to see how I've handled it. It works well enough for me, but if anyone finds a problem with it in the context of their own documents, please freely utilize the github issues system to report it to me so we can fix it.

dstillman commented 7 years ago

It actually should not use <em>, nor <i>, but <span class="semantic info here" style="font-modifier: specific not affecting point-size;">.

Yeah, just to expand on this, between em and i, i is arguably more appropriate here. More from MDN (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/em#%3Ci%3E_vs._%3Cem%3E):

The <em> tag represents stress emphasis of its contents, while the <i> tag represents text that is set off from the normal prose, such as the name of a movie or book, a foreign word, or when the text refers to the definition of a word instead of representing its semantic meaning.

In this case, em would be implying semantic meaning that doesn't apply (since this isn't actually emphasized text). It's the presentation — italics — that actually matters.

Personally I'm not sure there's that much value in semantic markup of the individual components of bibliographic content, for reasons I get into in this thread (https://forums.zotero.org/discussion/63133/creating-personal-web-page-for-recognition-by-generic-translators), but embedding full metadata would make sense. citeproc-js may have less data than the tool that's calling it, though; in our case, we'd add full metadata to the bibliography from Zotero rather than relying on the lossy CSL-JSON conversion.

KarlHegbloom commented 7 years ago

I've got a monkey patch in my propachi-texmacs to support hyperlinking citations to their bibliography entries, and for making the bibliography entries link to their DOI or URL. The same tags also make it possible to gather the page references for each page a citation appears on, so that list can be displayed after each bibliography item in the document. The normal citeproc output format cannot at this time access anything defined by Zotero without calling out to a routine like the variablewrapper.
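
As a sketch of the kind of routine meant here: the function below follows the general shape of the variableWrapper hook described in the citeproc-js manual, but the fields it reads from params (variableNames, context, itemData) and the DOI-first preference are assumptions of this example, not a statement of what propachi-texmacs does.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Wrap the rendered title of a bibliography entry in a link to its DOI or URL.
// Anything else is passed through unchanged.
function variableWrapper(params, prePunct, str, postPunct) {
  const item = params.itemData || {};
  const isTitle = params.variableNames[0] === "title";
  if (isTitle && params.context === "bibliography") {
    const href = item.DOI ? "https://doi.org/" + item.DOI : item.URL;
    if (href) {
      return prePunct + '<a href="' + escapeAttr(href) + '">' + str + "</a>" + postPunct;
    }
  }
  return prePunct + str + postPunct;
}

console.log(variableWrapper(
  { variableNames: ["title"], context: "bibliography", itemData: { URL: "https://example.org/a" } },
  "", "A Title", "."
));
// prints: <a href="https://example.org/a">A Title</a>.
```

In a real setup a function like this would be attached to the sys object handed to the processor; here it is called directly so the sketch runs on its own.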

Maybe there should be a standardized interface for making that metadata available inside citeproc?

Hmmm... when it calls the variable wrapper, it passes a state object that knows...

Well, I'm not going to write this all out here on this on screen keyboard.

fbennett commented 7 years ago

The discussion here was useful. Closing because at the end of the day the HTML-ish markup interpreted by citeproc-js will remain as is, leaving any conversions to the output formatter.
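
For anyone who still wants different tags, a client can convert the rendered string after the fact. The snippet below is only a string-level sketch of that idea, not the citeproc-js output-format mechanism itself; ITALIC_TARGETS and convertItalics are hypothetical names, and it assumes the <i> tags appear without attributes.

```javascript
// Replacement tag pairs a client might choose between.
const ITALIC_TARGETS = {
  em: ["<em>", "</em>"],
  span: ['<span style="font-style: italic;">', "</span>"],
};

// Swap every <i>...</i> pair in an already-rendered string for the chosen target.
function convertItalics(html, target) {
  const pair = ITALIC_TARGETS[target];
  if (!pair) {
    throw new Error("Unknown target: " + target);
  }
  return html.replace(/<i>/g, pair[0]).replace(/<\/i>/g, pair[1]);
}

console.log(convertItalics("Smith, <i>A Book</i> (2016).", "em"));
// prints: Smith, <em>A Book</em> (2016).
```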