andras-simonyi / citeproc-el

A CSL 1.0.2 Citation Processor for Emacs.
GNU General Public License v3.0

Fix title capitalization #71

Closed: rudolf-adamkovic closed this issue 2 years ago

rudolf-adamkovic commented 2 years ago

In APA Style, the BibTeX title field should render in sentence case rather than in its original case, except for brace-protected words such as {THIS}.

Tested with Pandoc and CSL: https://github.com/citation-style-language/styles/blob/master/apa.csl

andras-simonyi commented 2 years ago

Could you post an MWE with a concrete Org document, the accompanying bibliography entries, and a description of the desired and actual export output? Looking at the APA CSL style file, I don't see any sign that it requires the title field to be case-transformed in any way.

bdarcus commented 2 years ago

This data issue may be relevant?

https://github.com/citation-style-language/styles/issues/91

In APA Style, the BibTeX title field should render in sentence case rather than in its original case, except for brace-protected words such as {THIS}.

Per the issue linked above, the CSL styles, including APA, pretty much assume sentence-case input: it's easy to convert from sentence to title case, it's not possible the other way, unless you add the LaTeX braces convention. (There is also an unofficial "nocase" tag for sub-field formatting that does the same thing, which we will formalize in time.)

It would probably make sense, if not too much of a hassle, for citeproc to support the braces for bibtex/biblatex input.

andras-simonyi commented 2 years ago

I think braces are already supported for marking spans that shouldn't be case-converted, at least in title fields. My impression is that the issue here is that Pandoc's bib(la)tex->CSL converter automatically converts title fields to sentence case, but I'm not convinced that this behavior is desirable, or should be implemented, in citeproc-el too.

denismaier commented 2 years ago

I think it should. Or why shouldn't it?

andras-simonyi commented 2 years ago

I guess the argument is what Bruce mentioned above:

it's easy to convert from sentence to title case, it's not possible the other way

Of course, the braces could solve this, but we would still make life miserable for those who want to use a title-casing style without bothering with the braces. The other solution is to do what CSL-JSON does, IIUC, and assume that the input is in sentence case.

denismaier commented 2 years ago

The point is: bibtex standard is title case.

andras-simonyi commented 2 years ago

The point is: bibtex standard is title case.

Thanks, that is very useful. Is this the case for biblatex too?

bdarcus commented 2 years ago

This is an interesting thread on the topic.

https://tex.stackexchange.com/questions/439440/what-is-the-proper-casing-to-use-when-storing-titles-in-the-bibliography-databas

It concludes the answer is "yes", but has interesting context and speculation on differences.

denismaier commented 2 years ago

I think so, or at least that was the reason for pandoc's behaviour. Well, that's a circular argument 😉. I'll check.

denismaier commented 2 years ago

Ok. Some explanations here: https://retorque.re/zotero-better-bibtex/support/faq/#bbt-is-changing-the-capitalization-of-my-titles----why

rudolf-adamkovic commented 2 years ago

Could you post an MWE with a concrete Org document, the accompanying bibliography entries, and a description of the desired and actual export output? Looking at the APA CSL style file, I don't see any sign that it requires the title field to be case-transformed in any way.

Here I created a working MWE with Pandoc that produces the expected output (with the same apa.csl).

test.tex:

\documentclass{article}
% biblatex provides \addbibresource, \textcite and \printbibliography
\usepackage{biblatex}
\addbibresource{test.bib}
\begin{document}
Per \textcite{friedman-1970} \ldots
\printbibliography{}
\end{document}

test.bib:

@Misc{friedman-1970,
  author       = {Milton Friedman},
  title        = {A {Friedman} Doctrine},
  month        = {sep},
  year         = 1970,
  subtitle     = {The Social Responsibility Of Business Is to Increase
                  Its Profits},
  journal      = {The New York Times},
  pages        = 17,
  day          = 13,
  url          = {https://www.nytimes.com/1970/09/13/archives/a-friedman-doctrine-the-social-responsibility-of-business-is-to.html}
}

Conversion:

$ pandoc test.tex --output=test.html --citeproc --csl=apa.csl --bibliography test.bib

Output:

<p>Per <span class="citation" data-cites="friedman-1970">Friedman (1970)</span> …</p>
<div id="refs" class="references csl-bib-body hanging-indent" data-line-spacing="2" role="doc-bibliography">
  <div id="ref-friedman-1970" class="csl-entry" role="doc-biblioentry">
    Friedman, M. (1970). A <span>Friedman</span> doctrine: The social responsibility of business is to increase its profits. In <em>The New York Times</em> (p. 17). <a href="https://www.nytimes.com/1970/09/13/archives/a-friedman-doctrine-the-social-responsibility-of-business-is-to.html">https://www.nytimes.com/1970/09/13/archives/a-friedman-doctrine-the-social-responsibility-of-business-is-to.html</a>
  </div>
</div>

Pandoc correctly converts both title and subtitle to sentence case with apa.csl.

Does this help?

andras-simonyi commented 2 years ago

I've started to work on this and have some seemingly working code, but I'm not sure what the best way of integrating the conversion to sentence case would be. One worry is that if all title fields are automatically converted, then entries in languages that capitalize many words (in particular, German) will be ruined unless case-protective braces are added everywhere. @denismaier, @quintus WDYT, wouldn't this be a problem, say, for bibliographies with items whose titles are predominantly in German?

andras-simonyi commented 2 years ago

@salutis thanks, I'm sure now that the issue is not about the CSL side of things but about converting bib(la)tex title fields to CSL -- Pandoc automatically sentence-cases title fields during the conversion, while citeproc-el does not (yet). BTW, your example has one rather puzzling feature: "Friedman" in the title is not downcased, even though the field doesn't contain protective braces. I'm not sure how Pandoc does this -- does it automatically protect the names occurring in the entry? Anyhow, I'm not planning to implement this type of "named entity recognition": apart from the first letters of titles and subtitles, everything not protected by braces will be downcased during the conversion.
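A minimal sketch of that downcasing rule, written in Python purely for illustration and not taken from citeproc-el (which is Emacs Lisp). The function name `sentence_case` is hypothetical, and the sketch assumes that a subtitle starts after a colon and that braces only ever mark protected spans; LaTeX commands and escaped braces are ignored.

```python
def sentence_case(title: str) -> str:
    """Sentence-case a BibTeX-style title, copying {braced} spans verbatim.

    Assumptions (not from citeproc-el): a colon introduces a subtitle,
    and every brace pair marks a case-protected span.
    """
    out = []
    depth = 0               # current brace nesting level
    capitalize_next = True  # True at the title start and after a colon
    for ch in title:
        if ch == "{":
            depth += 1
            continue        # braces themselves are dropped from the output
        if ch == "}":
            depth = max(depth - 1, 0)
            continue
        if depth > 0:
            out.append(ch)  # protected text keeps its original case
            if not ch.isspace():
                capitalize_next = False
        elif ch.isalpha():
            out.append(ch.upper() if capitalize_next else ch.lower())
            capitalize_next = False
        else:
            out.append(ch)
            if ch == ":":
                capitalize_next = True
    return "".join(out)


print(sentence_case("A {Friedman} Doctrine"))  # A Friedman doctrine
print(sentence_case("A Friedman Doctrine"))    # A friedman doctrine
```

The second call shows the consequence of not doing any name recognition: an unprotected proper noun is downcased like every other word.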

Quintus commented 2 years ago

WDYT, wouldn't this be a problem, say, for bibliographies with items whose titles are predominantly in German?

Definitely.

In German, we do not have something like the English title case. German grammar mandates sentence starts, nouns and names to be capitalised and does not have different rules for titles. That is, for German, the title is always to be treated as-is and is not to be modified by the style; automatically applying English title case would "ruin" the titles indeed. In my bibliography files (which contain German entries for the vast majority) I have not used any kind of brace protection; I simply rely on the styles not to tamper with the content of the title field, and I suppose that's what most people do. Instead I have been doing what is recommended in the linked Stackexchange page: nearly all of my bibliography entries have a langid field. For German entries, this field is either ngerman (post-2000 orthography) or german (orthography before 2000; mind you, some people insist on using the pre-2000 orthography still today, and don't get me started on the nonsensicalness of the 2000 orthography reform).

An automatic conversion of title values to English title-case would break pretty much all of my bibliography entries. Naturally, I would thus advise against it. Would it be possible to check the langid field, and if it exists and is not one of the English variants (british, american, are there others?), leave the content of title alone? For cases where langid is not set, I think there should be an option for what to do: convert to title case or leave it alone, that is, let the user decide. It could also be possible to set this option to locale (maybe even as the default value?), which would look at the user's locale and apply title case conversion only if the locale starts with en_ (presuming there's no other language than English that uses title case).

Note that the langid field as per the Biblatex manual uses Babel language identifiers rather than ISO ones.
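The langid check proposed here could look roughly like the following Python sketch. It is an illustration under stated assumptions, not citeproc-el code: the function name `should_convert_case`, the `convert_when_missing` option, and the exact set of English Babel identifiers are all hypothetical choices, and the list is probably not exhaustive.

```python
# Babel-style identifiers treated as English. This set is an assumption
# made for the sketch, compared case-insensitively.
ENGLISH_LANGIDS = frozenset({
    "english", "american", "british", "usenglish", "ukenglish",
    "canadian", "australian", "newzealand",
})


def should_convert_case(entry: dict, convert_when_missing: bool = True) -> bool:
    """Decide whether the title of a bibliography entry may be case-converted.

    `entry` maps field names to string values. A non-English langid means
    the title is left alone; a missing langid falls back to the
    user-controlled `convert_when_missing` option (hypothetical name).
    """
    langid = entry.get("langid", "").strip().lower()
    if not langid:
        return convert_when_missing
    return langid in ENGLISH_LANGIDS


print(should_convert_case({"title": "Der Zauberberg", "langid": "ngerman"}))   # False
print(should_convert_case({"title": "Popular Mechanics", "langid": "american"}))  # True
```

Keeping the missing-langid behaviour as a separate option means a mostly German bibliography can switch conversion off without tagging every entry.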

denismaier commented 2 years ago

Yes, this conversion should only be done for titles in English. Ideally, you'd also be able to specify a different language for a booktitle.

denismaier commented 2 years ago

https://pandoc.org/MANUAL.html#capitalization-in-titles

bdarcus commented 2 years ago

Do we need to update this?

https://docs.citationstyles.org/en/stable/specification.html#text-case

denismaier commented 2 years ago

@bdarcus why do you think so? Do you think we should prescribe a biblatex->CSL mapping?

bdarcus commented 2 years ago

I don't necessarily think we should; was just asking while we're discussing.

rudolf-adamkovic commented 2 years ago

@andras-simonyi

@salutis "Friedman" in the title is not downcased, even though the field doesn't contain protective braces. I'm not sure how Pandoc does this [...]

I am sorry! I forgot to update that snippet. Fixed! Both Pandoc and BibLaTeX simply sentence-case everything unprotected by the braces. No exceptions. (Some journals protect entire titles by default to avoid issues, but I consider that laziness.)

andras-simonyi commented 2 years ago

I've just merged the branch implementing sentence-case conversion for BibTeX/biblatex titles of items with an English langid or no langid. @salutis, could you check? Thanks in advance!

rudolf-adamkovic commented 2 years ago

I've just merged the branch implementing sentence-case conversion for BibTeX/biblatex titles of items with an English langid or no langid. @salutis, could you check? Thanks in advance!

I updated to the latest citeproc via MELPA, version 20211124.1535.

It does not work.

BibTeX entry:

@InBook{carver-1989,
  author       = {Raymond Carver},
  title        = {Popular Mechanics},
  chapter      = 14,
  publisher    = {Vintage Books, Random House},
  year         = 1989,
  pages        = {123-125},
  booktitle    = {What We Talk About When We Talk About Love}
}

Rendered item:

Carver, R. (1989). Popular Mechanics. In What We Talk About When We Talk About Love (pp. 123–125). Vintage Books, Random House.

Or does this mean that I need to add BibLaTeX's langid to all my BibTeX entries? I hope not. 😨

denismaier commented 2 years ago

My guess: you'll need to add a langid to those items where you want this conversion to apply.

andras-simonyi commented 2 years ago

Strange -- with the same entry and the APA style I get the expected

Carver, R. (1989). Popular mechanics. In /What we talk about when we talk about love/ (pp. 123–125). Vintage Books, Random House.

output. No langid field is needed: titles in entries without a langid are sentence-cased by default (this could be made configurable on the Org side in the future if necessary). Maybe you are somehow still using an older version?

rudolf-adamkovic commented 2 years ago

@andras-simonyi Strange, I restarted Emacs again, and now it works.

🎆🎆🎆🎆🎆🎆🎆🎆 🎆🎆🎆🎆🎆🎆🎆🎆 🎆🎆🎆🎆🎆🎆🎆🎆 🎆🎆🎆🎆🎆🎆🎆🎆 🎆🎆🎆🎆🎆🎆🎆🎆 🎆🎆🎆🎆🎆🎆🎆🎆 🎆🎆🎆🎆🎆🎆🎆🎆 🎆🎆🎆🎆🎆🎆🎆🎆

64 fireworks! Closing the issue and thank you!