emacs-citar / citar

Emacs package to quickly find and act on bibliographic references, and edit org, markdown, and latex academic documents.
GNU General Public License v3.0

Discrepancies in citation format when using `citar-insert-reference` #784

Open benthamite opened 1 year ago

benthamite commented 1 year ago

Apologies if I'm missing something obvious, but I notice that when I insert a formatted reference with citar-insert-reference, there are various discrepancies between the inserted reference and the same reference as it appears when exported with one of the org-mode export commands, such as org-md-export-to-markdown. Most notably, the titles are not capitalized correctly (e.g. the braces surrounding a word are not respected).

As an example, consider the following bibtex entry:

@online{Hanson2023CanHumansBe,
  abstract =     {It is one of the most fundamental questions in the
                  social and human sciences: how culturally plastic are
                  people? Many anthropologists have long championed the
                  view that humans are very plastic; with matching
                  upbringing people can be made to behave a very wide
                  range of ways, and to want a very wide range of
                  things. Others say human nature is far more
                  constrained, and collect descriptions of "human
                  universals" (See Brown's 1991},
  author =   {Hanson, Robin},
  langid =   {english},
  timestamp =    {2023-06-14 15:12:51 (GMT)},
  title =    {Can humans be the {FORTRAN} of creatures?},
  url =
                  {https://www.overcomingbias.com/p/how-plastic-are-peoplehtml},
  urldate =  {2023-06-14},
}

Inserting this reference by invoking citar-insert-reference results in

[1]R. Hanson, “Can humans be the fortran of creatures?” https://www.overcomingbias.com/p/how-plastic-are-peoplehtml (accessed Jun. 14, 2023).

Whereas exporting a file that cites that work via org-md-export-to-markdown will show it in the "bibliography" section as

R. Hanson, “Can humans be the FORTRAN of creatures?” https://www.overcomingbias.com/p/how-plastic-are-peoplehtml (accessed Jun. 14, 2023).

I have used the IEEE csl citation style in this case, but the issue occurs with all the styles I tried.

bdarcus commented 1 year ago

Are you using the default formatter, or the citeproc-el one?

I realize it's not documented in the README (PR welcome), but I suspect that's it; either you aren't using the citeproc formatter, or you're using a different style for that?

benthamite commented 1 year ago

Thanks for the quick reply.

In my config, the values of citar-citeproc-csl-styles-dir and citar-citeproc-csl-locales-dir are set to org-cite-csl-styles-dir and org-cite-csl-locales-dir, respectively, and citar-format-reference-function is set to citar-citeproc-format-reference. Finally, citar-citeproc-select-csl-style is set to ieee.csl, which is a file that exists in citar-citeproc-csl-styles-dir. Is there anything else that needs to be done for citar-citeproc.el to work properly?
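
For reference, the configuration described above might look roughly like the following. This is a sketch, not a verified fix. It assumes the style is held in the variable `citar-citeproc-csl-style` (the comment above names `citar-citeproc-select-csl-style`, which is understood here to be the interactive command that sets it), and that both `oc-csl` and `citar-citeproc` are installed.

```emacs-lisp
;; Sketch of the setup described in the comment above; adjust to your own config.
(require 'oc-csl)          ; defines org-cite-csl-styles-dir / org-cite-csl-locales-dir
(require 'citar-citeproc)  ; defines citar-citeproc-format-reference

;; Reuse the directories already configured for org-cite's CSL export processor.
(setq citar-citeproc-csl-styles-dir  org-cite-csl-styles-dir
      citar-citeproc-csl-locales-dir org-cite-csl-locales-dir)

;; Have `citar-insert-reference' format entries through citeproc-el
;; instead of the default formatter.
(setq citar-format-reference-function #'citar-citeproc-format-reference)

;; ASSUMPTION: the variable name `citar-citeproc-csl-style' is not stated in
;; the thread. The file is expected to exist in `citar-citeproc-csl-styles-dir'.
(setq citar-citeproc-csl-style "ieee.csl")
```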

In case it helps understand what might be going on, I instrumented citar-citeproc-format-reference and copied the output of each step in the evaluation to the attached file.

debugger-output.txt

bdarcus commented 1 year ago

OK.

To go back to this:

Most notably, the titles are not capitalized correctly (e.g. the braces surrounding a word are not respected).

Here we're using the citar cache, rather than parsing the bib file separately.

Obviously that enhances responsiveness, at the expense of some correctness.

Not sure if there's an easy way to resolve that, or if we could make it configurable.

bdarcus commented 10 months ago

@benthamite can you confirm my hunch in my last reply?

benthamite commented 10 months ago

Apologies, I hadn't seen your previous message. I should be able to look into this within the next couple of days.

benthamite commented 10 months ago

Hi @bdarcus,

For testing purposes, I created bibliography.bib:

@online{Hanson2023CanHumansBe,
  abstract =     {It is one of the most fundamental questions in the
                  social and human sciences: how culturally plastic are
                  people? Many anthropologists have long championed the
                  view that humans are very plastic; with matching
                  upbringing people can be made to behave a very wide
                  range of ways, and to want a very wide range of
                  things. Others say human nature is far more
                  constrained, and collect descriptions of "human
                  universals" (See Brown's 1991},
  author =   {Hanson, Robin},
  langid =   {english},
  timestamp =    {2023-06-14 15:12:51 (GMT)},
  title =    {Can humans be the {FORTRAN} of creatures?},
  url =
                  {https://www.overcomingbias.com/p/how-plastic-are-peoplehtml},
  urldate =  {2023-06-14},
}

and config.el:

(setq org-cite-global-bibliography '("bibliography.bib"))
(setq org-cite-export-processors
      '((t . (csl "ieee.csl"))))
(setq citar-bibliography '("bibliography.bib"))

After evaluating the latter, I evaluate (citar-citeproc--itemgetter '("Hanson2023CanHumansBe")), which returns

(("Hanson2023CanHumansBe" (URL . "https://www.overcomingbias.com/p/how-plastic-are-peoplehtml") (title . "Can humans be the fortran of creatures?") (blt-type . "online") (type . "webpage") (language . "en-US") (abstract . "It is one of the most fundamental questions in the social and human sciences: how culturally plastic are people? Many anthropologists have long championed the view that humans are very plastic; with matching upbringing people can be made to behave a very wide range of ways, and to want a very wide range of things. Others say human nature is far more constrained, and collect descriptions of \"human universals\" (See Brown’s 1991") (author ((family . "Hanson") (given . "Robin"))) (accessed (date-parts (2023 6 14)))))

By contrast, if I create document.org

[cite:@Hanson2023CanHumansBe]

#+print_bibliography:

and run org-md-export-to-markdown, I get

<a href="#citeproc_bib_item_1">[1]</a>  

<style>.csl-left-margin{float: left; padding-right: 0em;}
 .csl-right-inline{margin: 0 0 0 1em;}</style><div class="csl-bib-body">
  <div class="csl-entry"><a id="citeproc_bib_item_1"></a>
    <div class="csl-left-margin">[1]</div><div class="csl-right-inline">R. Hanson, “Can humans be the FORTRAN of creatures?” <a href="https://www.overcomingbias.com/p/how-plastic-are-peoplehtml">https://www.overcomingbias.com/p/how-plastic-are-peoplehtml</a> (accessed Jun. 14, 2023).</div>
  </div>
</div>

As you can see, the word "FORTRAN" is in all caps in the exported Markdown, but not in the output of (citar-citeproc--itemgetter '("Hanson2023CanHumansBe")).

I'm not entirely sure this is the kind of test you wanted me to run. Please let me know if there's anything else I should do. I'm attaching the relevant files in case it helps you reproduce the issue. files.zip

bdarcus commented 10 months ago

Thanks.

I'm almost certain my assumption is correct: using our cache for the formatting means the TeX markup gets stripped before citeproc sees it.

Still not sure what we can, or should, do about that.
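
The suspected mechanism can be illustrated with a toy example. This is not citar's or citeproc-el's actual code. `demo-sentence-case` and `demo-strip-braces` are hypothetical helpers written only to show why a sentence-casing step can no longer protect a word once its braces have been removed upstream.

```emacs-lisp
;; Toy illustration only: NOT the real citar or citeproc-el implementation.

(defun demo-strip-braces (title)
  "Return TITLE with all braces removed, as a cache might store it."
  (replace-regexp-in-string "[{}]" "" title))

(defun demo-sentence-case (title)
  "Lowercase TITLE except its first character and any {braced} groups.
Braces are dropped from the result."
  (let ((pos 0)
        (parts '()))
    ;; Alternate between unprotected text (lowercased) and {groups} (kept as is).
    (while (string-match "{\\([^{}]*\\)}" title pos)
      (push (downcase (substring title pos (match-beginning 0))) parts)
      (push (match-string 1 title) parts)
      (setq pos (match-end 0)))
    (push (downcase (substring title pos)) parts)
    (let ((result (apply #'concat (nreverse parts))))
      ;; Restore the capital on the first character.
      (if (equal result "")
          result
        (concat (upcase (substring result 0 1)) (substring result 1))))))

(let ((raw "Can humans be the {FORTRAN} of creatures?"))
  (list
   ;; Braces still present: the protected word keeps its case.
   (demo-sentence-case raw)
   ;; Braces stripped first: nothing marks the word as protected.
   (demo-sentence-case (demo-strip-braces raw))))
;; => ("Can humans be the FORTRAN of creatures?"
;;     "Can humans be the fortran of creatures?")
```

The two results correspond to the exported bibliography and the citar-insert-reference output reported above. If the hunch is right, any fix would need to hand the unstripped field values to citeproc.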