GeoBosh / Rdpack

R package Rdpack provides functions and macros facilitating writing and management of R documentation.
https://geobosh.github.io/Rdpack/

Superfluous braces when citing an entry #25

Closed: MLopez-Ibanez closed this issue 2 years ago

MLopez-Ibanez commented 2 years ago

As shown here: https://mlopez-ibanez.github.io/eaf/reference/whv_rect.html

Citing this entry:

@article{DiaLop2020ejor,
  author =       { Juan Esteban Diaz and Manuel L{\'o}pez-Ib{\'a}{\~n}ez },
  title =        {Incorporating Decision-Maker's Preferences into the Automatic
                  Configuration of Bi-Objective Optimisation Algorithms},
  journal =      {European Journal of Operational Research},
  year =         2021,
  volume =       289,
  number =       3,
  pages =        {1209--1222},
  doi =          {10.1016/j.ejor.2020.07.059},
}

with \insertCite{DiaLop2020ejor;textual}{eaf} renders as Diaz and López-Ibá{ñ}ez (2021), with superfluous braces around the ñ, although the bibliography entry itself does not show them.

I have tried various ways to encode the name and none works.

GeoBosh commented 2 years ago

Thanks for the report. Initially I thought this was a straightforward omission in the handling of tilde diacritics, but I couldn't find one in Rdpack or rbibutils. After some digging in the R sources I narrowed the problem down to tools:::cleanupLatex. I may be able to sidestep this, but I am giving the technical details below for easy reference, since I will also report it on R-devel.

Inside tools:::cleanupLatex, the difference in the handling of the diacritics in your name appears after a call to tools::deparseLatex (via toRd). There is nothing special about \~ in deparseLatex, but after looking at its source code and at the object it was processing (obtained from parseLatex), I realised that the code would indeed put the second of two consecutive accented letters in braces. Swapping the order of the consecutive accented letters in your name (sorry for playing with it) leaves the second one in braces:

> e1 <- "Manuel L{\\'o}pez-Ib{\\'a}{\\~n}ez"
> e2 <- "Manuel L{\\'o}pez-Ib{\\~n}{\\'a}ez"
> tools:::cleanupLatex(e1)
## [1] "Manuel López-Ibá{ñ}ez"
> tools:::cleanupLatex(e2)
## [1] "Manuel López-Ibñ{á}ez"

Here is the source of deparseLatex. If dropBraces is TRUE, it strips the braces of a BLOCK element, but only if the tag of the preceding element is "TEXT":

deparseLatex <- function(x, dropBraces = FALSE)
{
    result <- character()
    lastTag <- "TEXT"
    for (i in seq_along(x)) {
        a <- x[[i]]
        tag <- attr(a, "latex_tag")
        if (is.null(tag)) tag <- "NULL"
        switch(tag,
        VERB = ,
        TEXT = ,
        MACRO = ,
        COMMENT = result <- c(result, a),
        BLOCK = result <- c(result, if (dropBraces && lastTag == "TEXT") deparseLatex(a) else c("{", deparseLatex(a), "}")),
        ENVIRONMENT = result <- c(result,
            "\\begin{", a[[1L]], "}",
            deparseLatex(a[[2L]]),
            "\\end{", a[[1L]], "}"),
        MATH = result <- c(result, "$", deparseLatex(a), "$"),
        NULL = stop("Internal error, no tag", domain = NA)
        )
        lastTag <- tag
    }
    paste(result, collapse="")
}

I saved your example to the file "issueRdpack25.bib" and read it in the same way as Rdpack does (the effect is the same as in the examples above). The objects a1, a2 and a3a emulate the steps taken by cleanupLatex, to check where the difference in the handling of the accents appears:

> tmp <- readBib("issueRdpack25.bib", encoding = "utf8", direct = TRUE, extra = TRUE, texChars = "Rdpack")

> a1 <- tools:::parseLatex(tmp$author)
> a1
## Juan Esteban Diaz
## Manuel L{\'o}pez-Ib{\'a}{\~n}ez
> a2 <- tools:::latexToUtf8(a1)
> a2
## Juan Esteban Diaz
## Manuel L{ó}pez-Ib{á}{ñ}ez
> a3a <- tools:::deparseLatex(a2, TRUE)
> a3a
## [1] "Juan Esteban Diaz\nManuel López-Ibá{ñ}ez"

Notice below that the first accented character is preceded by a "TEXT" element, as is the first of the two consecutive ones. But the accented characters themselves are in "BLOCK" components, so the second of the consecutive pair is preceded by a BLOCK rather than a TEXT element. Hence deparseLatex keeps its braces.

> unclass(a2)

[[1]]
[1] "Juan Esteban Diaz\nManuel L"
attr(,"latex_tag")
[1] "TEXT"

[[2]]
[[2]][[1]]
[1] "ó"
attr(,"latex_tag")
[1] "TEXT"

attr(,"latex_tag")
[1] "BLOCK"

[[3]]
[1] "pez-Ib"
attr(,"latex_tag")
[1] "TEXT"

[[4]]
[[4]][[1]]
[1] "á"
attr(,"latex_tag")
[1] "TEXT"

attr(,"latex_tag")
[1] "BLOCK"

[[5]]
[[5]][[1]]
[1] "ñ"
attr(,"latex_tag")
[1] "TEXT"

attr(,"latex_tag")
[1] "BLOCK"

[[6]]
[1] "ez"
attr(,"latex_tag")
[1] "TEXT"

I don't know why deparseLatex drops the braces only when the previous element is TEXT; maybe parseLatex uses blocks for unrelated purposes. That is difficult to tell, since parseLatex is a formal parser with opaque code.
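One possible way to sidestep the problem until it is addressed in R is to post-process the string returned by cleanupLatex and remove braces that enclose a single character. The sketch below does this with a regular expression; the function name strip_single_braces() is hypothetical, and this is not necessarily how Rdpack deals with the issue. Note that it would also remove braces put around a single character on purpose.

```r
# Hypothetical post-processing step: drop braces around exactly one
# character, e.g. "{ñ}" -> "ñ". With UTF-8 input and perl = TRUE,
# "." matches a whole (multibyte) character.
strip_single_braces <- function(x) {
  gsub("\\{(.)\\}", "\\1", x, perl = TRUE)
}

# Outputs of tools:::cleanupLatex(e1) and tools:::cleanupLatex(e2) above
x1 <- "Manuel L\u00f3pez-Ib\u00e1{\u00f1}ez"
x2 <- "Manuel L\u00f3pez-Ib\u00f1{\u00e1}ez"
strip_single_braces(c(x1, x2))
```

Groups of more than one character, such as "{DNA}", are left untouched by this pattern.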

GeoBosh commented 2 years ago

I posted a report on R-devel about what I think is a bug in R, see https://stat.ethz.ch/pipermail/r-devel/2022-April/081604.html.

If nothing happens I will look at how to circumvent this.

MLopez-Ibanez commented 2 years ago

Many thanks. I have circumvented it by using an explicit UTF-8 "ñ". I wanted to avoid this so that I can share the same BibTeX files between R and other tools, some of which don't work well with UTF-8.

GeoBosh commented 2 years ago

I am not converting the accented characters to UTF-8 when reading in the file with rbibutils::readBib (which offers this option), mainly because on Windows some characters get mangled if they are not available in the current Windows code page. Another reason is that some UTF-8 characters are problematic for LaTeX. The mangling on Windows should eventually go away after the release of R 4.2, which uses a native UTF-8 locale.

MLopez-Ibanez commented 2 years ago

Yes, I didn't wish to imply that my "fix" is the right fix, only that I'm happy to wait for a proper solution in R because I found a workaround that works for my particular goal (having a nice webpage online). (Many thanks for all your help. Next time I'm in Manchester, I will buy you a beer!)

GeoBosh commented 2 years ago

I will be waiting for your call!

GeoBosh commented 2 years ago

Fixed in Rdpack v2.3.1.