GeoBosh / Rdpack

R package Rdpack provides functions and macros facilitating writing and management of R documentation.
https://geobosh.github.io/Rdpack/

possibly non-existing or duplicated key(s) in bib file warning #18

Closed shamindras closed 3 years ago

shamindras commented 3 years ago

Hi Dr. Boshnakov (@GeoBosh),

Thank you for creating such a wonderful package. I'm currently trying to use Rdpack in the roxygen2 comments for a function in an R package I'm writing. I encounter an error in using the package (I'm a first time user of this package). Here is the branch with errors for your reference. Here is the R CMD check warning message.

Error Details

I am encountering the following error when trying to view the rendered documentation:

```
> Rdpack::viewRd("./man/comp_sandwich_qr_var.Rd", type = "html")
Warning messages:
1: In safe_cite(keys, bibs, textual = textual, before = before, after = after, :
  possibly non-existing or duplicated key(s) in bib file from package 'Rdpack': white1980usinglsapproxunknownregfuncs
2: In safe_cite(keys, bibs, textual = textual, before = before, after = after, :
  possibly non-existing or duplicated key(s) in bib file from package 'Rdpack': white1980heteroskedasticconsistentcovest
3: In safe_cite(key, bibs, textual = textual, bibpunct = bibpunct, :
  possibly non-existing or duplicated key(s) in bib file from package 'Rdpack': buja2019modelsasapproximationspart1
4: In safe_cite(key, bibs, textual = textual, bibpunct = bibpunct, :
  possibly non-existing or duplicated key(s) in bib file from package 'Rdpack': buja2019modelsasapproximationspart2
5: In safe_cite(keys, bibs, textual = textual, before = before, after = after, :
  possibly non-existing or duplicated key(s) in bib file from package 'Rdpack': white1980usinglsapproxunknownregfuncs
6: In safe_cite(keys, bibs, textual = textual, before = before, after = after, :
  possibly non-existing or duplicated key(s) in bib file from package 'Rdpack': white1980heteroskedasticconsistentcovest
7: In safe_cite(key, bibs, textual = textual, bibpunct = bibpunct, :
  possibly non-existing or duplicated key(s) in bib file from package 'Rdpack': buja2019modelsasapproximationspart1
8: In safe_cite(key, bibs, textual = textual, bibpunct = bibpunct, :
  possibly non-existing or duplicated key(s) in bib file from package 'Rdpack': buja2019modelsasapproximationspart2
```

Session Info

```r
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] maar_0.1.0     testthat_3.0.0

loaded via a namespace (and not attached):
 [1] pkgload_1.1.0     tidyr_1.1.2       Rdpack_2.1
 [4] assertthat_0.2.1  profmem_0.5.0     vipor_0.4.5
 [7] yaml_2.2.1        remotes_2.2.0     sessioninfo_1.1.1
[10] pillar_1.4.6      backports_1.2.0   lattice_0.20-41
[13] glue_1.4.1.9000   digest_0.6.27     rbibutils_2.0
[16] colorspace_2.0-0  sandwich_3.0-0    htmltools_0.5.0
[19] Matrix_1.2-18     pkgconfig_2.0.3   devtools_2.3.0
[22] broom_0.7.2       bench_1.1.1       purrr_0.3.4
[25] scales_1.1.1      processx_3.4.4    tibble_3.0.4
[28] generics_0.1.0    farver_2.0.3      ggplot2_3.3.2
[31] usethis_1.6.1     ellipsis_0.3.1    withr_2.3.0
[34] cli_2.1.0         magrittr_1.5      crayon_1.3.4
[37] memoise_1.1.0     evaluate_0.14     ps_1.4.0
[40] fs_1.5.0          fansi_0.4.1       MASS_7.3-51.6
[43] xml2_1.3.2        beeswarm_0.2.3    pkgbuild_1.1.0
[46] tools_4.0.2       prettyunits_1.1.1 gbRd_0.4-11
[49] lifecycle_0.2.0   stringr_1.4.0     munsell_0.5.0
[52] callr_3.5.1       compiler_4.0.2    rlang_0.4.8
[55] grid_4.0.2        rstudioapi_0.13   rmarkdown_2.5
[58] gtable_0.3.0      roxygen2_7.1.1    R6_2.5.0
[61] zoo_1.8-8         knitr_1.30        dplyr_1.0.2
[64] utf8_1.1.4        commonmark_1.7    rprojroot_2.0.2
[67] desc_1.2.0        stringi_1.5.3     ggbeeswarm_0.6.0
[70] Rcpp_1.0.5        vctrs_0.3.4       tidyselect_1.1.0
[73] xfun_0.19
```

Setup

I've done the following to ensure that Rdpack is set up correctly in our R package.

  1. Added Rdpack to imports as follows:

    ```
    Imports:
        magrittr,
        rlang (>= 0.1.2),
        broom,
        Matrix,
        sandwich,
        bench,
        Rdpack
    Suggests:
        spelling,
        testthat,
        roxygen2,
        covr
    RdMacros: Rdpack
    ```

  2. Created an inst/REFERENCES.bib file with the following contents:

    ```
    @article{buja2019modelsasapproximationspart1,
      author = {Buja, Andreas and Brown, Lawrence and Berk, Richard and George, Edward and Pitkin, Emil and Traskin, Mikhail and Zhang, Kai and Zhao, Linda},
      title = {Models as approximations {I}: consequences illustrated with linear regression},
      journal = {Statist. Sci.},
      fjournal = {Statistical Science. A Review Journal of the Institute of Mathematical Statistics},
      volume = {34},
      year = {2019},
      number = {4},
      pages = {523--544},
      issn = {0883-4237},
      mrclass = {62J05 (62F12 62F35 62H12 62J02)},
      mrnumber = {4048582},
      doi = {10.1214/18-STS693}
    }

    @article{buja2019modelsasapproximationspart2,
      author = {Buja, Andreas and Brown, Lawrence and Kuchibhotla, Arun Kumar and Berk, Richard and George, Edward and Zhao, Linda},
      title = {Models as approximations {II}: {A} model-free theory of parametric regression},
      journal = {Statist. Sci.},
      fjournal = {Statistical Science. A Review Journal of the Institute of Mathematical Statistics},
      volume = {34},
      year = {2019},
      number = {4},
      pages = {545--565},
      issn = {0883-4237},
      mrclass = {62J05 (62F40 62G05 62J20)},
      mrnumber = {4048583},
      doi = {10.1214/18-STS694}
    }

    @article{white1980heteroskedasticconsistentcovest,
      author = {White, Halbert},
      title = {A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity},
      journal = {Econometrica},
      fjournal = {Econometrica. Journal of the Econometric Society},
      volume = {48},
      year = {1980},
      number = {4},
      pages = {817--838},
      issn = {0012-9682},
      mrclass = {62J05 (62G10)},
      mrnumber = {575027},
      mrreviewer = {Norbert Christopeit},
      doi = {10.2307/1912934}
    }

    @article{white1980usinglsapproxunknownregfuncs,
      author = {White, Halbert},
      title = {Using least squares to approximate unknown regression functions},
      journal = {Internat. Econom. Rev.},
      fjournal = {International Economic Review},
      volume = {21},
      year = {1980},
      number = {1},
      pages = {149--170},
      issn = {0020-6598},
      mrclass = {62J02 (62P20)},
      mrnumber = {572464},
      mrreviewer = {V. K. Srivastava},
      doi = {10.2307/2526245}
    }
    ```

  3. Added importFrom(Rdpack,reprompt) to the NAMESPACE file

  4. Wrote an roxygen2 file with the following roxygen2 comment

    ```r
    #' Compute the White sandwich estimator of standard errors for
    #' ordinary least squares (OLS) regression. This is based on the series of
    #' papers \insertCite{white1980usinglsapproxunknownregfuncs}{Rdpack} and
    #' \insertCite{white1980heteroskedasticconsistentcovest}{Rdpack}. For more details
    #' \insertCite{@see also @buja2019modelsasapproximationspart1 and @buja2019modelsasapproximationspart2;textual}{Rdpack}
    #'
    #' @param lm_object (lm) : lm object
    #'
    #' @return (matrix) : White sandwich estimator of variance for OLS regression
    #' @export
    #'
    #' @importFrom Rdpack reprompt
    #' @references
    #'     \insertAllCited{}
    #' @examples
    #' \dontrun{
    #' n <- 1e5
    #' X <- stats::rnorm(n, 0, 1)
    #' y <- 2 + X * 1 + stats::rnorm(n, 0, 10)
    #' lm_fit <- stats::lm(y ~ X)
    #' sandwich_qr_var <- comp_sandwich_qr_var(lm_fit)
    #' }
    ```

  5. Finally, rendered the package documentation using devtools::document(), which doesn't throw any error, but Rdpack::viewRd("./man/comp_sandwich_qr_var.Rd", type = "html") throws the warnings above. The BibTeX references are also not rendered in the html file.

Could I please get assistance with resolving the issue? I'm not sure what I'm doing wrong here.

GeoBosh commented 3 years ago

Thanks for the report and the praise :).

The second argument of the macros specifies the package from which the references should be taken, in this case package maar. Just change {Rdpack} to {maar}:

\insertCite{white1980usinglsapproxunknownregfuncs}{maar}
\insertCite{white1980heteroskedasticconsistentcovest}{maar}
For more details
\insertCite{@see also @buja2019modelsasapproximationspart1 and @buja2019modelsasapproximationspart2;textual}{maar}

You get warnings, not errors, in such cases, since if you prefer to fix references later it may be frustrating not to be able to build your package.

Let me know if this solves the issue. Also, it would be useful to know if viewRd works as expected, since I have had very little feedback on it.
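
Since the warning only says that a key was not found in the named package's bib file, it can help to list which cited keys are actually present in a given file. The following is a minimal sketch in base R, not something Rdpack provides: `bib_keys()` is a hypothetical helper, and the `bib_lines` vector is a shortened stand-in for the contents of the package's REFERENCES.bib.

```r
# Hypothetical helper (not part of Rdpack): extract entry keys from BibTeX lines.
bib_keys <- function(lines) {
  # Match the start of an entry such as "@article{somekey," and keep the match.
  starts <- regmatches(
    lines,
    regexpr("^\\s*@\\w+\\s*\\{\\s*[^,\\s]+", lines, perl = TRUE)
  )
  # Drop everything up to and including the opening brace, leaving the key.
  sub("^\\s*@\\w+\\s*\\{\\s*", "", starts, perl = TRUE)
}

# Shortened stand-in for
#   readLines(system.file("REFERENCES.bib", package = "maar"))
# which requires the package to be installed.
bib_lines <- c(
  "@article{white1980usinglsapproxunknownregfuncs,",
  "  author = {White, Halbert},",
  "  year = {1980}",
  "}",
  "@article{buja2019modelsasapproximationspart1,",
  "  year = {2019}",
  "}"
)

cited <- c("white1980usinglsapproxunknownregfuncs",
           "white1980heteroskedasticconsistentcovest")

# Keys cited in the documentation but absent from this bib file.
missing_keys <- setdiff(cited, bib_keys(bib_lines))
print(missing_keys)
```

Running the same check against the bib file of the package named in the second macro argument shows whether the key or the package name is the problem.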

shamindras commented 3 years ago

Thanks @GeoBosh - that did partially help resolve the issue! I knew I'd made a simple mistake.

> You get warnings, not errors, in such cases, since if you prefer to fix references later it may be frustrating not to be able to build your package.

That's true, I've updated the issue title to reflect that this is indeed a warning. Although your suggestion to fix the references later is indeed useful, I'm trying to document each function in detail as it is written. I'll use this strategy if we keep getting reference warnings 😄.

Here is the revised roxygen2 comment:

#' Compute the White sandwich estimator of standard errors for
#' ordinary least squares (OLS) regression. This is based on the series of
#' papers \insertCite{white1980usinglsapproxunknownregfuncs}{maar} and \insertCite{white1980heteroskedasticconsistentcovest}{maar}. For more details \insertCite{@see also @buja2019modelsasapproximationspart1 and @buja2019modelsasapproximationspart2;textual}{maar}
#'
#' @param lm_object (lm) : lm object
#'
#' @return (matrix) : White sandwich estimator of variance for OLS regression
#' @export
#'
#' @importFrom Rdpack reprompt
#' @references
#'     \insertAllCited{}
#' @examples
#' \dontrun{
#' n <- 1e5
#' X <- stats::rnorm(n, 0, 1)
#' y <- 2 + X * 1 + stats::rnorm(n, 0, 10)
#' lm_fit <- stats::lm(y ~ X)
#' sandwich_qr_var <- comp_sandwich_qr_var(lm_fit)
#' }

I mention partially resolve, because when I use viewRd I get the attached screenshot.

[Screenshot: viewRd_maar_01]

  1. It seems like the references in the main description are indented in a strange way e.g see "for more details..." appears on a new line on a different indentation
  2. The insertAllCited{} does not seem to render the references

Any ideas what may be the issue?

GeoBosh commented 3 years ago

Regarding '1. It seems like the references in the main description are indented in a strange way e.g see "for more details..." appears on a new line on a different indentation', I don't observe it on my side and more information is needed. If you are rendering it as html in Rstudio, it is probably the way it renders titles. Notice that it takes the text as a title which it probably renders centred. But it is extremely long, so the first line takes the whole line, while the second is incomplete and centred (I think, that is not indentation but centring, but I may be wrong). You probably didn't intend this to be a title but if you did, please follow up with more details.

Regarding "2. The insertAllCited{} does not seem to render the references".

Short answer: don't indent \insertAllCited{} by four spaces - any other number of spaces, including zero, will do. For example:

#' @references
#' \insertAllCited{}

or even

#' @references \insertAllCited{}

Long answer: This is a markdown feature, which seems to be used by roxygen2 - lines indented by exactly four spaces are rendered verbatim. In this case the following chunk in your R file

#' @references
#'     \insertAllCited{}

is translated to the following snippet in the Rd file:

\references{
\preformatted{\\insertAllCited\{\}
}

\preformatted is the Rd syntax for verbatim, so the command inside it is not interpreted at all.

I will mention the above in the documentation but I am surprised that it was not reported before.
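
To catch this before building, the roxygen2 source can be scanned for Rdpack macros that sit behind a four-space indent. This is a minimal sketch, assuming roxygen2 strips the `#'` marker plus one space before the text is treated as markdown; `find_indented_macros()` is a hypothetical helper, and flagging indentation deeper than four spaces as well is a conservative assumption that the thread does not confirm.

```r
# Hypothetical helper: return the indices of roxygen lines where an \insert...
# macro follows "#'" plus one separator space and four or more further spaces.
find_indented_macros <- function(lines) {
  grep("^\\s*#' {5,}\\\\insert", lines, perl = TRUE)
}

roxy <- c(
  "#' @references",
  "#'     \\insertAllCited{}",      # four-space indent: rendered verbatim
  "#' \\insertAllCited{}",          # no extra indent: interpreted as a macro
  "#' @references \\insertAllCited{}"
)

flagged <- find_indented_macros(roxy)
print(roxy[flagged])
```

In a package, the same function could be applied to `readLines()` of each file under `R/`.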

shamindras commented 3 years ago

Thanks again @GeoBosh

  1. The first issue is resolved. I should have put the first line as a title and left a blank line before the description. One minor thing, and I don't believe this is an Rdpack issue, but there is an extra line and indentation after the lm_object in the Arguments section. I expected the description to be on the same line. Just wanted to check before I investigated this further (please see the screenshot below).

New roxygen2 comment

```r
#' Compute the White sandwich estimator of standard errors for OLS
#'
#' Compute the White sandwich estimator of standard errors for
#' ordinary least squares (OLS) regression, \insertCite{@see @white1980usinglsapproxunknownregfuncs and @white1980heteroskedasticconsistentcovest;textual}{maar}. For
#' more details
#' \insertCite{@see also @buja2019modelsasapproximationspart1 and @buja2019modelsasapproximationspart2;textual}{maar}.
#'
#' @param lm_object An lm (OLS) object
#'
#' @return (matrix) White sandwich estimator of variance for OLS regression
#'
#' @export
#'
#' @importFrom Rdpack reprompt
#'
#' @references \insertAllCited{}
#'
#' @examples
#' \dontrun{
#' n <- 1e5
#' X <- stats::rnorm(n, 0, 1)
#' y <- 2 + X * 1 + stats::rnorm(n, 0, 10)
#' lm_fit <- stats::lm(y ~ X)
#' sandwich_qr_var <- comp_sandwich_qr_var(lm_fit)
#' }
```

[Screenshot: viewRd_maar_02]

  2. The second issue is also resolved with your suggestions - thanks! However, I had copied the indented block directly from the Rdpack README file. See the citations section, the roxygen2 @references is indented there.

in roxygen2, the references chunk might look like this:

#' @references
#'     \insertAllCited{}

Perhaps it should be replaced with the suggestion you have provided to help future users?

GeoBosh commented 3 years ago

Regarding 2., I noticed that my example uses exactly four spaces! This must be a relatively recent feature of roxygen2, since I myself have copied that line (though I don't use roxygen2 much).

The text for Description is typically rendered as a separate paragraph. As to the arguments, in the text rendering I get the explanation on the same line, but in pdf and html it is on a new line - again, a design decision. This may also depend on the number of characters in the argument's name. viewRd doesn't do the rendering; it just makes sure that the call to the relevant R functions contains the definitions of the citation macros.

shamindras commented 3 years ago

> Regarding 2., I noticed that my example uses exactly four spaces! This must be a relatively recent feature of roxygen2, since I myself have copied that line (though I don't use roxygen2 much).

I see. I'm using roxygen2_7.1.1 for reference and the 4 spaces gave me the issue. It may be good to mention both ways of writing this for roxygen2 depending on the version that the user has installed.

> The text for Description is typically rendered as a separate paragraph. As to the arguments, in the text rendering I get the explanation on the same line, but in pdf and html it is on a new line - again, a design decision. This may also depend on the number of characters in the argument's name. viewRd doesn't do the rendering; it just makes sure that the call to the relevant R functions contains the definitions of the citation macros.

Thanks, I will take a further look at the roxygen2 details for this. Appreciate your explanation.

Please feel free to mark this issue as closed.

GeoBosh commented 3 years ago

I changed the example and added a comment about this in README on the github repo.

shamindras commented 3 years ago

> I changed the example and added a comment about this in README on the github repo.

Looks great - thanks again for your help on this, and for creating such a nice documentation tool!

GeoBosh commented 3 years ago

Thanks for the report, too.

ms609 commented 5 months ago

Just noting that another possible cause of the "possibly non-existing or duplicated key(s) in bib file" warning is unexpected formatting in a bib entry.

I encountered the message when devtools::check_man()ing a package containing

@article{MyKey,
  title = {Some study},
  author = {Body, Some},
  date = {2021}
}

When I installed the package, I also saw a message pointing out that A bibentry of bibtype 'Article' has to specify the field: year. Indeed, replacing the date field with year allowed the key to be recognized, resolving the issue.

I'm not sure that there's anything that Rdpack can easily do to flag this up as a cause, so am not opening as a new issue, but thought I'd add the note here in case it helps others.
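
The required-field check can be reproduced outside Rdpack with base R's `utils::bibentry()`, which is presumably where the install-time message comes from. This is a minimal sketch: the `journal` value is a made-up placeholder added so that `year` is the only required field missing, and `try_entry()` is a hypothetical wrapper.

```r
# Hypothetical wrapper: build an Article entry and return either the bibentry
# object or, if construction fails, the error message as a string.
try_entry <- function(...) {
  tryCatch(
    utils::bibentry(
      bibtype = "Article",
      key = "MyKey",
      title = "Some study",
      author = utils::person("Some", "Body"),
      journal = "Some Journal",  # placeholder, not in the original entry
      ...
    ),
    error = function(e) conditionMessage(e)
  )
}

with_date <- try_entry(date = "2021")  # biblatex-style field, no 'year'
with_year <- try_entry(year = "2021")  # BibTeX-style required field

print(with_date)             # an error message that mentions the missing field
print(class(with_year))
```

The first call fails because `date` is accepted only as an extra field and does not satisfy the `year` requirement; the second call returns a valid entry.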

GeoBosh commented 4 months ago

Thanks, the message "possibly non-existing or duplicated key(s) in bib file" is not ideal and could, at least, be improved (e.g., by adding "or missing a required field").

This is a very old message (from 2018, amended in 2020). I will check why it doesn't emit the actual error message, as well.

There is the separate issue of whether Rdpack could/should render silently trivial Biblatex differences, such as using 'date' when year is missing or journaltitle if title is missing.

ms609 commented 4 months ago

Thanks! And for what it's worth, Zotero/BetterBibTex uses journaltitle as an alias for journal (rather than title) – not sure if other software uses this field differently.
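
A field mapping of the kind discussed above could look roughly like the following. This is only a sketch of the idea, not something Rdpack does: `normalize_biblatex_fields()` is a hypothetical function operating on a plain named list, it treats `journaltitle` as an alias for `journal` as described in the last comment, and it assumes `date` starts with a four-digit year.

```r
# Hypothetical sketch: fill BibTeX fields from their biblatex counterparts
# when the BibTeX field is absent. Existing fields are never overwritten.
normalize_biblatex_fields <- function(fields) {
  # Use [[ ]] rather than $ so that "journal" does not partially match
  # "journaltitle".
  if (is.null(fields[["year"]]) && !is.null(fields[["date"]])) {
    # Assumes an ISO-style date such as "2021" or "2021-05-01".
    fields[["year"]] <- substr(fields[["date"]], 1, 4)
  }
  if (is.null(fields[["journal"]]) && !is.null(fields[["journaltitle"]])) {
    fields[["journal"]] <- fields[["journaltitle"]]
  }
  fields
}

entry <- list(
  title = "Some study",
  author = "Body, Some",
  date = "2021-05-01",
  journaltitle = "Some Journal"  # placeholder value for illustration
)

fixed <- normalize_biblatex_fields(entry)
str(fixed[c("year", "journal")])
```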