MichaelChirico / potools

Tools for working with translations in R
https://michaelchirico.github.io/potools/
58 stars 2 forks source link

Linter for glue strings #252

Open hadley opened 3 years ago

hadley commented 3 years ago

If we want to fully support and encourage use of glue strings, it would be nice to check that the msgstr contains the same {} wrapper components as the msgid.

This would protect against folks accidentally translating the variable name:

msgid "Hello {name}"
msgstr "Bonjour {nom}"

Or just introducing a typo

msgid "Hello {name}"
msgstr "Bonjour {nam}"
MichaelChirico commented 3 years ago

It does look like we'll have to do so optionally, here are some false positives in default packages:

./methods/po/R-methods.pot:msgid "'%s' is not a known generic function {and 'package' not specified}"
./methods/po/R-methods.pot:msgid "cannot use 'at' argument unless the function body has the form '{ ... }'"
./splines/po/splines.pot:msgid "derivs = %d >= ord = %d, but should be in {0,..,ord-1}"
./splines/po/splines.pot:msgid "derivs[%d] = %d >= ord = %d, but should be in {0,..,ord-1}"
./stats/po/R-stats.pot:msgid "'k' must be in {1, 2, ..  n - 1}"
./stats/po/R-stats.pot:msgid "dendrogram node with non-positive #{branches}"
./stats/po/R-stats.pot:msgid "dendrogram non-leaf node with non-positive #{branches}"
./stats/po/R-stats.pot:msgid "'print.level' must be in {0,1,2}"
./stats/po/R-stats.pot:msgid "'id.n' must be in {1,..,%d}"
./stats/po/R-stats.pot:msgid "'nknots' must be numeric (in {1,..,n})"
./stats/po/R-stats.pot:msgid "not using invalid df; must have 1 < df <= n := #{unique x} ="
./base/po/R.pot:msgid "invalid \\u{xxxx} sequence (line %d)"
./base/po/R.pot:msgid "invalid \\U{xxxxxxxx} sequence (line %d)"
./base/po/R.pot:msgid "invalid \\U{xxxxxxxx} value %6x (line %d)"
./base/po/R.pot:msgid "exact singularity: U[%d,%d] = 0 in LU-decomposition {Lapack 'dgetrf()'}"
./base/po/R-base.pot:msgid "'format' must be one of {\"f\",\"e\",\"E\",\"g\",\"G\", \"fg\", \"s\"}"
./graphics/po/R-graphics.pot:msgid "'side' must be in {1:4}"
./graphics/po/R-graphics.pot:msgid "layout matrix must contain at least one reference\nto each of the values {1 ... %d}"
./grid/po/R-grid.pot:msgid "nrow * ncol < #{legend labels}"
MichaelChirico commented 3 years ago

get_specials_metadata would be the most natural place to implement it internally:

https://github.com/MichaelChirico/potools/blob/master/R/specials_metadata.R

MichaelChirico commented 3 years ago

@jimhester what would you recommend for detecting glue templates in a string? Any way to avoid reinventing the logic in https://github.com/tidyverse/glue/blob/main/src/glue.c of iterating over the characters & looking for { ... } pairs? My initial guess is "no" because of how gnarly nested code could get...

(I am fine disallowing any delimiters but {/} for the purposes here)

jimhester commented 3 years ago

You could just use a custom transformer with glue to collect the results of the parsing, e.g.

res <- character()
collector <- function(expr, envir) {
  res <<- append(res, expr)
  res
}

invisible(glue::glue("this is a msg with {foo} {bar}", .transformer = collector))

res
#> [1] "foo" "bar"

Created on 2021-11-08 by the reprex package (v2.0.1) This would ensure you are using the same logic and also give you flexibility to support custom delimiters if needed.

MichaelChirico commented 3 years ago

Nice! That looks like a pretty good fit for the use case here.

(1) Run glue::glue() on the string with the collector(). (2) Iterate over the res to find the string indices of fixed strings, e.g. {foo}, then {bar}.

It's not very efficient but it shouldn't really matter here.