eliocamp / rhelpi18n

Add support for multilingual documentation to R
https://eliocamp.github.io/rhelpi18n/
GNU General Public License v3.0

Investigate using po files with context #16

Open eliocamp opened 3 months ago

eliocamp commented 3 months ago

@daroczig mentioned that .po files can have "context" which we could use to map to the structure of the .Rd file.

eliocamp commented 3 months ago

Creating a pot file with context doesn't seem very hard. Right now:

utils:::.getHelpFile(help("mean")) |>
    rd_flatten_po() |>
    write_string() |>
    writeLines("text.pot")

creates this file

msgctxt "title"
msgid "Arithmetic Mean"
msgstr ""

msgctxt "name"
msgid "mean"
msgstr ""

msgctxt "alias"
msgid "mean"
msgstr ""

msgctxt "alias"
msgid "mean.default"
msgstr ""

msgctxt "keyword"
msgid "univar"
msgstr ""

msgctxt "description"
msgid "  Generic function for the (trimmed) arithmetic mean.
"
msgstr ""

msgctxt "usage"
msgid "mean(x, \dots{})

\method{mean}{default}(x, trim = 0, na.rm = FALSE, \dots{})
"
msgstr ""

msgctxt "arguments.x"
msgid "an \R{} object.  Currently there are methods for
    numeric/logical vectors and \link[=Dates]{date},
    \link{date-time} and \link{time interval} objects.  Complex vectors
    are allowed for \code{trim = 0}, only."
msgstr ""

msgctxt "arguments.trim"
msgid "the fraction (0 to 0.5) of observations to be
    trimmed from each end of \code{x} before the mean is computed.
    Values of trim outside that range are taken as the nearest endpoint.
  "
msgstr ""

msgctxt "arguments.na.rm"
msgid "a logical evaluating to \code{TRUE} or \code{FALSE}
    indicating whether \code{NA} values should be stripped before the
    computation proceeds."
msgstr ""

msgctxt "arguments.\dots{}"
msgid "further arguments passed to or from other methods."
msgstr ""

msgctxt "value"
msgid "  If \code{trim} is zero (the default), the arithmetic mean of the
  values in \code{x} is computed, as a numeric or complex vector of
  length one.  If \code{x} is not logical (coerced to numeric), numeric
  (including integer) or complex, \code{NA_real_} is returned, with a warning.

  If \code{trim} is non-zero, a symmetrically trimmed mean is computed
  with a fraction of \code{trim} observations deleted from each end
  before the mean is computed.
"
msgstr ""

msgctxt "references"
msgid "  Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
  \emph{The New S Language}.
  Wadsworth & Brooks/Cole.
"
msgstr ""

msgctxt "seealso"
msgid "  \code{\link{weighted.mean}}, \code{\link{mean.POSIXct}},
  \code{\link{colMeans}} for row and column means.
"
msgstr ""

msgctxt "examples"
msgid "x <- c(0:10, 50)
xm <- mean(x)
c(xm, mean(x, trim = 0.10))
"
msgstr ""

I can open it with Poedit, which shows these useful coloured titles

[Screenshot: Poedit showing the entries with their contexts as coloured titles]

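The multiline strings don't work: the msgid values above contain raw line breaks inside the quotes, which the PO format does not allow.

The dot is not a useful separator for the structure because argument names can contain dots themselves (e.g. na.rm). What could we use here?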

daroczig commented 3 months ago

I think you can use space in the context as well, e.g. argument na.rm
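
To make the two points above concrete (space-separated contexts and strings that stay on one line), here is a minimal sketch in base R. The helpers po_escape(), make_msgctxt() and po_entry() are hypothetical names made up for illustration; they are not part of rhelpi18n, and the package's own rd_flatten_po() may well do this differently.

```r
# Hypothetical helpers for illustration only; not part of rhelpi18n.

# Escape a string so it can sit inside a double-quoted PO string on one line.
po_escape <- function(x) {
  x <- gsub("\\", "\\\\", x, fixed = TRUE)  # double every backslash first
  x <- gsub("\"", "\\\"", x, fixed = TRUE)  # then escape double quotes
  x <- gsub("\n", "\\n", x, fixed = TRUE)   # raw newline -> the two characters \n
  x
}

# Join the structural path (topic, section, argument, ...) with spaces,
# so that dots inside argument names such as na.rm stay unambiguous.
make_msgctxt <- function(...) paste(c(...), collapse = " ")

# Build one untranslated entry for a .pot file.
po_entry <- function(msgctxt, msgid) {
  paste0(
    "msgctxt \"", po_escape(msgctxt), "\"\n",
    "msgid \"", po_escape(msgid), "\"\n",
    "msgstr \"\"\n"
  )
}

entry <- po_entry(
  make_msgctxt("mean", "arguments", "na.rm"),
  "a logical evaluating to \\code{TRUE} or \\code{FALSE}\n    indicating whether \\code{NA} values should be stripped."
)
cat(entry)
```

A space separator is only unambiguous as long as no single component contains a space itself, so this would need another look for Rd items that document several arguments at once.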

eliocamp commented 3 months ago

Ah, the latest commit solved the newline issue. Using this would solve a lot of issues and simplify the code a lot. No need for a custom translation function. No need for brittle language matching. Translation modules would store translations as ordinary messages, and then they would be translated with gettext() (so we'd still need to modify .getHelpFile() to translate the strings).

For #12, I guess we could use fuzzy and/or custom comments.

eliocamp commented 3 months ago

I think you can use space in the context as well, e.g. argument na.rm

Ah, nice.

eliocamp commented 3 months ago

This might be a dumb question, but I can't seem to use the translations. I created a new package, added a test translation file

msgid ""
msgstr ""
"Project-Id-Version: \n"
"POT-Creation-Date: \n"
"PO-Revision-Date: \n"
"Last-Translator: \n"
"Language-Team: \n"
"Language: en\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Poedit 3.4.4\n"

msgid "stringunica"
msgstr "uniquestring"

msgctxt "agromet_informe title"
msgid "Formato de salida para Informes"
msgstr "Report format"

msgctxt "agromet_informe arguments ..."
msgid ""
"cualquier argumento que requiera \\code{\\link[rmarkdown:pdf_document]"
"{rmarkdown::pdf_document()}}."
msgstr ""
"any argument required by \\code{\\link[rmarkdown:pdf_document]{rmarkdown::"
"pdf_document()}}."

compiled the translation with poedit and finally installed the package. I was expecting to be able to access the translated string with gettext() using the domain R-pkgname, but it's not working:

gettext("stringunica", domain = "R-agroclimatico.en")
#> [1] "stringunica"

Any tips on what I might be missing? @daroczig @MichaelChirico

MichaelChirico commented 3 months ago

compiled the translation with poedit

can you confirm that means creating a .mo file in the "expected" place within the inst/ directory?

can you point me to the relevant package directory?

eliocamp commented 3 months ago

can you confirm that means creating a .mo file in the "expected" place within the inst/ directory?

Ah.. I'm an idiot. No, I'm just putting the .mo file in the po directory. I thought potools::po_compile() did all the heavy lifting. What should I run after compiling?

MichaelChirico commented 3 months ago

hmm yes po_compile() is supposed to handle that for you. can you try with verbose=TRUE and possibly share a repro directory with me to play around with?
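
For reference, the "expected" place can be checked with a few lines of base R. This is a rough sketch: expected_mo_path() and find_mo_files() are hypothetical helper names, and the only thing assumed about R is the usual layout for R-level message catalogues, inst/po/<lang>/LC_MESSAGES/R-<pkg>.mo in the package sources.

```r
# Hypothetical helpers for illustration only.

# Path (relative to the package root) where the compiled R-level catalogue
# for language `lang` is expected in the package sources.
expected_mo_path <- function(pkg, lang) {
  file.path("inst", "po", lang, "LC_MESSAGES", paste0("R-", pkg, ".mo"))
}

# List the compiled catalogues that actually exist under inst/po.
# Returns character(0) if there are none (or if inst/po does not exist).
find_mo_files <- function(pkg_dir = ".") {
  list.files(file.path(pkg_dir, "inst", "po"),
             pattern = "\\.mo$", recursive = TRUE)
}

expected_mo_path("agroclimatico.en", "en")
find_mo_files(".")
```

If find_mo_files() comes back empty while there is a .mo file sitting next to the .po file in po/, the catalogue has been compiled but not placed where the installed package will pick it up.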

eliocamp commented 3 months ago

Strange, I've just run po_compile with debug and now it's working. I probably missed some critical step.

Now the next step is to figure out how to use gettext with context.

eliocamp commented 3 months ago

BTW: I've just uploaded the repo here: https://github.com/eliocamp/agroclimatico.en
I'm playing with this in the po-files branch of this repo.

MichaelChirico commented 3 months ago

Here's the documentation on context:

https://www.gnu.org/software/gettext/manual/html_node/Contexts.html

FWIW R itself does not use any of this as of now:

https://github.com/search?q=repo%3Ar-devel%2Fr-svn%20%2Fpgettext%2F&type=code

So we'll probably need to start with an R wrapper around pgettext() (and friends?) that can be called from R like gettext() can.
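
As a possible stopgap until a proper wrapper exists, one could try the lookup from plain R. In GNU gettext, pgettext() is a macro that glues the context and the msgid together with an EOT character ("\004") and looks that combined key up like any other message. The sketch below leans on that; pgettext_sketch() is a hypothetical name, and it is an untested assumption that R's gettext() hands such a key to the catalogue lookup unchanged.

```r
# Hypothetical wrapper for illustration only; R itself has no pgettext().
# Assumption (not verified against a real compiled catalogue): gettext()
# passes the "context\004msgid" key through to the lookup as-is.
pgettext_sketch <- function(msgctxt, msgid, domain = NULL) {
  # GNU gettext stores msgctxt entries under "<context>\004<msgid>".
  key <- paste(msgctxt, msgid, sep = "\004")
  translated <- gettext(key, domain = domain)
  # A failed lookup returns the key itself; fall back to the bare msgid
  # so the caller never sees the glued string.
  if (identical(translated, key)) msgid else translated
}

pgettext_sketch("agromet_informe title", "Formato de salida para Informes",
                domain = "R-agroclimatico.en")
```

The fallback matters because plain gettext() returns its input when nothing is found, which here would include the context and the EOT byte. The sketch handles one string at a time; a real wrapper would probably want to be vectorised like gettext().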

eliocamp commented 3 months ago

Oof, it was too good to be true xD. But adding context support to R might be a good side effect.