Closed maelle closed 1 year ago
that sounds wrong to me! can you share more info (the platform you're using, the stack trace)?
If in https://github.com/maelle/pockage/blob/a36978a1c06dcdc3dbd6200f4110c2bbaa1ba21b/po/R-es.po#L20 I add "¡" I get
> potools::po_compile()
Recompiling 'ca' R translation
Running system command msgfmt -c --statistics -o './inst/po/ca/LC_MESSAGES/R-pockage.mo' './po/R-ca.po'...
./po/R-ca.po:15:19: invalid multibyte sequence
./po/R-ca.po:15:20: invalid multibyte sequence
msgfmt: found 2 fatal errors
Warning: running msgfmt on R-ca.po failed.
Here is the po file:
msgid ""
msgstr ""
"Project-Id-Version: pockage 0.0.0.9000\n"
"POT-Creation-Date: 2023-10-06 10:45+0200\n"
"PO-Revision-Date: 2023-10-06 10:33+0200\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: ca\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=ASCII\n"
"Content-Transfer-Encoding: 8bit\n"
#: mensaje.R:9
msgid "user"
msgstr "usuari/usuària"
#: mensaje.R:10
msgid "Hello {name}!"
msgstr "Hola {name}!"
Recompiling 'es' R translation
Running system command msgfmt -c --statistics -o './inst/po/es/LC_MESSAGES/R-pockage.mo' './po/R-es.po'...
./po/R-es.po:20:9: invalid multibyte sequence
./po/R-es.po:20:10: invalid multibyte sequence
msgfmt: found 2 fatal errors
Warning: running msgfmt on R-es.po failed.
Here is the po file:
msgid ""
msgstr ""
"Project-Id-Version: pockage 0.0.0.9000\n"
"POT-Creation-Date: 2023-10-06 10:45+0200\n"
"PO-Revision-Date: 2023-10-06 10:33+0200\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: es\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=ASCII\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
#: mensaje.R:9
msgid "user"
msgstr "usuari@"
#: mensaje.R:10
msgid "Hello {name}!"
msgstr "¡Hola {name}!"
Recompiling 'fr' R translation
Running system command msgfmt -c --statistics -o './inst/po/fr/LC_MESSAGES/R-pockage.mo' './po/R-fr.po'...
./po/R-fr.po:16:20: invalid multibyte sequence
./po/R-fr.po:16:21: invalid multibyte sequence
msgfmt: found 2 fatal errors
Warning: running msgfmt on R-fr.po failed.
Here is the po file:
msgid ""
msgstr ""
"Project-Id-Version: pockage 0.0.0.9000\n"
"POT-Creation-Date: 2023-10-06 10:45+0200\n"
"PO-Revision-Date: 2023-10-06 10:33+0200\n"
"Last-Translator: Malle Salmon\n"
"Language-Team: none\n"
"Language: fr\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=ASCII\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"
#: mensaje.R:9
msgid "user"
msgstr "utilisateur·rice"
#: mensaje.R:10
msgid "Hello {name}!"
msgstr "Salut {name} !"
This is on:
─ Session info ─────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.0 (2022-04-22)
 os       Ubuntu 20.04.6 LTS
 system   x86_64, linux-gnu
 ui       RStudio
 language en_US.utf8
 collate  en_US.utf8
 ctype    en_US.utf8
 tz       Europe/Paris
 date     2023-10-06
 rstudio  2023.06.2+561 Mountain Hydrangea (desktop)
 pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
I installed potools from GitHub with pak, and didn't have to worry about the system dependency (or maybe I should!).
Apparently I also get the error for the slash in the other file https://github.com/maelle/pockage/blob/a36978a1c06dcdc3dbd6200f4110c2bbaa1ba21b/po/R-ca.po#L15 but that wasn't breaking on its own.
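The column numbers in the msgfmt output appear to line up with byte offsets, which may explain why each character triggers two errors. A minimal sketch for the failing 'es' line, assuming the file is saved as UTF-8 (as the en_US.utf8 locale suggests) and that msgfmt reports 1-based byte columns:

```shell
set -eu

# Text that precedes the first non-ASCII character on the failing 'es' line.
prefix='msgstr "'

# wc -c counts bytes, not characters.
prefix_len=$(printf '%s' "$prefix" | wc -c | tr -d ' ')

# Byte length of the inverted exclamation mark (2 when this script is UTF-8).
char_len=$(printf '%s' '¡' | wc -c | tr -d ' ')

first_col=$((prefix_len + 1))
last_col=$((prefix_len + char_len))
echo "non-ASCII bytes at columns $first_col-$last_col"
```

By the same count, columns 19 and 20 in R-ca.po would fall on the two bytes of "à" in usuària rather than on the slash, and columns 20 and 21 in R-fr.po on the "·".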
The main concern for platform is if this is coming from Windows or not. Definitely surprised this is happening on Ubuntu and hadn't been caught yet! I'll take a look at this soon.
I know literally nothing about this, but this line caught my eye:
"Content-Type: text/plain; charset=ASCII\n"
Would be worth trying changing ASCII to UTF-8.
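A minimal sketch of that change, assuming a POSIX shell with sed available. It works on a throwaway copy in a temporary directory (the path and the trimmed-down header are illustrative, not the real package files):

```shell
set -eu
po_dir=$(mktemp -d)
po="$po_dir/R-es.po"

# Stand-in PO file with the same Content-Type line as in the report.
cat > "$po" <<'EOF'
msgid ""
msgstr ""
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=ASCII\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "Hello {name}!"
msgstr "¡Hola {name}!"
EOF

# Change the declared charset in place, keeping a .bak copy of the original.
sed -i.bak 's/charset=ASCII/charset=UTF-8/' "$po"

# Show the header line after the change.
grep charset "$po"
```

After the same edit on the real po/R-*.po files, rerunning potools::po_compile() is the step that checks whether msgfmt now accepts them.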
@hadley yes, this worked! :tada:
Thanks @hadley!
Maëlle, can I know how that .po file was generated in the first place? Want to make sure {potools} is not emitting any troublesome headers like that.
Looks like {potools} can do so, here's how run_msginit() would work:
msginit -i R-pockage.pot -o R-ja.po -l ja -w 120 --no-translator
grep charset R-ja.po
# "Content-Type: text/plain; charset=ASCII\n"
I don't see an option for msginit to force it to use charset=UTF-8; it looks like it's entirely derived from the header metadata in the .pot file:
‘MIME-Version, Content-Type, Content-Transfer-Encoding’
These values are set according to the content of the POT file and the current locale. If the POT file contains charset=UTF-8, it means that the POT file contains non-ASCII characters, and we keep the UTF-8 encoding. Otherwise, when the POT file is plain ASCII, we use the locale’s encoding.
I had hoped using msginit -l ja.UTF-8 ... would do the trick but no such luck.
If I replace charset=CHARSET with charset=UTF-8 in the .pot file, msginit indeed carries that over to the output .po file.
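That experiment can be sketched roughly as follows. The template content below is a trimmed, illustrative stand-in rather than the real R-pockage.pot, and the msginit step only runs if gettext happens to be installed:

```shell
set -eu
workdir=$(mktemp -d)
pot="$workdir/R-pockage.pot"

# Stand-in template with the placeholder charset (content is illustrative).
cat > "$pot" <<'EOF'
msgid ""
msgstr ""
"Project-Id-Version: pockage 0.0.0.9000\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "Hello {name}!"
msgstr ""
EOF

# Declare UTF-8 in the template before any .po file is derived from it.
sed -i.bak 's/charset=CHARSET/charset=UTF-8/' "$pot"
grep charset "$pot"

# Optional: if gettext is installed, derive a .po file and inspect its charset.
if command -v msginit >/dev/null 2>&1; then
  msginit -i "$pot" -o "$workdir/R-ja.po" -l ja -w 120 --no-translator || true
  grep charset "$workdir/R-ja.po" || true
fi
```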
Looking now at how safe it may be to default to charset=UTF-8 in .pot files...
Another note: there seems to be some conflict between po_create(), which wraps msginit, and write_po_file(), which always sets charset=UTF-8.
I had created the files using potools. Thank you!
:wave:, thanks for maintaining potools!
I'm writing an example package, and noticed I can't use "¡" in msgid nor msgstr, is that expected?