Open aitap opened 1 month ago
Should this mention the issue tracker instead?
yes
This one is actually possible for me to translate in parts (so we can keep it as is if needed)
I guess you're referring to few
/many
, not the jump>0
complete sentence added at the end. If so: yes, I think we should branch it.
I think we shouldn't be calling it ASCII if its / is not immediately before 0. How about "The character '/' is not just before '0' in the source character set
Hmm, good point, though I worry "source character set" is a highly technical term --> relatively tough to translate. WDYT about "Unlike the very common case, e.g. ASCII, the character '/' is not just before '0'".
Would it be more helpful to mention character strings in the error message too?
Ping @jangorecki who has the best context here
Is LENGTH(.) < 0 something that's possible in R ≥ 3.3.0?
Good spot. Has it ever been possible? My best guess was this was a somewhat half-baked fix here:
https://github.com/Rdatatable/data.table/commit/437dc39290d9010847bac8bee1000f94d01161a1
Author fixed the length=0
case to work and changed the message to adapt without stopping to think "length<0
is not possible" instead of "how do I adapt x<=0 | x>0
to include x=0
valid? x<0 | x>=0
".
So I think we can just drop that condition and the corresponding part of the message.
I guess you're referring to
few
/many
, not thejump>0
complete sentence added at the end. If so: yes, I think we should branch it.
Yes, I mean the part about few
/many
. Could you please clarify what you meant by "branch it"?
I worry "source character set" is a highly technical term --> relatively tough to translate. WDYT about "Unlike the very common case, e.g. ASCII, the character '/' is not just before '0'".
Agreed, this phrasing sounds fine.
My best guess was this was a somewhat half-baked fix here:
...which started out as a check for "opposite of length() > 0
". Will remove the condition and adjust the message.
Could you please clarify what you meant by "branch it"?
thisNcol < ncol ? _("A line with too few fields...") : _("A line with too many fields...)"
No idea really, but I can imagine some hacks in setting up length to negative value, that could have been in play.
some contribs
STOP(_("Internal error in %s: %s. Please report to the data.table issues tracker", __func__, internalErr);
https://github.com/Rdatatable/data.table/blob/3b376dab6092c67dead7197c858f450109cb519b/src/fread.c#L2627[aside/FYI @rikivillalba if your permalink includes column numbers (C10
, C23
in your 2nd bullet before my edit), it won't render inline in the issue. I don't know why GitHub sometimes includes the column numbers]
@jangorecki I'm not seeing calls to SETLENGTH
in fmelt.c
, but I do know that data.table
uses negative TRUELENGTH
for its internal purposes. Hmm.
@rikivillalba Thank you for reminding me! There's quite a lot of debugging printout that consists of argument/variable names that (arguably) should not be translated:
jump0size==0
) in the middle:
https://github.com/Rdatatable/data.table/blob/d7e95d1fc1e102d3fd0c0cb448892e67116ac03a/src/fread.c#L1891-L1896mygetenv
, which may be not that important:
https://github.com/Rdatatable/data.table/blob/d7e95d1fc1e102d3fd0c0cb448892e67116ac03a/src/openmp-utils.c#L89-L97Which of these should have their _
removed? Which should be translated anyway?
for those, I think we need a system for marking "notranslate" at the same time, so that CI/release doesn't keep finding these strings that are in translateable calls, just not useful to translate.
In potools, that's // # notranslate
.
Let's open another separate issue for those, it's a bit different from the topic at hand.
xgettext
: the string eventually given to gettext()
will not be translated because xgettext
had split it into parts, including "%"
:
https://github.com/Rdatatable/data.table/blob/d7e95d1fc1e102d3fd0c0cb448892e67116ac03a/src/assign.c#L944-L945Edit: hmm, I don't see "at RHS position" at all in the official *.pot
file. Nevertheless, I don't think xgettext
handles compile-time string concatenation in general.
Re: the previous comment, any suggested fix? The first things that come to mind seem messy. Maybe we should functionalize the macro...
The core of the problem is that even if we do follow the recommendation to format the number into a temporary buffer and then use %s
for all number formats, we still have a combinatorial explosion of 7 (TO
) ⋅ 2 (target vector
/ column %d named '%s'
) strings that xgettext
wants spelled out in the source code.
Can we cheat a little and split the clauses into two sentences? Then we'll have
_("%s (type '%s') at RHS position %d taken as TRUE")
_("%s (type '%s') at RHS position %d taken as 0")
_("%s (type '%s') at RHS position %d either truncated (precision lost) or taken as 0")
_("%s (type '%s') at RHS position %d out-of-range (NA)")
_("%s (type '%s') at RHS position %d out-of-range (NA) or truncated (precision lost)")
_("%s (type '%s') at RHS position %d had either imaginary part discarded or real part truncated (precision lost)")
_("%s (type '%s') at RHS position %d had imaginary part discarded")
and
_("Problem when assigning to type '%s' (target vector):")
_("Problem when assigning to type '%s' (column %d named '%s'):")
...which is more manageable. The strings could be rephrased further, making them truly separate sentences (The problem happened when assigning to type '%s' (target vector).
).
Would it be more helpful to mention character strings in the error message too?
cc @jangorecki. I'm not sure what change you have in mind exactly, but I think the call-out to consult ?substitute2
is the most helpful part. I worry adding to the message will make it over-complicated.
part of above message, perhaps difficult to translate out of context.
Good spot, there's a similar fragment below. It's a bit hard to pull apart exactly how we'd make this friendlier. I think the best bet is to make each fragment a standalone sentence. I'll want to read more carefully how exactly those show up in the verbose output. Filing as a separate issue so it's not lost, there's already a lot going on here.
This is a continuation of #6482. I think I won't find any more issues in the C code messages. Some of these are questions, a few indicate real translation hurdles.
pgettext
, which seems to be currently unavailable even to C code in R's built-in copy ofgettext
on Windows._("no")
not to mean something else in a different part of the code: https://github.com/Rdatatable/data.table/blob/944340901d5b37cfdac43c9cc5b8695c5c12f5a0/src/fread.c#L2038 Unfortunately, expanding this sentence will duplicate the full line of the code.The rest are questions unrelated to translations:
/
is not immediately before0
. How about "The character '/' is not just before '0' in the source character set"? https://github.com/Rdatatable/data.table/blob/944340901d5b37cfdac43c9cc5b8695c5c12f5a0/src/init.c#L225 https://github.com/Rdatatable/data.table/blob/944340901d5b37cfdac43c9cc5b8695c5c12f5a0/src/init.c#L227.Last.updated
: https://github.com/Rdatatable/data.table/blob/944340901d5b37cfdac43c9cc5b8695c5c12f5a0/src/init.c#L367data.table:::list2lang
converts character strings to symbols. Would it be more helpful to mention character strings in the error message too? https://github.com/Rdatatable/data.table/blob/944340901d5b37cfdac43c9cc5b8695c5c12f5a0/src/programming.c#L16LENGTH(.) < 0
something that's possible in R ≥ 3.3.0? https://github.com/Rdatatable/data.table/blob/944340901d5b37cfdac43c9cc5b8695c5c12f5a0/src/fmelt.c#L66(Checkboxes indicate either no action needed or a corresponding fix being suggested in #6504)