Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0
3.63k stars 987 forks source link

Multiple small issues with messages in the C code #6503

Open aitap opened 2 months ago

aitap commented 2 months ago

This is a continuation of #6482. I think I won't find any more issues in the C code messages. Some of these are questions, a few indicate real translation hurdles.

The rest are questions unrelated to translations:

(Checkboxes indicate either no action needed or a corresponding fix being suggested in #6504)

tdhock commented 2 months ago

Should this mention the issue tracker instead?

yes

MichaelChirico commented 2 months ago

This one is actually possible for me to translate in parts (so we can keep it as is if needed)

I guess you're referring to few/many, not the jump>0 complete sentence added at the end. If so: yes, I think we should branch it.

I think we shouldn't be calling it ASCII if its / is not immediately before 0. How about "The character '/' is not just before '0' in the source character set

Hmm, good point, though I worry "source character set" is a highly technical term --> relatively tough to translate. WDYT about "Unlike the very common case, e.g. ASCII, the character '/' is not just before '0'".

Would it be more helpful to mention character strings in the error message too?

Ping @jangorecki who has the best context here

Is LENGTH(.) < 0 something that's possible in R ≥ 3.3.0?

Good spot. Has it ever been possible? My best guess was this was a somewhat half-baked fix here:

https://github.com/Rdatatable/data.table/commit/437dc39290d9010847bac8bee1000f94d01161a1

Author fixed the length=0 case to work and changed the message to adapt without stopping to think "length<0 is not possible" instead of "how do I adapt x<=0 | x>0 to include x=0 valid? x<0 | x>=0".

So I think we can just drop that condition and the corresponding part of the message.

aitap commented 2 months ago

I guess you're referring to few/many, not the jump>0 complete sentence added at the end. If so: yes, I think we should branch it.

Yes, I mean the part about few/many. Could you please clarify what you meant by "branch it"?

I worry "source character set" is a highly technical term --> relatively tough to translate. WDYT about "Unlike the very common case, e.g. ASCII, the character '/' is not just before '0'".

Agreed, this phrasing sounds fine.

My best guess was this was a somewhat half-baked fix here:

437dc39

...which started out as a check for "opposite of length() > 0". Will remove the condition and adjust the message.

MichaelChirico commented 2 months ago

Could you please clarify what you meant by "branch it"?

thisNcol < ncol ? _("A line with too few fields...") : _("A line with too many fields...)"

jangorecki commented 2 months ago

No idea really, but I can imagine some hacks in setting up length to negative value, that could have been in play.

rikivillalba commented 2 months ago

some contribs

MichaelChirico commented 2 months ago

[aside/FYI @rikivillalba if your permalink includes column numbers (C10, C23 in your 2nd bullet before my edit), it won't render inline in the issue. I don't know why GitHub sometimes includes the column numbers]

aitap commented 2 months ago

@jangorecki I'm not seeing calls to SETLENGTH in fmelt.c, but I do know that data.table uses negative TRUELENGTH for its internal purposes. Hmm.

@rikivillalba Thank you for reminding me! There's quite a lot of debugging printout that consists of argument/variable names that (arguably) should not be translated:

Which of these should have their _ removed? Which should be translated anyway?

MichaelChirico commented 2 months ago

for those, I think we need a system for marking "notranslate" at the same time, so that CI/release doesn't keep finding these strings that are in translateable calls, just not useful to translate.

In potools, that's // # notranslate.

Let's open another separate issue for those, it's a bit different from the topic at hand.

aitap commented 2 months ago

Edit: hmm, I don't see "at RHS position" at all in the official *.pot file. Nevertheless, I don't think xgettext handles compile-time string concatenation in general.

MichaelChirico commented 2 months ago

Re: the previous comment, any suggested fix? The first things that come to mind seem messy. Maybe we should functionalize the macro...

aitap commented 2 months ago

The core of the problem is that even if we do follow the recommendation to format the number into a temporary buffer and then use %s for all number formats, we still have a combinatorial explosion of 7 (TO) ⋅ 2 (target vector / column %d named '%s') strings that xgettext wants spelled out in the source code.

Can we cheat a little and split the clauses into two sentences? Then we'll have

_("%s (type '%s') at RHS position %d taken as TRUE")
_("%s (type '%s') at RHS position %d taken as 0")
_("%s (type '%s') at RHS position %d either truncated (precision lost) or taken as 0")
_("%s (type '%s') at RHS position %d out-of-range (NA)")
_("%s (type '%s') at RHS position %d out-of-range (NA) or truncated (precision lost)")
_("%s (type '%s') at RHS position %d had either imaginary part discarded or real part truncated (precision lost)")
_("%s (type '%s') at RHS position %d had imaginary part discarded")

and

_("Problem when assigning to type '%s' (target vector):")
_("Problem when assigning to type '%s' (column %d named '%s'):")

...which is more manageable. The strings could be rephrased further, making them truly separate sentences (The problem happened when assigning to type '%s' (target vector).).

MichaelChirico commented 2 months ago

Would it be more helpful to mention character strings in the error message too?

cc @jangorecki. I'm not sure what change you have in mind exactly, but I think the call-out to consult ?substitute2 is the most helpful part. I worry adding to the message will make it over-complicated.

MichaelChirico commented 2 months ago

part of above message, perhaps difficult to translate out of context.

Good spot, there's a similar fragment below. It's a bit hard to pull apart exactly how we'd make this friendlier. I think the best bet is to make each fragment a standalone sentence. I'll want to read more carefully how exactly those show up in the verbose output. Filing as a separate issue so it's not lost, there's already a lot going on here.