Open damianmoz opened 5 months ago
This routine is normally protected by a wrapper which ensures things like the base b is either 10 or 16. As a I noted, ths is minimalist to facilitate bug detection. Thanks
OUTPUT FOLLOWS:
Correct (no ternary operator)
+|1|2|3|4|5|6|7 -|1|2|3|4|5|6|7 +|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|7 <--- MAXINT -|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|8 +|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|7 <--- +MAXINT -|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|7 <--- -MAXINT
Rubbish (using ternary operator)
+|1|2|3|4|5|6|7 -|1|2|3|4|5|6|7 9|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|7 <--- MAXINT -|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|0|0|8 2|2|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|7 <--- +MAXINT - not quite!! -|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|0|0|0|7 <--- -MAXINT - not quite!!
Without '|' ->
Correct:
+1234567 -1234567 +9223372036854775807 <--- MAXINT -9223372036854775808 +9223372036854775807 <--- +MAXINT -9223372036854775807 <--- -MAXINT
Rubbish:
+1234567 -1234567 99223372036854775807 <--- MAXINT -9223372036854775008 22223372036854775807 <--- +MAXINT - not quite!!
@damianmoz a standard question: how critical is this bug in your work / is the workaround you proposed sufficient for you?
It is not stopping me here because I debugged it. It was not fun and consumed way much time. And got lucky. What f I had not got lucky.
I use the ternary if operator all the time. And I mean all the time.
I am not looking forward to dong such debugging. t means you need to have a huge warning in the documentation that says SUSPECT. That it not good for the language. Scary too.
Heres a simpler reproducer
proc positive( z : string)
{
param EMPTY = '';
return if z == '~' then ' ' else (if z == '+' then z else EMPTY);
}
const s = positive('+');
writeln(s);
The problem is that z
is a reference, and the expression is creating an owning copy for the return value, which is then de-allocated when the function exits.
I can rewrite the return in subtle ways to get this to work again.
// 1
return if z == '~' then ' ' else (if z == '+' then z:string else EMPTY);
// 2
return if z == '~' then ' ' else (if z == '+' then '+' else EMPTY);
// 3
const temp = if z == '+' then z else EMPTY;
return if z == '~' then ' ' else temp;
Thanks Jade. That is good to know. Still scary
To date, i.e. for the last decade, I have only used the ternary if with numbers. Which is why I have not seen this before. I have hardly ever used strings in the past.
I still think it needs to get fixed but I do not control your priorities.
So strings are broken when used with a multi-level ternary if
Strings seem a bit fragile.
Jade, thanks for looking into this. Both //1 and //2 are not intuitive.
I did not try your //3 yesterday. But I did just now and it works flawlessly. Basically, these results say that the compiler can only handle a trivial ternary if, i.e. one without a nesting of if/then/else. This should be noted in the Chapel documentation - in very big bold letters. Or for people (like me), have a warning issued by the compiler (which cannot be suppressed).
So, to answer your question @vasslitvinov, Jade's solution is a bit clunky but that will do me for now. So, fix at your leisure. But I would suggest that the warning message is super urgent. That's dangerous.
I absolutely love ternary if statements. I fell in love with them in Algol68. I am from the "every statement has a value" camp. They are much more succinct than the equivalent with a normal if/then/else. In C or C+,+ that succinctness also results in far more optimal code! I looked at numerous versions of my smaller C (and Chapel) functions which approximate the function by different rules depending on the domain of the argument (mix). I note that >80% of them have ternary if statements which are 3 deep, and >30% are 4 deep. I even have some which are 6 deep.
This is the first time I have tried a ternary if with a string which is anything except 2 deep.
Thanks Jade, my stress levels have dropped significantly today since yesterday.
It is not just strings with a ternary if that will have a problem. I guess that it will be anything which is creating a copy and owning it and all that stuff. Out of scope for my brain.
Should I close this now?
We should keep the issue open until the bug is fixed, or at the very least, the compiler is adjusted to emit a warning or error for this pattern.
Should I be using c_array(T, n) where T is some type to mimic a char? But that what type is T and how do I then write a single char. Then at the end, I would want to return a genuine Chapel string? How do I create a string but I cannot find that mentioned in the doco, although I could be fishing with the wrong phrases so not looking in the right places? Not urgent.
I have had (not string) scenarios where I had tried to pass a sliced array expression as a parameter to a routine whose matching argument was a const. I got a nice compiler message saying to the effect of "cannot handle this". So I had to stick the expression in question into a const temporary. Only added one extra line. Still very readable. That worked well. But the fatal error message from the compiler told me that there was something it could not handle. So, that message should be all that is needed.
In some ways forcing the user to create their own temporary is good because they then know what space needs they are demanding in a given scenario. That said, in my case it is a 32 byte buffer so it is peanuts.
Should I be using c_array(T, n) where T is some type to mimic a char? But that what type is T and how do I then write a single char.
You can certainly try using a c_array
where T is c_char
or c_uchar
or something along those lines. But in an ideal world, you shouldn't have to, this should be something that "just works".
Then at the end, I would want to return a genuine Chapel string? How do I create a string but I cannot find that mentioned in the doco, although I could be fishing with the wrong phrases so not looking in the right places? Not urgent.
I think what you're looking for is string.createAdoptingBuffer
and string.createCopyingBuffer
from https://chapel-lang.org/docs/latest/language/spec/strings.html#String.string.createAdoptingBuffer. The documentation for c_array
says that it can coerce to a c_ptr
with the same element type, though I'm not finding that true in practice - it looks like an explicit cast is necessary at the moment to send a c_array to those functions (I'm planning on reporting that in a separate issue)
OK. c_array() sounds a little bit hairy.
I did write a new routine with strings yesterday and was ultra careful with my ternary operators and worked with copies and avoided referencing the original string as a whole. And no stack corruption. Woo hoo.
Moving the discussion of "how to convert an int to a string" back to the discourse thread, I posted a version that uses c_array
: https://chapel.discourse.group/t/encoding-an-integer-into-a-string-in-pure-chapel/33326/10
In investigating this, I think the problem has to do with our callDestructors pass. It looks like we insert a destruction of one of the temporaries used to populate the return variable because variableToExclude
doesn't know that it's used for the return variable. It seems to assume that we'll be checking backwards from the correct statement to determine this, but we start looking earlier in the body than it would need to. Additionally, I'm not convinced that the check appropriately responds to if statements when searching.
Unfortunately, it seems like we rely on some of this behavior, as starting from the very end of the body results in problems for a basic hello world program. But in talking to Michael, I have a couple of paths I can try to get things working until the dyno compiler replacement for this pass is available - we do think the pass will be able to be much better written in the dyno world, which will hopefully avoid the discovery of similar errors in the future
I just merged #25384, which will make the situation generally better, but doesn't fully resolve all the problems with ternary expressions and strings. I've opened #25417 to track the remaining issues, which the dyno effort will hopefully resolve when it gets to that compilation pass. Closing this issue.
The PR ended up introducing some memory leaks in other places that I wasn't expecting, so I'm undo'ing it for now and re-opening this issue. I did a quick look into how to add a warning/error instead, but that also had much broader impact than I'm comfortable with.
I encountered this bug today while writing a ternary operation as such:
var commFileIdentifier = if measureComms && isRandom then identifier + "_" + scale:string
else if measureComms then identifier else "";
This ended up giving me weird behavior where commFileIdentifier
was being modified by other code that did not use it at all. Further, when I tried to make it a const
the weird behavior was still there. I ended up fixing it by doing the following:
var commFileIdentifier:string;
if measureComms && isRandom then commFileIdentifier = identifier + "_" + scale:string;
else if measureComms then commFileIdentifier = identifier;
else commFileIdentifier = "";
This behavior should definitely have a warning or error attached to it until it is properly fixed. I spent a good while debugging it, and message along the lines of "ternary operations with strings cause memory corruption" would've helped a ton.
My guess as to why @alvaradoo's variable gets remotely modified, copied from Slack:
Seems like the if-expr temp deallocates the internal string memory, which gets re-allocated in the helper routine, and written over. So it looks like an unrelated string literal got remotely modified, but it’s because its internal memory has been given to someone else.
Unfortunately, based on my analysis, I think an error message that didn't also give false positives would require a similar amount of work to resolving the problem in the first place :(
Maybe the false positives are worth it for the near term?
Maybe. Depends on if we think "don't use ternary expressions with strings" is more embarrassing and detrimental overall than rewriting the pass now.
My problem only happened when the ternary operator was nested. A trivial
return if <conditional> then <value> else <value>
seems to work (well for me)
The following routine, without the ternary operator used in the return statement, works.
If the BROKEN return statement is enabled (and the earlier code disabled), it must do naughty things to the stack because subsequent assignment to a freshly allocated array of strings gets corrupted. Or am I using strings the wrong way?
Sample file (suitably trimmed of excess functionality) appears here:
Tagging @mppf and @vasslitvinov. Thanks.
The above code is supposed to encode an integer in base 10 or base 16. The functionality removed for purposes here is related to handling grouping separators, e.g. a comma every three digits or an underscore every two digits in a hexadecimal number.