Closed robdockins closed 2 years ago
clang
can be convinced to halt compilation on these kinds of problems by passing the -Wconversion -Werror
flags.
I've also just learned about tgmath.h
, which provides type-generic macros for math.h
functions. It seems it first appeared in C99. This might make it slightly easier, at least for the floating-point functions.
Yes, we have recently talked about this (Alwyn, Frank and I).
This is what I meant today when I said that the meaning of some math operations will change.
I've been looking into this issue. My solution spelled out the types a bit more in the case selections for OP{1,2}, in an attempt at preventing future developers from defaulting to a "f" suffix by mistake for functions that may not support it (i.e., it directed the developer to "spell it out").
I've tried to merge your solution and mine, into a simpler, shared version available here: https://github.com/Copilot-Language/copilot/tree/develop-specialize-math-types. I removed some checks on errors on input types, mainly because the GADT would disallow them by construction.
Still, it seems like there's "too much logic" and too much ad-hocness. (The whole C99-backend can be vastly simplified.)
I also went down the path of tgmath.h
, and I agree with Rob: if it's well supported, it's likely the simplest and best solution. I know arduino is not our main target, but I'd definitely want to hear @joeyh's thoughts on this.
I just tried compiling an arduino example with gcc-avr it does not find a <tgmath.h>
. It does not seem to be part of the standard avr-libc anymore.
Note that the variants specialized for floats in math.h
in avr are just aliases for the double
-based versions of the same functions, so there's just no difference. In that platform, it makes no sense to specialize for different types (except for abs
/fabs
).
I've sent a message to the avr-libc users mailing list to see if there's a reason for that file not being included.
@joeyh for arduino, I found that the following compiles:
Adaptation to Makefile, to include user library:
USER_LIB_PATH := $(realpath libraries)
BOARD_TAG = uno
include /usr/share/arduino/Arduino.mk
New libraries/tgmath/tgmath.h
file, which simply imports math.h
:
#include <math.h>
Would this be an acceptable adaptation to arduino-copilot
?
The following question is for everyone:
If the above works with arduino-copilot, are there any reasons not to use tgmath.h
? This solution is much cleaner, I think it requires very few changes (see last 3 commits in https://github.com/Copilot-Language/copilot/tree/develop-specialize-math-types-tgmath, or compare branches).
Btw, tgmath.h
does not provide a macro for abs
, so we'd still need to distinguish between (integer) abs
and fabs
.
Ok, after some deliberation with the Copilot team, discussions on the avr mailing list, etc. I'm making an executive decision to merge the change as implemented in the branch above (that is, without tgmath.h
). The reason is that tgmath.h
relies on specific compiler extensions, its implementation is compiler- (and architecture-) dependent, all of which makes it harder for people to reason about specifically which version of the function will be executed (they know in principle, but tgmath.h
hides details away to a point where V&V might get harder).
It's very borderline though, and I could be convinced of the opposite, especially if we can make a good argument in favor of verification. And, more generally, I would be interested in seeing if it's possible to optimize that boilerplate code away in some other clean way.
Just to clarify, when you say "the branch above", you're referring to https://github.com/Copilot-Language/copilot/tree/develop-specialize-math-types?
I am.
Clarification: @robdockins is right that abs, labs, etc. may have to be used for larger integers, but they operate on int, not fixed-sized ints. I'm looking into that.
I think a reasonable implementation would be to use abs
for 16-bit-sized ints/word and below, casting fro and back, and labs
for 32-bit sized ints and above, but I'll check that the results are correct.
We could avoid casts by expanding Abs
to something like:
(x + (x >> (8*sizeof(int64_t) - 1))) ^ (x >> (8*sizeof(int64_t) - 1));
replacing int64_t
with whatever type Abs
should be applied to.
Probably too late to matter, but: I could make arduino-copilot deal with tgmath being used, I'm sure. Modifying the Makefile would not be my preferred option, for one it's intrusive and also arduino-copilot users may not only use the Makefile but may opt to load the generated C code into the Arduino IDE and flash it that way.
Also I checked Zephyr and it seems to include tgmath.h, although I have not verified it works there.
It's very borderline though, and I could be convinced of the opposite, especially if we can make a good argument in favor of verification. And, more generally, I would be interested in seeing if it's possible to optimize that boilerplate code away in some other clean way.
Regarding the style of verification we are doing: I don't think it makes much difference wether copilot-c99 selects the correct operations, or the C compiler does via tgmath
macros. We receive the code to verify as LLVM intermediate representation, so how that choice gets made is not very relevant.
Regarding abs
, I think it might be nicer to do something a little less bit-fiddly. One fairly nice option, I think, would be to define a few static inline functions that do the necessary. E.g.:
static inline int32_t iabs32 ( int32_t x ) {
return (x < 0 ? (-x) : x);
}
These should compile away nicely while still giving human reviewers something reasonable to look at.
We receive the code to verify as LLVM intermediate representation, so how that choice gets made is not very relevant.
Right. I was also thinking about other possible verification efforts not just based on the bytecode, but the code generated at the moment is not for "human consumption" in general, so perhaps it's good to lean on the side of simplicity (that is, use tgmath.h
) and change it in the future if needed.
Looking at the bytecode, it seems that what LLVM generates for tgmath.h
is fairly benign. For example, for a simple function that uses fabs
:
#include <stdint.h>
#include <tgmath.h>
float my_abs (float arg) {
return fabs(arg);
}
LLVM generates:
define dso_local float @my_abs(float %0) #0 {
%2 = alloca float, align 4
%3 = alloca float, align 4
store float %0, float* %3, align 4
%4 = load float, float* %3, align 4
store float %4, float* %2, align 4
%5 = load float, float* %2, align 4
%6 = call float @llvm.fabs.f32(float %5) #2
ret float %6
}
; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.fabs.f32(float) #1
which is fairly close, and not much more complex, than what we get if we make the decision of which fabs
variant from math.h
to use:
define dso_local float @my_abs(float %0) #0 {
%2 = alloca float, align 4
store float %0, float* %2, align 4
%3 = load float, float* %2, align 4
%4 = call float @llvm.fabs.f32(float %3)
ret float %4
}
It seems that the tgmath
version passes data around more. I'm hopeful, in terms of verification, that it makes no difference.
Regarding
abs
, I think it might be nicer to do something a little less bit-fiddly. One fairly nice option, I think, would be to define a few static inline functions that do the necessary. E.g.:static inline int32_t iabs32 ( int32_t x ) { return (x < 0 ? (-x) : x); }
These should compile away nicely while still giving human reviewers something reasonable to look at.
The decision here is a bit related to what happened with sign
. We decided to just generate the inlined expression ourselves, as opposed to creating a function. Whatever we chose, it would be good if possible to be reasonably consistent.
In terms of verification, I do have a question for you though. The version with shift is straightforward, and has no conditionals:
; Function Attrs: noinline nounwind optnone uwtable
define internal i32 @iabs32(i32 %0) #0 {
%2 = alloca i32, align 4
store i32 %0, i32* %2, align 4
%3 = load i32, i32* %2, align 4
%4 = load i32, i32* %2, align 4
%5 = ashr i32 %4, 31
%6 = add nsw i32 %3, %5
%7 = load i32, i32* %2, align 4
%8 = ashr i32 %7, 31
%9 = xor i32 %6, %8
ret i32 %9
}
However, a version like the one you (@robdockins) just wrote, with a conditional, generates bytecode with branching:
; Function Attrs: noinline nounwind optnone uwtable
define internal i32 @iabs32(i32 %0) #0 {
%2 = alloca i32, align 4
store i32 %0, i32* %2, align 4
%3 = load i32, i32* %2, align 4
%4 = icmp slt i32 %3, 0
br i1 %4, label %5, label %8
5: ; preds = %1
%6 = load i32, i32* %2, align 4
%7 = sub nsw i32 0, %6
br label %10
8: ; preds = %1
%9 = load i32, i32* %2, align 4
br label %10
10: ; preds = %8, %5
%11 = phi i32 [ %7, %5 ], [ %9, %8 ]
ret i32 %11
}
Does it make a meaningful difference in terms of verification, especially if we have a very large spec to verify?
Note that tgmath.h
defines no abs
, only fabs
, so, contrary to what I originally thought, that one is still on us and the branch I created with a tgmath
-based version is incomplete. We'd need to deal with abs
for integers in Copilot.
Regarding inline code generation versus auxiliary functions: agreed, consistency is good. Generating extra functions is a bit more involved, but probably worth the effort, in my opinion. Reasonable people could differ here.
Regarding tgmath.h
vs math.h
: looks like the macro in tgmath
spills into an additional local variable. That should be no problem for the verifier. I expect this sort of thing would probably boil entirely away on -O1
and higher.
Regarding branching in abs
: crucible
is quite good a merging this sort of control-flow branching, so I don't expect that to be much of a problem for us. Other verification technologies that don't do path merging might have more trouble. FWIW, on -O1
and higher, I expect this would be turned into a select
instead, and compiled into a conditional move instruction.
I would think it is more likely that we would have a potential problem with the SMT solver spending a fair amount of resources to notice that the bit twiddly version computes the same thing as the version based on conditionals (which I think is how the reference semantics are stated). I suspect it will have to bit-blast the expression to prove this equivalence, which can get pretty expensive for longer bit-widths. Of course, the proof is in the pudding. We could run some experiments.
We could run some experiments.
If it takes more than 1/2h-1hr to check, it's probably a better use of time to default to the simplest expression possible and come back to it if it causes a problem.
Just my quick view on the problem:
The general idea of Copilot is to be platform independent, but it is a known issue (since rewriting the C99 backend) that this is not really feasible with the current implementation. As pointed out in this issue, we will reach the point where we need to know a bit more about the platform.
One nice solution would be to have Copilot load a sort of 'platform configuration' that tells what the default sizes for numeric types are. While generating the C code we could then for example call the right function that belongs to i.e. an int16_t
. This would require writing such a configuration for every target platform, but I don't that should be a big problem.
Myself I don't have experience using tgmath.h
, but I am pretty sure there are some compiler and platform specifics that could cause trouble. It seems that these generic math functions aren't used that widely for that reason.
We do need to close this issue. Any further comments, details, evidence?
For my part, I think the solution in https://github.com/Copilot-Language/copilot/tree/develop-specialize-math-types is a fine way to go. I did some very quick tests in Cryptol, and solvers did not seem to have much trouble proving the shift/xor version of abs equal to one based on conditionals and negation, so that concern is probably not something we need to worry about.
Great to know! Thanks for the quick reply.
Description
The translation into C99 of specs with math operations, for most kinds of operations, does not generate code that uses type information to pick the function to be used. The use of functions that operate on types other than the ones the streams carry results in unintended, implicit type conversions (casts)
For example, to calculate the sine of a value in a stream we use the C function sin
, even though a more specialized version for floats (sinf
) exists. In Copilot, we could write:
import Language.Copilot
import Copilot.Language
import Copilot.Language.Prelude
import Prelude ()
import Copilot.Compile.C99
ext :: Stream Float
ext = extern "input_stream" Nothing
sined :: Stream Float
sined = sin ext
spec = do
trigger "handler" true [ arg sined ]
main = do
r <- reify spec
compile "spec" r
The code generated by Copilot for such spec uses sin
, which performs an unexpected conversion from float to double, which is expected not to have any loss (6.3.1.5 Real floating types, "When a float is promoted to double [...] its value is unchanged"), calculates the sine of the double number, and the casts it back to float:
static float input_stream_cpy;
float handler_arg0(void) {
return (sin)((input_stream_cpy));
}
The result is not the same as if we use sinf
on floats, as can be checked with the following test C program:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include "spec.h"
float input_stream = 0;
int err_code = 0;
void handler (float arg) {
float f = sinf(input_stream);
if (f != arg) {
float diff = f - arg;
printf("Error: %.8f\t%.8f\t%.8f\t%.8f\n", input_stream, f, arg, diff);
err_code = 1;
}
}
int main()
{
int i;
for (i = 0; i < 1000; i++) {
input_stream = (float)rand() / RAND_MAX;
step();
}
return err_code;
}
When compiled and executed, that program presents a number of counterexamples:
$ gcc main.c spec.c -o demo -lm
$ ./demo
Error: 0.49358299 0.47378424 0.47378427 -0.00000003
Error: 0.28429341 0.28047931 0.28047928 0.00000003
Error: 0.35036018 0.34323615 0.34323612 0.00000003
Error: 0.09348048 0.09334439 0.09334438 0.00000001
Error: 0.75729382 0.68695742 0.68695736 0.00000006
Error: 0.24804412 0.24550842 0.24550840 0.00000001
Error: 0.20221278 0.20083752 0.20083751 0.00000001
Error: 0.46366215 0.44722658 0.44722661 -0.00000003
Error: 0.44910511 0.43415955 0.43415958 -0.00000003
Error: 0.21088292 0.20932335 0.20932333 0.00000001
Error: 0.30540800 0.30068234 0.30068231 0.00000003
Error: 0.18761641 0.18651767 0.18651766 0.00000001
Error: 0.50346100 0.48245996 0.48245999 -0.00000003
Error: 0.51034904 0.48848182 0.48848185 -0.00000003
Error: 0.47755122 0.45960572 0.45960575 -0.00000003
Error: 0.12232582 0.12202097 0.12202097 0.00000001
Error: 0.46184778 0.44560304 0.44560307 -0.00000003
Error: 0.33289206 0.32677770 0.32677767 0.00000003
Error: 0.50240129 0.48153147 0.48153150 -0.00000003
Error: 0.59964442 0.56434894 0.56434900 -0.00000006
In general, it is best that we avoid implicit casts, especially when they lead to different results.
The same is also true for the functions recip
, which assumes its value to be a double, and abs
, which uses the C function abs
regardless of whether the argument is an integer, a float
, or a double
, when different functions exist for each case. The case is more extreme with abs
, which operates on integers and therefore discards the decimals when applied to float
or double
.
Due to the way that floats spread over the number range, the differences between the "cast" and "non-cast" results are expected to be much larger than the values presented in the last column of the output above.
Type
Additional context
None
Requester
Method to check presence of bug
The following spec performs a number of operations on floats:
import Copilot.Compile.C99
import Copilot.Language
import Copilot.Language.Prelude
import Language.Copilot
import Prelude ()
ext :: Stream Float
ext = extern "input_stream" Nothing
streamAbs :: Stream Float
streamAbs = abs ext
streamSin :: Stream Float
streamSin = sin ext
-- Ensure that argument to acosh is higher than 1 so that it is within
-- range.
streamAcosh :: Stream Float
streamAcosh = acosh (1 + ext)
streamExp :: Stream Float
streamExp = exp ext
streamLog :: Stream Float
streamLog = log ext
streamSqrt :: Stream Float
streamSqrt = sqrt ext
spec = do
trigger "handlerAbs" true [ arg streamAbs ]
trigger "handlerSin" true [ arg streamSin ]
trigger "handlerAcosh" true [ arg streamAcosh ]
trigger "handlerExp" true [ arg streamExp ]
trigger "handlerLog" true [ arg streamLog ]
trigger "handlerSqrt" true [ arg streamSqrt ]
main = do
r <- reify spec
compile "spec" r
The following C program compares the results from Copilot what would have been obtained using tgmath.h
, which specializes the functions based on the types:
#include <tgmath.h>
#include <stdio.h>
#include <stdlib.h>
#include "spec.h"
float input_stream = 0;
int err_code = 0;
void assert_eq (char* name, float v1, float v2) {
if (v1 != v2) {
float diff = v1 - v2;
printf("Error (%s):\t%.8f\t%.8f\t%.8f\t%.8f\n",
name,
input_stream,
v1,
v2,
diff);
err_code = 1;
}
}
void handlerSin (float arg) {
float f = sin(input_stream);
assert_eq ("sin", f, arg);
}
void handlerAbs (float arg) {
float f = fabs(input_stream);
assert_eq ("abs", f, arg);
}
void handlerAcosh (float arg) {
float f = acosh(1 + input_stream);
assert_eq ("acosh", f, arg);
}
void handlerSqrt (float arg) {
float f = sqrt(input_stream);
assert_eq ("sqrt", f, arg);
}
void handlerExp (float arg) {
float f = exp(input_stream);
assert_eq ("exp", f, arg);
}
void handlerLog (float arg) {
float f = log(input_stream);
assert_eq ("log", f, arg);
}
int main()
{
int i;
for (i = 0; i < 2000; i++) {
input_stream = (float)rand() / RAND_MAX;
step();
}
return err_code;
}
Expected result
The program above will report a list of values for which there is a difference between applying the function on floats, vs casting to double, applying the function on doubles, and then casting back to float.
The program will fail (return an non-zero exit code) if any differences are found. The following is a selection of some of the values produced:
Error (log): 0.98835063 -0.01171776 -0.01171776 -0.00000000
Error (exp): 0.09492297 1.09957421 1.09957409 0.00000012
Error (sin): 0.18875165 0.18763287 0.18763286 0.00000001
Error (acosh): 0.51963967 0.97978687 0.97978681 0.00000006
Error (abs): 0.19273356 0.19273356 0.00000000 0.19273356
Desired result
The program above should report no differences and return with an exit code of 0.
Proposed solution
Generate code that picks the most specialized version of a function available based on the type of the arguments.
Further notes
The reporter is Ivan Perez because the bug was originally identified internally, and later reported (independently) by Galois.
Change Manager: Confirmed that the bug is present by using method described and examining the result.
Technical Lead: Confirmed that the bug should be addressed.
Technical Lead: Bug scheduled for fixing in Copilot 3.9.
Fix assigned to: @ivanperez-keera .
I am about to send and process the PR of this issue. If there are no more changes, then this will go as described in https://github.com/Copilot-Language/copilot/tree/develop-specialize-math-types.
There have been some recent-ish changes, so please @RyanGlScott take a look. A recent comment with the Change Request description contains more concrete examples that fail if the "wrong" function is picked.
A strange argument in favor of specializing the function used by copilot based on the type of floating-point number, as opposed to using those from tgmath.h
, is that the code is being generated with extra parenthesis; for example:
float handlerExp_arg0(void) {
return (exp)((input_stream_cpy));
}
I did not know that, but that appears to confuse the heck out of the compiler. When using tgmath.h
, if there are extra parenthesis, it seems to pick the function operating on double
, and if the extra parenthesis are removed, then it picks the one operating on float
.
I believe I have been spoiled by decades of Haskell's type inference mechanisms.
It is purely an engineering and economic decision (we might have to change the pretty printer and possible even language-c99-simple
to fix that), but it helps decide between two options that are otherwise similarly good.
I made a mistake when cleaning the commits and left a change behind, but the tests caught the problem. I've just pushed to the same branch.
It's all passing the tests on my side. When you have a chance, please let me know if there's anything else that you think needs changing.
Thanks, @ivanperez-keera! I tried out your branch locally and it behaves like I would expect on all of the code I've been testing. I noticed one minor formatting issue (for which I've left a comment on the relevant commit), but nothing too serious.
A strange argument in favor of specializing the function used by copilot based on the type of floating-point number, as opposed to using those from
tgmath.h
, is that the code is being generated with extra parenthesis
Hah. Funny enough, we've ran into the same issue on our side. Usually, if you invoke the isnan(x)
function, it refers to a macro, but if you surround it with parentheses (e.g., (isnan)(x)
), then Clang resolves isnan
to a specific function over double
! LLVM can be strange like that.
In any case, everything looks good on my end.
Great! Fixed, pushed and waiting for tests / CI.
I mean, it makes sense when you look at the definition (which includes parentheses only around the argument):
# define isnan(x) __builtin_isnan (x)
I don't think this is specific to llvm.
Implementor: solution implemented, review requested.
Change Manager: Verified that:
Solution is implemented:
FROM ubuntu:trusty
RUN apt-get update
RUN apt-get install --yes software-properties-common RUN add-apt-repository ppa:hvr/ghc RUN apt-get update
RUN apt-get install --yes ghc-8.6.5 cabal-install-2.4 RUN apt-get install --yes libz-dev
ENV PATH=/opt/ghc/8.6.5/bin:/opt/cabal/2.4/bin:$PWD/.cabal-sandbox/bin:$PATH
RUN cabal update RUN cabal v1-sandbox init RUN cabal v1-install alex happy RUN apt-get install --yes git
CMD git clone $REPO && cd $NAME && git checkout $COMMIT && cd .. && cabal v1-install copilot/copilot**/ --enable-tests --run-tests
Command:
$ docker run -e "REPO=https://github.com/copilot-language/copilot" -e "NAME=copilot" -e "COMMIT=170f5282fe7d3a5204b187354503cf27cb7e6866" -it copilot-test
- [X] The solution proposed produces the expected result: the function picked is the specialized version depending on the type. The following files run an experiment that compares the version chosen by Copilot with the one picked by `tgmath.h` based on the type:
```Dockerfile
--- Dockerfile-verify-263
FROM ubuntu:trusty
RUN apt-get update
RUN apt-get install --yes software-properties-common
RUN add-apt-repository ppa:hvr/ghc
RUN apt-get update
RUN apt-get install --yes ghc-8.6.5 cabal-install-2.4
RUN apt-get install --yes libz-dev
ENV PATH=/opt/ghc/8.6.5/bin:/opt/cabal/2.4/bin:$PWD/.cabal-sandbox/bin:$PATH
RUN cabal update
RUN cabal v1-sandbox init
RUN cabal v1-install alex happy
RUN apt-get install --yes git
ADD Spec.hs /Spec.hs
ADD main-263.c /main-263.c
# We negate the exit code of the last command, which should fail if the module
# is exposed by the library, but succeed otherwise.
SHELL ["/bin/bash", "-c"]
CMD git clone $REPO && cd $NAME && git checkout $COMMIT && cd .. \
&& cabal v1-install copilot/copilot**/ \
&& cabal v1-exec -- runhaskell Spec.hs \
&& gcc main-263.c spec.c -lm -o main-263 \
&& ./main-263 \
&& echo "Success"
--- Test.hs
import Copilot.Compile.C99
import Copilot.Language
import Copilot.Language.Prelude
import Language.Copilot
import Prelude ()
ext :: Stream Float
ext = extern "input_stream" Nothing
streamAbs :: Stream Float
streamAbs = abs ext
streamSin :: Stream Float
streamSin = sin ext
-- Ensure that argument to acosh is higher than 1 so that it is within
-- range.
streamAcosh :: Stream Float
streamAcosh = acosh (1 + ext)
streamExp :: Stream Float
streamExp = exp ext
streamLog :: Stream Float
streamLog = log ext
streamSqrt :: Stream Float
streamSqrt = sqrt ext
spec = do
trigger "handlerAbs" true [ arg streamAbs ]
trigger "handlerSin" true [ arg streamSin ]
trigger "handlerAcosh" true [ arg streamAcosh ]
trigger "handlerExp" true [ arg streamExp ]
trigger "handlerLog" true [ arg streamLog ]
trigger "handlerSqrt" true [ arg streamSqrt ]
main = do
r <- reify spec
compile "spec" r
--- main-263.c
#include <tgmath.h>
#include <stdio.h>
#include <stdlib.h>
#include "spec.h"
float input_stream = 0;
int err_code = 0;
void assert_eq (char* name, float v1, float v2) {
if (v1 != v2) {
float diff = v1 - v2;
printf("Error (%s):\t%.8f\t%.8f\t%.8f\t%.8f\n",
name,
input_stream,
v1,
v2,
diff);
err_code = 1;
}
}
void handlerSin (float arg) {
float f = sin(input_stream);
assert_eq ("sin", f, arg);
}
void handlerAbs (float arg) {
float f = fabs(input_stream);
assert_eq ("abs", f, arg);
}
void handlerAcosh (float arg) {
float f = acosh(1 + input_stream);
assert_eq ("acosh", f, arg);
}
void handlerSqrt (float arg) {
float f = sqrt(input_stream);
assert_eq ("sqrt", f, arg);
}
void handlerExp (float arg) {
float f = exp(input_stream);
assert_eq ("exp", f, arg);
}
void handlerLog (float arg) {
float f = log(input_stream);
assert_eq ("log", f, arg);
}
int main()
{
int i;
for (i = 0; i < 2000; i++) {
input_stream = (float)rand() / RAND_MAX;
step();
}
return err_code;
}
Command:
$ docker run -e "REPO=https://github.com/copilot-language/copilot" -e "NAME=copilot" -e "COMMIT=170f5282fe7d3a5204b187354503cf27cb7e6866" -it copilot-verify-263
Change Manager: Implementation ready to be merged.
Hah. Funny enough, we've ran into the same issue on our side. Usually, if you invoke the
isnan(x)
function, it refers to a macro, but if you surround it with parentheses (e.g.,(isnan)(x)
), then Clang resolvesisnan
to a specific function overdouble
! LLVM can be strange like that.
Just wanted to chime in here that this is a general C-ism, not just for LLVM; surrounding a function name in parens explicitly disables macro expansion, so having the C99 pretty-printer do that isn't totally benign. Some C standard library functions are intended to be used as macros, and disabling macro-expansion may not work as expected.
Thanks, @robdockins . Good point!
That's an interesting challenge from the point of view of creating a C99 AST and pretty printer. Normal algorithms to minimize the number of parenthesis might not work well, and, potentially, the AST should provide a way to say "between parentheses" even though ASTs would normally not do that because that's more concrete syntax.
I'll discuss it with Frank, who maintains the c99 library we are using atm.
If I were designing a solution to this issue, I'd probably have two function call forms in the AST: one for "named" function calls (which take an identifier and a list of arguments, and the identifier is never parenthised, and might be a macro) and one for "expression" function calls (which takes an expression and a list of arguments, and the expression is always parenthised, and is never a macro).
It's probably best to discuss that in the https://github.com/fdedden/language-c99 package, since it's likely the one that will have to address it.
See:
https://github.com/Copilot-Language/copilot/blob/11ab89fbb839a99aa021f458567c7d4b7ad0c745/copilot-c99/src/Copilot/Compile/C99/Translate.hs#L56-L106
Code generation selects a function to call based only on the operation, without paying attention the type. This is almost certainly not what is desired. For example
abs
is a function onint
, but is applied also tofloat
anddouble
values. This causes the C compiler to (silently (!)) perform numerical coercions to/fromint
. Instead the generator should selectfabs
orfabsf
. Likewise choosing betweencopysign
orcopysignf
,cos
orcosf
, etc.The
abs
case for integer values is actually even tricker, as copilot uses the fixed-widthintX_t
types, but, as far as I know, the standard library does not provide operations for fixed-width types, but instead only providesabs
,labs
, andllabs
forint
,long
andlong long
respectively. TheSign
operation is also troublesome, as it applies also to integer operations on the copilot side, but has no corresponding C primitive that I know of. These may have to be provided as macros to get correct behavior.