Open Quuxplusone opened 5 years ago
Attached dce-calcs-clang.c
(50187 bytes, text/x-csrc): preprocessed and partially reduced file
Attached dce-calcs-clang.i.gz
(260283 bytes, application/gzip): original source file, preprocessed and compressed
Attached bw-fixed.patch
(7621 bytes, text/plain): kernel patch to avoid passing structures by value
Got a better reduced test case, see https://godbolt.org/z/z5bVKS
struct bw_fixed { long long value; };
struct bw_fixed bw_min2(struct bw_fixed, struct bw_fixed);
struct bw_fixed bw_max2(struct bw_fixed, struct bw_fixed);
struct bw_fixed bw_mul(struct bw_fixed, struct bw_fixed);
static inline struct bw_fixed bw_max3(struct bw_fixed v1, struct bw_fixed v2,
                                      struct bw_fixed v3) {
    return bw_max2(bw_max2(v1, v2), v3);
}
struct bw_fixed bw_int_to_fixed(long long value);
int f(struct bw_fixed _a, struct bw_fixed _b, struct bw_fixed _c)
{
    struct bw_fixed a=_a, b=_b, c=_c;
    struct bw_fixed a1=_a, b1=_b, c1=_c;
    struct bw_fixed a2=_a, b2=_b, c2=_c;
    a1 = bw_mul(bw_int_to_fixed(a.value), bw_int_to_fixed(3));
    b1 = bw_max3(a1, bw_mul(a2, a1), bw_mul(b2, c1));
    c1 = bw_max3(a2, bw_int_to_fixed(3), bw_int_to_fixed(2));
    return bw_max3(a1, b1, c1).value;
}
This just combines a couple of random operations from the original
kernel file. What can be seen here is that clang on ARM allocates
a stack slot for each temporary value resulting from calling
bw_int_to_fixed() or bw_max2(), while it doesn't do that on x86-64,
and gcc never does it.
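For comparison, here is a rough sketch of what the same sequence could look like when the structures are passed by pointer instead of by value. This is not the attached bw-fixed.patch. The `_p` function names and `f_by_pointer` are made up for illustration, and the helper bodies are placeholder integer arithmetic rather than the kernel's fixed-point code, added only so that the example compiles and runs on its own. It only shows the shape of the calling convention and does not by itself demonstrate any change in stack usage.

```c
#include <assert.h>

struct bw_fixed { long long value; };

/* Placeholder bodies (assumption): plain integer math, NOT the kernel's
 * fixed-point implementation. They exist only to make the sketch runnable. */
static void bw_int_to_fixed_p(struct bw_fixed *out, long long value)
{
    assert(out);
    out->value = value;
}

static void bw_mul_p(struct bw_fixed *out, const struct bw_fixed *a,
                     const struct bw_fixed *b)
{
    out->value = a->value * b->value;
}

static void bw_max2_p(struct bw_fixed *out, const struct bw_fixed *a,
                      const struct bw_fixed *b)
{
    out->value = a->value > b->value ? a->value : b->value;
}

static void bw_max3_p(struct bw_fixed *out, const struct bw_fixed *v1,
                      const struct bw_fixed *v2, const struct bw_fixed *v3)
{
    bw_max2_p(out, v1, v2);
    bw_max2_p(out, out, v3);   /* out may alias the first operand here */
}

/* Same operations as f() in the test case, with every temporary named
 * explicitly and reused instead of being created per call. */
int f_by_pointer(const struct bw_fixed *_a, const struct bw_fixed *_b,
                 const struct bw_fixed *_c)
{
    struct bw_fixed a1, b1, c1, t0, t1, two, three;

    bw_int_to_fixed_p(&two, 2);
    bw_int_to_fixed_p(&three, 3);

    /* a1 = bw_mul(bw_int_to_fixed(a.value), bw_int_to_fixed(3)); */
    bw_int_to_fixed_p(&t0, _a->value);
    bw_mul_p(&a1, &t0, &three);

    /* b1 = bw_max3(a1, bw_mul(a2, a1), bw_mul(b2, c1)); c1 is still _c */
    bw_mul_p(&t0, _a, &a1);
    bw_mul_p(&t1, _b, _c);
    bw_max3_p(&b1, &a1, &t0, &t1);

    /* c1 = bw_max3(a2, bw_int_to_fixed(3), bw_int_to_fixed(2)); */
    bw_max3_p(&c1, _a, &three, &two);

    /* return bw_max3(a1, b1, c1).value; */
    bw_max3_p(&t0, &a1, &b1, &c1);
    return (int)t0.value;
}
```

With the placeholder arithmetic, calling `f_by_pointer` on values 2, 5 and 7 walks through a1 = 6, b1 = max(6, 12, 35) and c1 = max(2, 3, 2) before taking the final maximum.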