8-byte long long in ACK C for i386, m68020

kernigh commented 4 years ago

This branch adds the type long long to ACK C. The type is an 8-byte integer on i386 and m68020, and causes an error on other machines. A new test set in tests/plat/long-long exercises 8-byte integers on i386 and m68020. Beware that not everything works:

I don't provide conversions between long long and floating-point types. Such a conversion might cause an error from ncg, or it might emit code that corrupts the stack.
446020022096LL works, but 446020022096 gets cut to an unsigned long. ACK C still follows C89, not C99, for literals without the LL suffix. C89 had no long long. C99 would use long long if the literal can't fit in long.
libc has almost no support for 8-byte integers. For example, printf("%lld", x) doesn't work. I provide int64_t and uint64_t in , but almost nothing else.
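
Until printf understands %lld, one way to print an 8-byte integer is to format it by hand. The following is only a sketch: i64_to_dec is a hypothetical helper, not something this branch or the ACK libc provides, and it assumes that 8-byte unsigned division and remainder work on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of ACK libc): writes the decimal form
 * of v into buf, which must hold at least 21 bytes (sign, 19 digits,
 * terminating NUL), and returns buf. */
char *i64_to_dec(int64_t v, char *buf)
{
    char tmp[20];   /* digits, least significant first */
    size_t n = 0, i = 0;
    /* Negate in unsigned arithmetic so the most negative value
     * does not overflow. */
    uint64_t u = (v < 0) ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;

    assert(buf != NULL);
    do {
        tmp[n++] = (char)('0' + (int)(u % 10));
        u /= 10;
    } while (u != 0);

    if (v < 0)
        buf[i++] = '-';
    while (n > 0)
        buf[i++] = tmp[--n];   /* reverse into most-significant-first order */
    buf[i] = '\0';
    return buf;
}
```

A caller would then pass the result to the existing %s conversion, for example printf("%s\n", i64_to_dec(x, buf)) with a 21-byte buf.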

ACK languages had no 64-bit integers until now. My mail to tack-devel gave a few reasons to want 8-byte integers. Parts of the ACK assume that integers are never wider than 4 bytes, so I work around this assumption.

Commit 054b9c8 adds the pseudo .data8 to the assembler. .data8 only takes a literal integer, not an expression, because expressions still use a machine-dependent integer type that might have only 4 bytes.
Commit 1faff41 modifies ncg for i386, i80, i86, m68020, powerpc, and vc4, so that each uses .data8 when it encounters an 8-byte constant.
Commits 007a63d and 15950f9 add long long and LL literals to the C compiler. By default, long long has size -1 and causes the error, "no long long for this machine". If a platform sets long long to size 8, then the C compiler emits EM code like adi 8. The C compiler uses a new type writh (wide arithmetic) for constant operations, to avoid changing the old type arith.
Most of the commits add rules like adi 8 to i386 ncg, or tests in C; but rol 8 and ror 8 have tests in EM. Before these changes in 2019, the last change to mach/i386/ncg/table was in 1995.
Commits fd27acb, e867861, 0b0c3d5 fix the m68020 assembler and add rules like adi 8 to m68020 ncg.
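
As an illustration of what a rule like adi 8 has to compute on a machine with 4-byte registers, here is a C model of an 8-byte add built from two 4-byte adds and a carry. This is a sketch only: add_by_halves is a made-up name, and the code is not taken from the ncg tables in these commits.

```c
#include <stdint.h>

/* Model of an 8-byte add done in two 4-byte steps:
 * add the low halves, then add the high halves plus the carry. */
uint64_t add_by_halves(uint64_t a, uint64_t b)
{
    uint32_t alo = (uint32_t)a, ahi = (uint32_t)(a >> 32);
    uint32_t blo = (uint32_t)b, bhi = (uint32_t)(b >> 32);

    uint32_t lo = alo + blo;       /* wraps modulo 2^32 */
    uint32_t carry = lo < alo;     /* 1 if the low add wrapped */
    uint32_t hi = ahi + bhi + carry;

    return ((uint64_t)hi << 32) | lo;
}
```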

EM compact assembly can't encode an 8-byte constant for ldc (because the implementation of sp_cst8 is missing), so I modified the C compiler to avoid ldc with constants wider than 4 bytes. This diverts 8-byte constants to rom (where EM encodes them as strings) and causes slower code. For example, the C code

long long x;
long long f(void) { return x | 0x100LL; }

becomes the i386 code

I_1:
.data8  256
...
mov edx,(_x)
mov ecx,(_x+4)
or edx,(I_1)
or ecx,(I_1+4)

instead of the simpler

mov edx,(_x)
mov ecx,(_x+4)
or edx,256 ! may become orb dh,1
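
The shorter listing needs only one or because the high 4 bytes of 0x100LL are zero, so the high half of x passes through unchanged. A C model of the operation by halves makes this visible. It is a sketch: or_by_halves is a hypothetical name, not code from the compiler or the ncg table.

```c
#include <stdint.h>

/* Model of an 8-byte OR done on two 4-byte halves. When the high half
 * of k is zero, the second OR leaves the high half of x unchanged, so
 * a code generator that sees the constant can omit it. */
uint64_t or_by_halves(uint64_t x, uint64_t k)
{
    uint32_t lo = (uint32_t)x | (uint32_t)k;
    uint32_t hi = (uint32_t)(x >> 32) | (uint32_t)(k >> 32);

    return ((uint64_t)hi << 32) | lo;
}
```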

To implement sp_cst8 and enable ldc, one would need to widen the type arith from long to int64_t. This is difficult, because parts of the ACK assume that arith is always long. This branch keeps arith as long.

To convert between long long and floating-point types, one would need to change the interface to our mach/proto/fp software floating-point, which now assumes that integers have at most 4 bytes. This also affects i386, because its 8087 library has the same interface. For m68020, I use the ack's emulator, which is missing floating-point (but a newer version of Musashi might have floating-point).

davidgiven commented 4 years ago

Thank you very much for this! I left a couple of small comments but I didn't spot any obvious issues.

My only concern is that there's a lot of scope for subtle, hard-to-find breakage in cemcom.ansi due to e.g. erroneously casting a writh to an arith and losing the top 32 bits, but this kind of problem isn't new and we should go ahead anyway. I've filed #209 because we really, really should have a proper test suite for this.
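
The kind of breakage described above can be shown in a few lines of C. This is a sketch under assumptions: narrow_t and wide_t are stand-ins for a 4-byte arith and for writh, not the real cemcom.ansi typedefs, and fits_narrow and low_half are hypothetical helpers.

```c
#include <stdint.h>

typedef int32_t narrow_t;   /* stand-in for a 4-byte arith */
typedef int64_t wide_t;     /* stand-in for writh */

/* Returns 1 if w can be stored in narrow_t without losing bits. */
int fits_narrow(wide_t w)
{
    return w >= INT32_MIN && w <= INT32_MAX;
}

/* Returns the low 32 bits of w, which is all that survives when the
 * value is squeezed into 4 bytes. */
uint32_t low_half(wide_t w)
{
    return (uint32_t)(uint64_t)w;
}
```

A check like fits_narrow before each narrowing conversion would turn silent truncation into a detectable error. The literal 446020022096LL from the description, for example, does not fit.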

kernigh commented 4 years ago

After reading your comments, I added 2 more commits to change the assembler's valu_t to int64_t and add back _EM_LSIZE == 8 to .

In my commit message for a434749, I mentioned startrek_c.linuxppc by mistake. It should be startrek_c.linuxmips. I'm too lazy to edit the message.

davidgiven commented 4 years ago

LGTM. Thank you very much!

davidgiven / ack

8-byte long long in ACK C for i386, m68020 #208