Open khwilliamson opened 1 week ago
Could you add an #ifdef specific for EBCDIC builds between U8 and U16 math, or a "max_IV_fnd_in_array" in /mktables or /unicore or /regen.pl, whatever .pl generates array, EXTCONST U8 PL_strict_utf8_dfa_tab[] = {};
?
U8 math is often faster than U16 math on many CPUs archs, or all U16 math is silently upgraded to U32 math by CCs since U16 isn't a native type on many CPUs or most integer and bitwise ops are missing in U16 variants, and the CC always zero/sign extend to U32, then cast back down to U16 before storing to C stack/malloc.
If this (#ifdef U8 U16) is an easy fix do it, if it requires a page/screenful or pages/many screenfuls of code to implement, forget and stick to your current patch as written.
The goal is to have as few EBCDIC compilation dependencies as possible. Excluding machine-generated files, there are now fewer than 60 in the core, more than a third of them are in toke.c
. I continue to try to find ways to remove them when I encounter them. So I'm sceptical of adding new ones. But I would if I thought it would improve ASCII platform performance. But I don't believe it would do so here.
First, the declaration doesn't ask for a 16bit variable; it asks for a fast variable which is at least 16 bits wide. I looked at gotbot and I don't see extra instructions generated as a result of this not being 8bits. I tried this program
include
int_fast8_t a; unsigned char b[1000]; unsigned char c = b[a];
on several compilers. Here is what it generated with MSVC
signed char a DB 01H DUP (?) ; a ALIGN 4 unsigned char b DB 03e8H DUP (?) ; b unsigned char c DB 01H DUP (?) ; c void
dynamic initializer for 'c''(void) PROC ;
dynamic initializer for 'c'', COMDAT movsx rax, BYTE PTR signed char a ; a lea rcx, OFFSET FLAT:unsigned char b ; b movzx eax, BYTE PTR [rcx+rax] mov BYTE PTR unsigned char c, al ; c ret 0 void `dynamic initializer for 'c''(void) ENDP
Changing the declaration for a
to int_fast16_t
yields
int a DD 01H DUP (?) ; a unsigned char b DB 03e8H DUP (?) ; b unsigned char c DB 01H DUP (?) ; c void
dynamic initializer for 'c''(void) PROC ;
dynamic initializer for 'c'', COMDAT movsxd rax, DWORD PTR int a ; a lea rcx, OFFSET FLAT:unsigned char b ; b movzx eax, BYTE PTR [rcx+rax] mov BYTE PTR unsigned char c, al ; c ret 0 void `dynamic initializer for 'c''(void) ENDP
Ignore my 1st comment. "PL_strict_utf8_dfa_tab" is already decl as U16 in my perl, and contains numbers like 322 in that array. I was wrong.
edit: i was looking at wrong src code, its a U8 in my perl binary. Yeah MSVC will always upgrade U16 math to U32 while its in a register. Its only when its written back to memory, if ever, it has to do a explicity truncate of high 2 bytes.,
Okay, lets do this scientifically with god bolt since im typing too fast and not thinking enough.
test done with godbolt "x86 msvc v19.latest"
Char_t math is 0x19 bytes long. Short_t math is 0x1a bytes long. Char_t math wins. I16/U16 is half-native to x86/x64. I8/U8 and I32/U32 are native to x86/x64. I64/U64 are half-native to x64!!!!!!
All U16 math requires 0x66 operand override prefix to go back to 16-bit computing.
All U64 math on x64 AMD/Intel requires 0x44 or 0x48 operand prefix override. AMD made a stupid decision back in 1998-1999, that existing 32 bit .o files, will execute WITHOUT CHANGES in a 64 bit process. AMD only made the call/return operators, native accept I64/U64 operands. Everything else stayed binary compatible with old 32 bit x86 code. AMD never thought anyone would use 64 bit ints for counters, or indexes. AMD thought people only wanted 64 bit pointers, not 64 bit arithmitic. So basically U32 math is always better/smaller than U64 math on AMD x64.
USING SHORT MATH
/FAc /O2
#include <stdio.h>
void f1(char * pch) {
short s1 = *pch;
s1 = s1*2;
printf("%u", s1);
}
unsigned __int64 `__local_stdio_printf_options'::`2'::_OptionsStorage DQ 01H DUP (?) ; `__local_stdio_printf_options'::`2'::_OptionsStorage
`string' DB '%u', 00H ; `string'
_pch$ = 8 ; size = 4
void f1(char *) PROC ; f1, COMDAT
00000 8b 44 24 04 mov eax, DWORD PTR _pch$[esp-4]
00004 66 0f be 00 movsx ax, BYTE PTR [eax]
00008 66 03 c0 add ax, ax
0000b 98 cwde
0000c 50 push eax
0000d 68 00 00 00 00 push OFFSET `string'
00012 e8 00 00 00 00 call _printf
00017 83 c4 08 add esp, 8
0001a c3 ret 0
void f1(char *) ENDP ; f1
NOW WITH CHAR MATH
/FAc /O2
#include <stdio.h>
void f1(char * pch) {
char s1 = *pch;
s1 = s1*2;
printf("%u", s1);
}
unsigned __int64 `__local_stdio_printf_options'::`2'::_OptionsStorage DQ 01H DUP (?) ; `__local_stdio_printf_options'::`2'::_OptionsStorage
`string' DB '%u', 00H ; `string'
_pch$ = 8 ; size = 4
void f1(char *) PROC ; f1, COMDAT
00000 8b 44 24 04 mov eax, DWORD PTR _pch$[esp-4]
00004 8a 00 mov al, BYTE PTR [eax]
00006 02 c0 add al, al
00008 0f be c0 movsx eax, al
0000b 50 push eax
0000c 68 00 00 00 00 push OFFSET `string'
00011 e8 00 00 00 00 call _printf
00016 83 c4 08 add esp, 8
00019 c3 ret 0
void f1(char *) ENDP ; f1
We do not have enough people working on this project to enable us the luxury of sweating details.
I disagree that the savings of a single byte constitute a "win". It is instead a "wash".
And, again, 16 bit arithmetic is not relevant to this issue. The compiler has been directed to choose a suitably fast method that is larger than a character. Whether that is 32 or 64 (not likely to be anything else) is up to the compiler's discretion.
Testing on an EBCDIC box showed that this was too narrow.