Closed ITotalJustice closed 2 years ago
[ARM] 2553
data_processing: 895
// multiply: 4
multiply_long: 8
// single_data_swap: 2
// branch_and_exchange: 1
halfword_data_transfer_register_offset: 55
halfword_data_transfer_immediate_offset: 59
single_data_transfer: 1024
// undefined: 768
block_data_transfer: 512
// branch: 512
// software_interrupt: 256
[THUMB] 784
move_shifted_register: 96
add_subtract: 32
move_compare_add_subtract_immediate: 128
alu_operations: 16
hi_register_operations: 16
pc_relative_load: 32
load_store_with_register_offset: 32
load_store_sign_extended_byte_halfword: 32
load_store_with_immediate_offset: 128
load_store_halfword: 64
sp_relative_load_store: 64
load_address: 64
// add_offset_to_stack_pointer: 4
push_pop_registers: 16
multiple_load_store: 64
// conditional_branch: 60
// software_interrupt: 4
// unconditional_branch: 32
// long_branch_with_link: 64
// undefined: 76
TOTAL: 3337
this is the total number of functions generated for each instruction handler.
the commented-out entries are handlers that are not templated, so they're not counted in the totals.
the most generated function by far is single data transfer at 1024. however, only 6 bits are needed to decode everything in the instruction, so 2^6 = 64. 1024 down to just 64 functions...
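a rough sketch of what that collapse could look like. six flag bits allow at most 2^6 = 64 combinations, so the handler only needs to be templated on those. in the ARM encoding the single data transfer flags are I, P, U, B, W, L in opcode bits 25..20. the names `Handler`, `single_data_transfer`, `make_table`, `SDT_TABLE` and `dispatch_single_data_transfer` are made up for this example and are not taken from the emulator's source:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical handler signature. It returns the flags it was instantiated
// with, purely so the example has something observable.
using Handler = std::uint8_t (*)(std::uint32_t opcode);

// Flags holds the six bits I,P,U,B,W,L (opcode bits 25..20).
// A real handler would branch on them with `if constexpr`.
template <std::uint8_t Flags>
std::uint8_t single_data_transfer(std::uint32_t /*opcode*/)
{
    return Flags;
}

// Expand 0..N-1 into one instantiation per flag combination.
template <std::size_t... I>
constexpr std::array<Handler, sizeof...(I)> make_table(std::index_sequence<I...>)
{
    return {{ &single_data_transfer<static_cast<std::uint8_t>(I)>... }};
}

// 64 instantiations instead of one per decode-table slot.
constexpr auto SDT_TABLE = make_table(std::make_index_sequence<64>{});

inline std::uint8_t dispatch_single_data_transfer(std::uint32_t opcode)
{
    const std::size_t flags = (opcode >> 20) & 0x3F; // bits 25..20
    return SDT_TABLE[flags](opcode);
}
```

for example, `0xE5912000` (`ldr r2, [r1]`) has P, U and L set, so it lands on entry `0x19`.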
some notes regarding min size without templating reg/imm in data_proc and single_data.
data processing:
// 245.3 KiB (251,168)
// 372.3 KiB (381,192) max
// 312.6 KiB (320,064) new
single data transfer:
// 242.4 KiB (248,256) without
// 269.4 KiB (275,832) 00
// 431.6 KiB (442,000) max
// 317.5 KiB (325,104) new
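for context, this is one way to read "without templating reg/imm": the immediate bit (I, opcode bit 25) is checked at runtime instead of being a template parameter, which halves the number of instantiations at the cost of one branch. a minimal sketch for data processing operand 2, assuming C++17. `operand2_templated`, `operand2_runtime` and `ror` are hypothetical names, and the register path skips the shift for brevity:

```cpp
#include <cassert>
#include <cstdint>

// Rotate right, guarding the n == 0 case to avoid a shift by 32.
constexpr std::uint32_t ror(std::uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v >> n) | (v << (32 - n)) : v;
}

// Templated: the I bit is a compile-time constant, so every handler that
// uses this gets one copy for immediate and one for register operands.
template <bool IsImmediate>
std::uint32_t operand2_templated(std::uint32_t opcode, const std::uint32_t* regs)
{
    if constexpr (IsImmediate) {
        // 8-bit immediate rotated right by twice the 4-bit rotate field.
        return ror(opcode & 0xFF, ((opcode >> 8) & 0xF) * 2);
    } else {
        // Register operand (Rm in bits 3..0); shift handling omitted here.
        return regs[opcode & 0xF];
    }
}

// Runtime: a single copy that tests bit 25 on every call.
inline std::uint32_t operand2_runtime(std::uint32_t opcode, const std::uint32_t* regs)
{
    const bool is_immediate = (opcode >> 25) & 1;
    return is_immediate ? operand2_templated<true>(opcode, regs)
                        : operand2_templated<false>(opcode, regs);
}
```

both variants return the same value for the same opcode; the difference is only in how many copies of the surrounding handler the compiler has to emit.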
with the above two commits, as of commit https://github.com/ITotalJustice/notorious_beeg/commit/ee92bc5202bc4b46b1001a8ee16914142babae40, the final size is:
366.1 KiB (374,864)
that's a reduction of 703.2 KiB (720,064)
from my readme:
my cpu has 256KiB icache, which my final binary far exceeds (1.0 MiB (1,094,928)), this is with full optimisations and lto. without lto, it's much bigger still... really there's not much code to the emulator, so i really think i can at the very least get it to ~512KiB, likely a LOT smaller.
without tables generated (-O3 -lto) and built as a single file (all hot functions inlined) the final binary is 123.6 KiB (126,520).
summary: (all -O3 -lto, single file (force inlined r/w funcs))
123.6 KiB (126,520)
303.5 KiB (310,800) +179.9 KiB
893.2 KiB (914,616) +769.6 KiB
1.0 MiB (1,094,928) +945.7 KiB
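to put the byte counts above next to the 256KiB icache figure, here is a small sanity check. the constant and function names are made up for the example, and comparing whole-binary size against icache size is only a rough proxy, since the binary also contains data and cold code:

```cpp
#include <cassert>
#include <cstdint>

// Byte counts quoted in this thread.
constexpr std::int64_t ICACHE_BYTES       = 256 * 1024;
constexpr std::int64_t SIZE_ALL_TABLES    = 1'094'928; // 1.0 MiB build
constexpr std::int64_t SIZE_AFTER_COMMITS = 374'864;   // 366.1 KiB build
constexpr std::int64_t SIZE_NO_TABLES     = 126'520;   // 123.6 KiB build

constexpr bool fits_in_icache(std::int64_t bytes)
{
    return bytes <= ICACHE_BYTES;
}

constexpr double to_kib(std::int64_t bytes)
{
    return static_cast<double>(bytes) / 1024.0;
}

// The reduction quoted for the two commits.
static_assert(SIZE_ALL_TABLES - SIZE_AFTER_COMMITS == 720'064);
// Still above 256 KiB after the commits; the table-less build is below it.
static_assert(!fits_in_icache(SIZE_AFTER_COMMITS));
static_assert(fits_in_icache(SIZE_NO_TABLES));
```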