Closed alexalkis closed 1 year ago
also updated binutils to fix link problems with the new debug sections
Works now, thanks! Huh... gcc 13.1 is about 10% faster at -O3, and a bit slower at -Os.
Are all your optimisations from 6 transferred to 13.1?
gcc13.1
vamos -v prime_sieve 1000000 || true
22:07:59.507 main: INFO: done. exit code=78498
22:07:59.507 main: INFO: total cycles: 33957608
22:07:59.507 main: INFO: vamos is exiting
# {1 <= primes <= 1000000} = 78498
allocated memory: 0.016 MB
vamos -v prime_sieve_Os 1000000 || true
22:07:59.651 main: INFO: done. exit code=78498
22:07:59.651 main: INFO: total cycles: 41154038
22:07:59.652 main: INFO: vamos is exiting
# {1 <= primes <= 1000000} = 78498
allocated memory: 0.016 MB
gcc6
vamos -v prime_sieve 1000000 || true
22:08:43.349 main: INFO: done. exit code=78498
22:08:43.349 main: INFO: total cycles: 37986524
22:08:43.350 main: INFO: vamos is exiting
# {1 <= primes <= 1000000} = 78498
allocated memory: 0.016 MB
vamos -v prime_sieve_Os 1000000 || true
22:08:43.505 main: INFO: done. exit code=78498
22:08:43.505 main: INFO: total cycles: 40401914
22:08:43.505 main: INFO: vamos is exiting
# {1 <= primes <= 1000000} = 78498
allocated memory: 0.016 MB
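For reference, the "10% faster" and "a bit slower" figures can be reproduced from the cycle totals in the logs above. This is only a quick sketch: it assumes the plain prime_sieve binary is the -O3 build (the logs do not state the flags), and the dictionary keys and the `percent_change` helper are illustrative names.

```python
# Cycle totals copied from the vamos logs above.
# Assumption: "prime_sieve" is the -O3 build, "prime_sieve_Os" the -Os build.
cycles = {
    ("gcc13.1", "O3"): 33957608,
    ("gcc13.1", "Os"): 41154038,
    ("gcc6", "O3"): 37986524,
    ("gcc6", "Os"): 40401914,
}


def percent_change(new, old):
    """Relative change of `new` versus `old` in percent (negative = fewer cycles)."""
    return (new - old) / old * 100.0


o3_change = percent_change(cycles[("gcc13.1", "O3")], cycles[("gcc6", "O3")])
os_change = percent_change(cycles[("gcc13.1", "Os")], cycles[("gcc6", "Os")])

print(f"-O3: gcc13.1 vs gcc6: {o3_change:+.2f}% cycles")
print(f"-Os: gcc13.1 vs gcc6: {os_change:+.2f}% cycles")
```

A negative value means gcc 13.1 needed fewer cycles than gcc 6 for the same run.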
Hm, vamos cycles aren't correct: lsl.l #2,d0
yields 16 cycles but the spec says 8+2n which is 12.
Isn't there also +4 cycles for the 'effective address' of the immediate? So 12 + 4 = 16.
https://wiki.neogeodev.org/index.php?title=68k_instructions_timings
There is no immediate operand; the shift count is encoded in the instruction word itself. Effective address is only relevant for memory accesses. I patched my vamos and there is still an advantage for 13.1. Nice!
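To illustrate the count being part of the instruction word: a small sketch of the 68000 encoding for the register form of lsl with an immediate count, following the bit layout in the Motorola reference manual. The function name `encode_lsl_imm` is made up for this example and is not taken from vamos or any assembler.

```python
def encode_lsl_imm(count, reg, size="l"):
    """Encode `lsl.<size> #count,d<reg>` as a single 16-bit 68000 opcode word."""
    if not 1 <= count <= 8:
        raise ValueError("immediate shift count must be 1..8")
    if not 0 <= reg <= 7:
        raise ValueError("data register must be d0..d7")
    size_bits = {"b": 0, "w": 1, "l": 2}[size]
    return (
        0xE000                  # shift/rotate opcode group (bits 15-12 = 1110)
        | ((count & 7) << 9)    # 3-bit count field; a count of 8 is stored as 0
        | (1 << 8)              # direction bit: 1 = left
        | (size_bits << 6)      # operand size
        | (0 << 5)              # 0 = count lives in the opcode, not in a register
        | (0b01 << 3)           # shift type: logical shift
        | reg                   # destination data register
    )


print(hex(encode_lsl_imm(2, 0)))  # lsl.l #2,d0
print(hex(encode_lsl_imm(4, 0)))  # lsl.l #4,d0
```

The whole instruction fits in one word, so no extension word has to be fetched for the count.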
Either uae is also bugged or it is 16.
    move.l #7090,d1
loop:
    rept 1000
    lsl.l #4,d0 ; this gives 17.04 (expected 12.something or 13.something)
    ;add.l d0,d0 ; this gives 8.58 (expected 8)
    ;add.w d0,d0 ; this gives 4.34 (expected 4)
    endr
    subq.l #1,d1
    bne loop
    rts
Assembled with:
vasmm68k_mot -Fhunkexe -nosym -kick1hunks -databss -o lsl lsl.asm
The loop executes the instruction 7090000 times (roughly the Amiga 500 PAL CPU frequency in Hz), and I time one instruction.
If the instruction took one cycle (hypothetically), the run would need 1 second; two cycles would need 2 seconds, and so on.
This more or less gives 16 for lsl.l #2,dn
Timing is done from the shell (csh on amiga)
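The conversion from wall-clock time to cycles described above can be sketched like this. It assumes the rounded 7.09 MHz clock used in this thread and ignores the subq/bne overhead per pass as well as program load time; the constant and function names are illustrative only.

```python
# Assumption: rounded A500 PAL CPU clock as used in the thread (Hz).
PAL_CLOCK_HZ = 7_090_000
# 7090 passes through the outer loop, each containing 1000 copies (rept 1000).
ITERATIONS = 7090 * 1000


def cycles_per_instruction(elapsed_seconds, clock_hz=PAL_CLOCK_HZ,
                           iterations=ITERATIONS):
    """Average CPU cycles per timed instruction for a run of `elapsed_seconds`."""
    return elapsed_seconds * clock_hz / iterations


# Timings quoted in the comments of the assembly listing above.
for label, seconds in [("lsl.l", 17.04), ("add.l", 8.58), ("add.w", 4.34)]:
    print(f"{label}: {cycles_per_instruction(seconds):.2f} cycles")
```

Because the iteration count matches the assumed clock, the measured seconds read directly as cycles per instruction; with a different clock value the same function still applies.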
lsl.l #4,d0
needs 16 cycles.
The formula is 8 +2n, and with n=4 it's 16.
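As a quick check of the formula for both counts discussed here, a minimal sketch. The 8 + 2n figure for long shifts is the one quoted in this thread; the 6 + 2n base for byte and word sizes comes from the 68000 timing tables and is included only for completeness. `shift_cycles` is an illustrative name.

```python
def shift_cycles(count, size="l"):
    """68000 cycles for a register shift with an immediate count of 1..8."""
    if not 1 <= count <= 8:
        raise ValueError("immediate shift count must be 1..8")
    if size not in ("b", "w", "l"):
        raise ValueError("size must be 'b', 'w' or 'l'")
    base = 8 if size == "l" else 6  # long shifts have the higher base cost
    return base + 2 * count


print(shift_cycles(2))  # lsl.l #2,d0
print(shift_cycles(4))  # lsl.l #4,d0
```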
omg, I made a typo...I meant to type #2 .... ok, it goes to 12.80 with lsl.l #2,d0
So it is indeed 12.
My bad, sorry.
prime_sieve.zip