Closed AWSjswinney closed 4 years ago
Has this implementation been tested with go-fuzz?
I tested with go-fuzz and found an issue, which I have now fixed. Let me know if you want the commits squashed or left separate.
Thanks for the tip on go-fuzz!
I have run -func Fuzz for about 140 million execs and -func FuzzFraming for about 25 million execs so far.
Squashed commit please.
arm64 requires 8 additional bytes of stack
I'm sure you're right, but out of curiosity, can you remind me why the ARM64 calling convention requires an extra 8 bytes compared to AMD64?
Thanks for asking. I did some research to confirm that assertion:
According to the Procedure Call Standard for the Arm 64-bit Architecture, the stack pointer must be quad-word aligned, i.e. sp % 16 == 0.
Additionally, from the next section on the frame pointer:
The lowest addressed double-word shall point to the previous frame record
That means that sp+0 will contain the previous frame record pointer and callee args will begin at sp+8.
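The arithmetic behind that can be sketched in a few lines of Go. This is only an illustration of the two constraints described above (one 8-byte slot at sp+0, total frame rounded up to 16 bytes); the names alignUp and frameSize are made up for this example, and it is not a description of how the Go assembler actually lays out frames.

```go
package main

import "fmt"

const (
	stackAlign = 16 // sp % 16 == 0, per the alignment rule quoted above
	slotSize   = 8  // the double-word at sp+0; callee args start after it
)

// alignUp rounds n up to the next multiple of align (a power of two).
func alignUp(n, align int) int {
	return (n + align - 1) &^ (align - 1)
}

// frameSize returns the bytes to reserve for a call that passes argBytes
// of arguments on the stack, under the assumptions stated above.
func frameSize(argBytes int) int {
	return alignUp(slotSize+argBytes, stackAlign)
}

func main() {
	// 24 bytes would correspond to three pointer-sized arguments.
	for _, args := range []int{24, 48, 56} {
		fmt.Printf("args=%d bytes -> frame=%d bytes, args start at sp+%d\n",
			args, frameSize(args), slotSize)
	}
}
```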
Thanks!
If you're curious and have more spare time than I do... one low-priority thing I was meaning to experiment with at some point was replacing the runtime·memmove call (and the state spillage) with a REP MOVSB (on x86_64, obviously) or an arm64 equivalent. The actual copy might not be as fast but you'd avoid the function call inside the loop. It may or may not be a net win. Feel free to play with that idea.
This change was produced by taking the amd64 assembly and reproducing it as closely as possible for the arm64 arch.
The main differences:
Tested on an AWS m6g.large (ARMv8.2):