drcachesim.drstatecmp-fuzz timeout on a64

derekbruening commented 3 months ago

Hit a timeout in the drstatecmp-fuzz test on a64:

https://github.com/DynamoRIO/dynamorio/actions/runs/10586853907/job/29336444651?pr=6943

363/386 Test #317: code_api|tool.drcachesim.drstatecmp-fuzz .........................***Timeout 180.00 sec

ivankyluk commented 2 months ago

When I try to reproduce the failure on a local machine, I get a SEGV instead of a timeout.

Here are some examples of instructions that cause the SEGV:

instr 1e020008: scvtf %w0 $0x40 -> %s8
instr 05a00c07: uzp2 %z0.q %z0.q -> %z7.q
instr 1e43000e: ucvtf %w0 $0x40 -> %d14
instr 1e030008: ucvtf %w0 $0x40 -> %s8

generate_encoded_inst() in drstatecmp-fuzz-app.c randomizes the last 4 bits of the operand.
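As a rough illustration of that kind of randomization, here is a minimal sketch. It assumes "the last 4 bits" means the low 4 bits of the 32-bit encoding word, which matches the examples above, where only the destination register number differs. `randomize_low_bits()` is a hypothetical helper, not the actual code in drstatecmp-fuzz-app.c.

```c
#include <stdint.h>

/* Hypothetical helper (not the real generate_encoded_inst()): keeps the
 * upper 28 bits of a base encoding and replaces the low 4 bits with bits
 * taken from a caller-supplied random value.
 */
static uint32_t
randomize_low_bits(uint32_t base_encoding, uint32_t random_value)
{
    const uint32_t mask = 0xfu; /* Low 4 bits of the encoding word. */
    return (base_encoding & ~mask) | (random_value & mask);
}
```

For example, a base encoding of 0x1e020000 combined with a random value whose low bits are 0x8 gives 0x1e020008, the first encoding listed above.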

The issue is seen after the following commit:

commit 34b743517018d6e0ac46ce8e62c74d9789b87047
Author: Assad Hashmi assad.hashmi@arm.com
Date: Wed Apr 3 15:09:23 2024 +0100

i#5365 AArch64 SVE core, part 2: add signals support (#6725)

I can't reproduce the issue at the previous commit:

commit 0838ea75983af914e803612f4bd01072c11c072a
Author: Derek Bruening bruening@google.com
Date: Tue Apr 2 17:52:37 2024 -0400

i#6712 record bounds: Add record filter sanity checks (#6749)

ivankyluk commented 2 months ago

@AssadHashmi I've tried and failed to find a function that correctly verifies whether an instruction with a randomized operand is valid.

Is there a way to filter out invalid instructions for the test, drstatecmp-fuzz-app.c, and in general?

The instructions above passed the tests below:

static int
check_decoded_inst(instr_t *decoded_inst)
{
    return instr_valid(decoded_inst) && instr_get_opcode(decoded_inst) != OP_xx &&
        instr_raw_bits_valid(decoded_inst) && instr_operands_valid(decoded_inst);
}

xdje42 commented 2 months ago

Data point: This test failed here: https://github.com/DynamoRIO/dynamorio/actions/runs/10912269525/job/30286593204#step:6:34690

AssadHashmi commented 2 months ago

Apologies for the late response @ivankyluk @derekbruening.

It looks like all the drcachesim.drstatecmp-fuzz failures referenced in this issue have been in aarch64-precommit and not in aarch64-sve-precommit-*. Looking at the example failures:

instr 1e020008: scvtf %w0 $0x40 -> %s8
instr 05a00c07: uzp2 %z0.q %z0.q -> %z7.q
instr 1e43000e: ucvtf %w0 $0x40 -> %d14
instr 1e030008: ucvtf %w0 $0x40 -> %s8

I notice there's an SVE instruction, uzp2 %z0.q %z0.q -> %z7.q, generated by the fuzzer. The aarch64-precommit tests are run on a Neoverse N1 machine, which doesn't support SVE. The aarch64-sve-precommit-* tests are run on a Neoverse V1 machine, which supports SVE, and they don't fail drcachesim.drstatecmp-fuzz AFAICT.

It looks like commit https://github.com/DynamoRIO/dynamorio/commit/34b743517018d6e0ac46ce8e62c74d9789b87047 should have included changes to the fuzz tests so that they detect hardware features and only generate instructions available on the hardware. Can you confirm that you have not seen drcachesim.drstatecmp-fuzz failing in any of the aarch64-sve-precommit-* jobs?
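A minimal sketch of what such a hardware check could look like, assuming a Linux AArch64 host where SVE support is reported through the AT_HWCAP auxiliary vector entry. `host_supports_sve()` and `should_emit_inst()` are hypothetical names, not existing functions in the test, and deciding whether a given encoding is an SVE instruction is left to the caller here.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#    include <sys/auxv.h>
#    ifndef HWCAP_SVE
#        define HWCAP_SVE (1UL << 22) /* Linux arm64 hwcap bit for SVE. */
#    endif
#endif

/* Hypothetical helper: returns true if the kernel reports SVE support.
 * On anything other than Linux AArch64 it conservatively returns false.
 */
static bool
host_supports_sve(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

/* Hypothetical filter: skip SVE instructions when the host lacks SVE.
 * inst_is_sve is assumed to come from the fuzzer's own classification.
 */
static bool
should_emit_inst(bool inst_is_sve, bool host_has_sve)
{
    return !inst_is_sve || host_has_sve;
}
```

With a check along these lines, the fuzzer would query host_supports_sve() once at startup and drop encodings such as the uzp2 example on the Neoverse N1 machine, while still exercising them on the Neoverse V1 machine.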

derekbruening commented 2 months ago

Every failure should be linked to here so clicking on all the xrefs above should show which jobs+machines they ran on

AssadHashmi commented 2 months ago

> Every failure should be linked to here so clicking on all the xrefs above should show which jobs+machines they ran on

I've checked all the xrefs linked here, and drcachesim.drstatecmp-fuzz only fails on aarch64-precommit, so I'm going to assume that the failures are caused by the test running SVE instructions on non-SVE hardware.

DynamoRIO / dynamorio

drcachesim.drstatecmp-fuzz timeout on a64 #6944