SingleStepTests / ProcessorTests

A language-agnostic JSON-encoded instruction-by-instruction test suite for the 8088, 68000, 65816, 65[c]02 and SPC700 that includes bus activity.

68000 test generation and random data #28

Open rasky opened 1 year ago

rasky commented 1 year ago

Looking at the 68000 tests, the initial state of the CPU for each test appears to be random. I can't find the generator (I guess it's not public), so I am not sure exactly how random it is: whether all registers are random, or some are tweaked from an initial random state to reproduce specific behaviors to be tested.

I am trying to run the tests on embedded devices and this is proving a bit complex because of the sheer size of the test vectors. I don't mind them being big as in "many tests" (could use even more!), but the problem is that the test data itself compresses very poorly because the initial/final state is full of random numbers.

I was wondering if, in general, it would be possible to make public the PRNG algorithm used to generate the random initial state, and maybe put the seed for the PRNG in each test. If each test contained the seed used to generate its initial state (and the PRNG was documented), I could in theory regenerate the state from the seed only, without having to embed it altogether. If the initial state was then tweaked a bit after the PRNG pass, I could store just the differences, which would probably be much smaller.

Does this make any sense?

larsbrinkhoff commented 1 year ago

So I gather you are running the tests on real hardware? If so, do you have access to a genuine MC68000? I'm curious about some of the obscure corner cases, and if the test data really match hardware. In particular, what side effects are applied before an address error happens.

rasky commented 1 year ago

Actually, I’m not: I’m just running tests on an embedded device where I’m emulating an MC68000. I guess the issue I raise here might also eventually facilitate a hardware test, though.

galibert commented 10 months ago

> In particular, what side effects are applied before an address error happens.

All of them. The access happens with address bit A0 dropped (technically, not connected), and the write part of the microcode instruction which waits for the end of the access executes fully. Only after that is the exception taken.

A subtlety, though: that doesn't mean that, for instance, a tst.w (a1) to an odd address is going to change the flags. That's because the flag change happens in the next microcode instruction (which actually does an and #ffff to set the flags), and that instruction is never reached. On some other instructions it does matter. It is identical for bus errors, btw.

dbalsom commented 2 months ago

I talked this over a bit with rasky on Discord and I think the general idea is that the tests as they are waste a lot of space by including random data that could either be generated or otherwise turned into a reference.

If we want to randomize our starting registers, it's not strictly necessary to list the starting register state at all. We can simply provide a seed value to a specified byte-producing PRNG and state that the registers are set, in some defined order of bytes, from the output of that PRNG.
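A minimal sketch of the seed idea, to make it concrete. Everything specific here is an assumption for illustration: xorshift32 stands in for "a specified byte-producing PRNG", and the register names, sizes, order and big-endian packing in `REGISTER_LAYOUT` are hypothetical, not the suite's actual format.

```python
# Hypothetical 68000 register layout as (name, size in bytes), in fill order.
REGISTER_LAYOUT = (
    [(f"d{i}", 4) for i in range(8)]
    + [(f"a{i}", 4) for i in range(7)]
    + [("usp", 4), ("ssp", 4), ("pc", 4), ("sr", 2)]
)


def xorshift32_bytes(seed):
    """Yield an endless byte stream from xorshift32 (an assumed PRNG choice)."""
    state = seed & 0xFFFFFFFF
    if state == 0:
        raise ValueError("xorshift32 needs a non-zero seed")
    while True:
        state ^= (state << 13) & 0xFFFFFFFF
        state ^= state >> 17
        state ^= (state << 5) & 0xFFFFFFFF
        # Emit each 32-bit output as four bytes, most significant first.
        yield from state.to_bytes(4, "big")


def registers_from_seed(seed):
    """Fill every register, in layout order, from the PRNG byte stream."""
    stream = xorshift32_bytes(seed)
    regs = {}
    for name, size in REGISTER_LAYOUT:
        raw = bytes(next(stream) for _ in range(size))
        regs[name] = int.from_bytes(raw, "big")
    return regs


if __name__ == "__main__":
    for name, value in registers_from_seed(1).items():
        print(f"{name} = {value:#x}")
```

With something like this, a test would only need to carry the seed; the cost is that every consumer has to reproduce the generator bit for bit.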

An even simpler method for the user, perhaps, is to pre-generate a random pad file (e.g., random.bin) and specify an offset into the file. Then we simply specify that the registers are written in a specific byte order from the 28 bytes (or however many are needed) within the random pad file at offset XXXX (wrapping), so we only need to store the offset. Registers in the initial state could be provided in addition to an offset if they are modified somehow, such as masking CX in x86 string operations so the tests don't run for one million cycles. The presence of a register in the initial state would override the value taken from the pad.
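Roughly what the pad lookup could look like for a consumer. This is only a sketch under assumptions: the 8088 register order in `REGISTER_ORDER`, the little-endian packing, and the function names are all made up for illustration, and a synthetic pad stands in for random.bin so the example is self-contained.

```python
# Hypothetical 8088 register order: 14 registers x 2 bytes = 28 bytes.
REGISTER_ORDER = [
    "ax", "bx", "cx", "dx", "cs", "ss", "ds", "es",
    "sp", "bp", "si", "di", "ip", "flags",
]


def read_pad(pad, offset, length):
    """Read `length` bytes starting at `offset`, wrapping at the end of the pad."""
    return bytes(pad[(offset + i) % len(pad)] for i in range(length))


def initial_registers(pad, offset, overrides=None):
    """Build the initial registers from the pad, then apply explicit overrides."""
    raw = read_pad(pad, offset, 2 * len(REGISTER_ORDER))
    regs = {
        name: int.from_bytes(raw[2 * i : 2 * i + 2], "little")
        for i, name in enumerate(REGISTER_ORDER)
    }
    # A register listed in the test itself wins over the pad value.
    regs.update(overrides or {})
    return regs


if __name__ == "__main__":
    # In real use the pad would be the contents of random.bin.
    pad = bytes(range(256))
    # The read starts near the end of the pad, so it wraps; CX is overridden.
    regs = initial_registers(pad, 250, overrides={"cx": 3})
    print({name: hex(value) for name, value in regs.items()})
```

A test entry would then carry only the offset plus any overridden registers.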

Correspondingly, we can modify the way the 'final' state is output to exclude register and memory states that have not changed from the 'initial' state.
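To illustrate, here is a sketch of storing only the changed parts of the final state and rebuilding the full state on the consumer side. The state shape used here (a `regs` dict plus a `ram` list of `[address, value]` pairs) is an assumption for the example, as are the function names.

```python
def diff_state(initial, final):
    """Keep only the registers and RAM bytes that differ from the initial state."""
    initial_ram = dict(initial["ram"])
    return {
        "regs": {
            name: value
            for name, value in final["regs"].items()
            if initial["regs"].get(name) != value
        },
        "ram": [
            [addr, value]
            for addr, value in final["ram"]
            if initial_ram.get(addr) != value
        ],
    }


def apply_delta(initial, delta):
    """Rebuild the full final state from the initial state plus the delta."""
    regs = {**initial["regs"], **delta["regs"]}
    ram = dict(initial["ram"])
    ram.update(dict(delta["ram"]))
    return {"regs": regs, "ram": [[a, v] for a, v in sorted(ram.items())]}


if __name__ == "__main__":
    initial = {
        "regs": {"ax": 1, "cx": 3, "ip": 0x100},
        "ram": [[0x1000, 0x90], [0x1001, 0x41]],
    }
    final = {
        "regs": {"ax": 1, "cx": 2, "ip": 0x101},
        "ram": [[0x1000, 0x90], [0x1001, 0x42], [0x2000, 0x07]],
    }
    delta = diff_state(initial, final)
    print(delta)
    print(apply_delta(initial, delta) == final)
```

The rebuilt state lists RAM in address order, so comparing against an expected state works best on dictionaries rather than on list order.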

The pad approach has an advantage over the seed, I think, in that it relieves the consumer from having to translate any PRNG into the language of their choice and removes any ambiguity over implementing it. Even parsing JSON can be a tall barrier depending on language, and I'd hate to add more.

We can even keep a mapping of pad offsets to memory data when, for example, doing string moves, and then our memory states could use some notation for run-length-style references (write X bytes at offset Y in pad to memory location M).
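One possible shape for that notation, purely as a sketch. The entry format is invented for this example: a memory entry is either the usual `[address, value]` pair or a hypothetical `{"addr": M, "pad": Y, "len": X}` object meaning "copy X bytes from pad offset Y to address M", with the pad read wrapping at its end.

```python
def expand_ram(entries, pad):
    """Expand mixed RAM entries into a plain {address: byte} mapping."""
    ram = {}
    for entry in entries:
        if isinstance(entry, dict):
            # Pad reference: copy `len` bytes from the pad, wrapping at its end.
            for i in range(entry["len"]):
                ram[entry["addr"] + i] = pad[(entry["pad"] + i) % len(pad)]
        else:
            # Literal byte, as in the existing [address, value] form.
            addr, value = entry
            ram[addr] = value
    return ram


if __name__ == "__main__":
    # In real use the pad would be the contents of random.bin.
    pad = bytes(range(256))
    entries = [
        [0x0100, 0xA4],
        {"addr": 0x2000, "pad": 254, "len": 4},
    ]
    for addr, value in sorted(expand_ram(entries, pad).items()):
        print(f"{addr:#06x}: {value:#04x}")
```

Later entries overwrite earlier ones at the same address, so a literal byte can patch part of a pad-backed range.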

I think this approach would significantly reduce the size of the test suites, at only a mild inconvenience for consumers of the test data. Even when space is not necessarily a concern, smaller test files are more quickly processed when executing a test suite.