emustudio / edigen

Emulator Disassembler Generator
GNU General Public License v3.0
4 stars 0 forks source link

Apply several optimizations in decoder #49

Closed vbmacher closed 3 years ago

vbmacher commented 3 years ago

Unnecessary read of unit

In GenerateMethodsVisitor, when the unit was read in a method when processing a Mask, with given start and length, it is not necessary to read it again in nested switch.

unit = read(start + 0, 5);

switch (unit & 0x1f) {
        case 0x10:
            ...

        default:
            unit = read(start + 0, 5);  // <-- unnecessary
            switch (unit & 0x1c) {
                case 0x08:
               ...
...        

Unnecessary read of unit no.2

In GenerateMethodsVisitor, when the mask is zero, there won't be any pattern as the child of the mask (in any depth). If there was, current code will be broken anyway, because Pattern will generate case and default switch branches, while the switch itself is defined in the parent mask only if the mask is not zero. Therefore it is not necessary to read unit. Variant returning subrules will use readBytes anyway.

    private void line(int start) throws InvalidInstructionException {
        unit = read(start + 0, 5);  // <-- unnecessary
        instruction.add(LINE, readBytes(start + 0, 5));
    }

Do not use break under default branches

It is unnecessary to call break; in the default: branch, since it's always the last branch.

                                    }
                                    break;
                                }
                                break;
                            }
                            break;
                        }
                        break;
                    }
                    break;
                }
                break;
            }
            break;

I have doubts about readBytes

As I traverse the code - from decoder to disassembler, I'm starting to think that byte[] array - of instruction "bits" is not necessary. I'm talking about:

public class DecodedInstruction {
    public void add(int key, byte[] bits) {   // <-- this one
...

With this it is connected method readBytes in Decoder, which is very complicated and hard to understand. If you imagine this method is often called only for few bits, it can be pretty expensive call.

Anyway, in disassembler the "bits" are somehow presented - either as integers, floats, doubles, or strings. If we have bits which have more than 4 bytes, we cannot present it as integer (but we allow it). For doubles I agree 8 bytes should be supported, if we want to emulate some modern CPUs. But actually I doubt it. I think emuStudio will always be emulating older machines or some esoteric machines. Therefore I suggest to stick with 4 bytes as maximum (which will be also hard limit for instruction size, because root rule mask has usually size of the whole instruction). Also, unit size is 4 bytes.

Maximum instruction size on modern processors is only few bytes

It is unnecessary to store 1024 bytes per instruction (field instructionBytes). Even modern CPUs on x86 have instruction size of maximum 120 bits which is 15 bytes. Therefore it will be better if we read instructionBytes ahead. For preserving "generality" we should first find out instruction maximum size by some new visitor, which will store it in a constant. Then, using MemoryContext.read(memoryPosition, count) we can try to read max bytes. If there is less bytes available, the method will return only those bytes which are available and will not throw.

Remove "+ 0"

It doesn't look good when the code has + 0 - we know that it's a zero so we don't need to add it. E.g.:

private void instruction(int start) throws InvalidInstructionException {
        unit = readBits(start + 0, 32); // <-- here

        switch (unit & 0x00070000) {
        case 0x00000000:
            ...
            line(start + 0);  // <-- here
            ...