AntonLydike / riscemu

RISC-V emulator in python
MIT License
48 stars 13 forks source link

Allow addressing Registers by their number (e.g. x0) #51

Closed superenginegit closed 9 months ago

superenginegit commented 10 months ago

Currently it is only possible to use a register by it's ABI-name. I think it would be nice, to be able to also use a register by it's number instead of it's name.

This is something I can implement and create a PR for it. But I wanted to ask if this is something you would be interested in having in this project before putting work into it.

superlopuh commented 10 months ago

There are a few things that make it a little trickier than it might seem, as there are some mechanisms, like register allocation, that can only reason about the ABI names. It would probably be a good idea to do change that logic to reason on x0 etc as that would help us avoid the confusion wrt fp and s0 both being the same as x8. I'd be happy to move to xk being the source of truth regarding identity, as long as we would still be able to pretty-print the ABI name in the IR, as that helps us reason about the logic. We would probably want to let the user specify what register should be printed in the assembly in case of ambiguity.

superlopuh commented 10 months ago

Thinking about it a bit more, I don't know of a good way to get roundtrippung to work wrt printing and parsing. We don't want to print the indices in the IR, as that would add a lot of noise, but without that I'm not sure how to recover the ABI name. What would you like this for? Is it for inspecting the IR when printed, or for reasoning about iR? Because there might be different solutions depending on the use-case you have in mind.

superenginegit commented 10 months ago

The main use-case I had in mind was actually being able to have for example x0 instead of zero in my assembly code, as I sometimes write x0 etc. and when I go to run the code and it doesn't work it takes some time to change it.

I definitely think that changing the logic to work with the register number would be the way to go. For printing I actually prefer the ABI names almost always, maybe a command-line flag could be useful to select whether to print the name or the number, or maybe, as an optional parameter for reg.dump and mem.dump to select what format to use.

AntonLydike commented 10 months ago

Thanks for showing interest and offering to contribute @superenginegit!

I'm not 100% sure on what's the best way quite yet, but maybe a preprocessor pass during parsing to canonicalize register names could be an option?

superenginegit commented 10 months ago

I would say that a preprocessor pass would be a good way to go about this. I can't really think of another way that would work as well.

superlopuh commented 10 months ago

I think it would be really interesting to see what happens when the register is represented with a more complex structure that more accurately identifies it than just the string that we use currently. There are a few restrictions:

  1. it has to be immutable
  2. all register representations with the same bits in the binary encoding should be equal and have the same hash
  3. it has to support registers outside of the current binary encodings (we use j0, j1, ... for extra registers)

Maybe something like this could work?

@dataclass(frozen=True)
class RISCVRegister:
  index: int
  name_hint: str = field(default="", hash=False)

class RISCVRegisterType(Data[RISCVRegister], TypeAttribute, ABC):
  ...

@irdl_attr_definition
class IntRegisterType(RISCVRegisterType):
    name = "riscv.reg"

x0 = IntRegisterType(0)
zero = IntRegisterType(0, "zero")
assert x0 == zero
fp = IntRegisterType(8, "fp")
s0 = IntRegisterType(8, "s0")
j0 = IntRegisterType(-1, "j0")
f0 = FloatRegisterType(0)
ft0 = FloatRegisterType(0, "ft0")

If you have the time, could you try this out, and see what happens? Hopefully it would still preserve most of our logic for canonicalization and register allocation, while allowing for writing "x0" when that's more convenient.

superenginegit commented 10 months ago

Sure, that looks great, I'll have time to test it out in the next few days.

AntonLydike commented 10 months ago

@superenginegit, excuse my colleague, he thought this was an issue on another project of ours. I guess a good place to put the canonicalization would be around line 30 here: https://github.com/AntonLydike/riscemu/blob/f9e92d626e2f1ea8f3ef6deb49a95c31765984eb/riscemu/parser.py#L23-L33

And I probably wouldn't bother with creating a custom class for representing register names.

superenginegit commented 10 months ago

Thanks, I'll get started then.