IUCompilerCourse / python-student-support-code

Support for for students (Python)
MIT License
57 stars 38 forks source link

Murcake/x86 dataclasses #4

Closed temyurchenko closed 1 year ago

temyurchenko commented 2 years ago

I haven't turned some classes to dataclasses, mainly because I don't know what should be the types of their arguments. It would be very helpful if someone pointed them out.

The classes in question are: X86ProgramDefs, IndirectJump, TailJump, Global

Fyrbll commented 2 years ago

The argument to X86ProgramDefs should have type list[FunctionDef], where FunctionDef comes from Python's ast module.

The argument to IndirectJump should have type location (could be Variable or Reg)

The two arguments to TailJump should have types location and int respectively.

The argument to Global should be a str

temyurchenko commented 2 years ago

The argument to Global should be a str

If Global's argument is a string, why does it explicitly do str(self.name)?

Fyrbll commented 2 years ago

I don't know why Global's str method was written that way, but the constructor for Global is always passed a string. I'm also certain str('hello') is just 'hello'

temyurchenko commented 2 years ago

I don't know why Global's str method was written that way, but the constructor for Global is always passed a string. I'm also certain str('hello') is just 'hello'

True. Just was wondering if it was ever intended be used on more than just strings

temyurchenko commented 2 years ago

I don't know why Global's str method was written that way, but the constructor for Global is always passed a string. I'm also certain str('hello') is just 'hello'

Anyway, everything is ready then. I also have another PR that is fixing a bug a bit complicating the homework.

temyurchenko commented 1 year ago

Did you possibly get enough time, @Fyrbll?

Fyrbll commented 1 year ago

Thanks for the reminder. Revisiting this now.

Fyrbll commented 1 year ago
  1. The first argument to IndirectCallq is annotated with type str. Pycharm correctly suggested that it should be an arg. Essentially it's going to be a variable initially that's the target of an leaq instruction. This variable might get a register or a stack location as a home eventually. (In patch-instructions we will make sure the immediate target of this leaq is a register after which we will move the value to its true home).
  2. I was too hasty when I said earlier that

The two arguments to TailJump should have types location and int respectively.

Although it is true that in this course we always make the argument of TailJmp the register %rax, people might use a function with return type arg to process its argument. To keep things flexible I guess we should change the first argument's type to arg. Sorry about that.

EDIT. Formatting of quote.

temyurchenko commented 1 year ago
  1. The first argument to IndirectCallq is annotated with type str. Pycharm correctly suggested that it should be an arg. Essentially it's going to be a variable initially that's the target of an leaq instruction. This variable might get a register or a stack location as a home eventually. (In patch-instructions we will make sure the immediate target of this leaq is a register after which we will move the value to its true home).

    1. I was too hasty when I said earlier that

The two arguments to TailJump should have types location and int respectively.

Although it is true that in this course we always make the argument of TailJmp the register %rax, people might use a function with return type arg to process its argument. To keep things flexible I guess we should change the first argument's type to arg. Sorry about that.

EDIT. Formatting of quote.

Done

Fyrbll commented 1 year ago

Thanks for seeing those updates through.

I want to suggest these additional changes that ensure x86_ast.py passes all our test cases.

X86Program

Don't set frozen=True for X86Program. This is because we might want to "attach" extra information to a program between passes. Maybe we compute some property my_prop of a program p in some pass which will be used in a later pass, we'd like to write p.my_prop = ...

X86ProgramDefs

Don't set frozen=True for X86ProgramDefs for the same reason.

Instr

Change the definition of the class Instr so that Python doesn't complain that field args, which is a list, is unhashable.

@dataclass(frozen=True)
class Instr(instr):
    ...
    def __init__(self):
      return id(self)

Anything that is a Subclass of Instr

To ensure we don't break anything we want to somehow retain Instr's __hash__ unless otherwise specified in the class definition in the previous version of ast.py.

For example to pass the existing tests we can subsequently change Callq to call its parent's __hash__

@dataclass(frozen=True)
class Callq(instr):
    ...
    def __hash__(self):
        return super.__hash__(self)

I also mentioned "unless otherwise specified ... previous version of ast.py". One example of this is Variable

@dataclass(frozen=True)
class Variable(location):
    id: str

    def __str__(self):
        return self.id

    def __hash__(self):
        return hash(self.id)
Fyrbll commented 1 year ago

Just for reference here's a minimal sequence of changes that will allow the reference compiler to pass all tests.

-@dataclass(frozen=True)
+@dataclass
class X86Program:
    ...
@dataclass(frozen=True)
class Instr(instr):
    instr: str
    args: List[arg]

    def source(self):
        return self.args[0]

    def target(self):
        return self.args[-1]

    def __str__(self):
        return indent_stmt() + self.instr + ' ' + ', '.join(str(a) for a in self.args) + '\n'

+    def __hash__(self):
+        return id(self)
@dataclass(frozen=True)
class Callq(instr):
    func: str
    num_args: int

    def __str__(self):
        return indent_stmt() + 'callq' + ' ' + self.func + '\n'

+    def __hash__(self):
+        return super.__hash__(self)

To provide a better answer I will need to examine why the "default" dataclass-derived hash function for Callq causes wacky results.

temyurchenko commented 1 year ago

X86Program

Don't set frozen=True for X86Program. This is because we might want to "attach" extra information to a program between passes. Maybe we compute some property my_prop of a program p in some pass which will be used in a later pass, we'd like to write p.my_prop = ...

I'm going to do that, but just to see if feasible, it seems that the more common approach here is storing these additional properties in a hash-table, rather than monkey-patching? Would that work, or would it require more effort to further adapt?

Instr

Change the definition of the class Instr so that Python doesn't complain that field args, which is a list, is unhashable.

That is not the best definition of hash, because it conflicts with __eq__. We'll need to redefine __eq__ as well, but ideally, change args to be a tuple. I assume the latter approach would cause issues with the existing code?

For example to pass the existing tests we can subsequently change Callq to call its parent's __hash__

It's indeed interesting why it's required. Could it be because of the hash/__eq__ divergence describe above?

I also mentioned "unless otherwise specified ... previous version of ast.py". One example of this is Variable

I'm not sure I completely get it. Do tests rely on a particular implementation of hash, so new classes should all have the same __hash__ as the previous ones?

temyurchenko commented 1 year ago

@Fyrbll could you please check if the latest commit fixes the issues you mentioned?

temyurchenko commented 1 year ago

ping

Fyrbll commented 1 year ago

This commit fixes the issues I mentioned and passes all tests. It should be fine to merge.

  1. X86Program and X86ProgramDefs do not have frozen set to True anymore.

  2. Artem changed the constructor of Instr to accept any Iterable, such as a list. As mentioned above, Python's lists are not hashable. The iterable passed to the constructor is stored as a tuple. Tuples are hashable. This removes the need to make __hash__ the same as id for Instr. However just because Instr is hashable doesn't mean it's using a custom hash function. See the next point.

  3. In the latest commit, subclasses of instr (including Instr) have frozen set to True and eq set to False. Python documentation on dataclasses says

If eq and frozen are both true, by default dataclass() will generate a hash() method for you. If eq is true and frozen is false, hash() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, hash() will be left untouched meaning the hash() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).

Since instr uses id-based hashing, all its subclasses should use id-based hashing as well. This should include Instr. In fact, changing the definition of Instr to the one in Artem's previous commit also passes all tests, so the whole tuple-as-argument situation isn't really necessary in my opinion, but it's also a neat trick so I don't want to get rid of the code.

@murcake do I have the right idea about the Instr class?

temyurchenko commented 1 year ago
  1. X86Program and X86ProgramDefs do not have frozen set to True anymore.

I don't frankly remember why it was done. My guess is that the classes uses mutable dicts and lists, so I thought that it was designed to be mutable, so it shouldn't be frozen. If you think this guess is valid, let's leave it that way, if not—I'll reintroduce frozenness.

changing the definition of Instr to the one in Artem's previous commit also passes all tests, so the whole tuple-as-argument situation isn't really necessary in my opinion

The main reasoning I believe, as above, was that Instr should be immutable. Thus, its args should be represented as a tuple. Thus, we need that weird bit in the constructor to accommodate list arguments as per book. What to actually do with it depends on your input on if my view on mutability is correct here.

@murcake do I have the right idea about the Instr class?

Yes!

Fyrbll commented 1 year ago

I don't frankly remember why it was done. My guess is that the classes uses mutable dicts and lists, so I thought that it was designed to be mutable, so it shouldn't be frozen. If you think this guess is valid, let's leave it that way, if not—I'll reintroduce frozenness.

Yes, I meant that they should not be frozen and I wanted to record that you made this change. Apologies for the confusion.

The main reasoning I believe, as above, was that Instr should be immutable. Thus, its args should be represented as a tuple. Thus, we need that weird bit in the constructor to accommodate list arguments as per book. What to actually do with it depends on your input on if my view on mutability is correct here.

I get it now. Thanks for the comment about immutability. I was only thinking in terms of __hash__