f0rki / mapping-high-level-constructs-to-llvm-ir

A guide that explains how high level programming language constructs are mapped to the LLVM intermediate language.
https://mapping-high-level-constructs-to-llvm-ir.readthedocs.io
Other
599 stars 62 forks source link

Error in (non-virtual) C++ multiple inheritance example (?) #39

Open porcelijn opened 3 years ago

porcelijn commented 3 years ago

Hi Michael,

Thanks for the nice book, I enjoyed reading!

I think the %Derived = type { ...} example in multiple interitance misrepresents the C++ example it tries to model. The C++ code:

class Derived:
    public BaseA,
    public BaseB
{
public:
    void SetC(int value)
    {
        SetB(value);
        _c = value;
    }

private:
    int _c;
};

essentially creates two instances of field _a. This quirky multiple inheritance interpretation in C++ renders _a in Derived ambiguous, and requires disambiguation like BaseA::_a or BaseB::_a in Derived. The example llvm code, however,merges both instances of _a:

%Derived = type {
    i32,        ; '_a' from BaseA
    i32,        ; '_b' from BaseB
    i32         ; '_c' from Derived
}

While this might be what is generally desired, it does not reflect the example C++ semantics. I think it should look something like:

%Derived = type {
    i32,        ; '_a' from BaseA
    i32,        ; '_a' from BaseB
    i32,        ; '_b' from BaseB
    i32         ; '_c' from Derived
}

(and this fix would obviously breaks a bunch of the setters that should be adjusted accordingly) Btw. another possible improvement would be to use different types for _a, _b and _c in the example (u32, float, i32) as a way to emphasize implicitly which field we're setting. Hope this helps, t

archfrog commented 3 years ago

My fault :-) I am, once again, astonished by some of the bugs I made back when I wrote the guide... You are absolutely right and there's really no good excuse for the code above :-)

I suspect I meant to make a slightly more advanced example of single inheritance, not an example of multiple inheritance, so that BaseB should not inherit from BaseA, and somehow got everything mixed up. I have programmed a lot in C++, but I always strived to avoid duplicate members because of the subtle bugs that are likely to occur when you are not careful (as I wasn't when I wrote the above).

I think the chapter should be renamed to "Multiple Times Single Inheritance" (or something like that) because I'm pretty sure that's what I tried to do and BaseB be changed to not inherit from BaseA.

Doing C++ multiple inheritance examples in LLVM is likely extremely tedious and error-prone, so I don't think I'd like to take that task on my shoulders.

archfrog commented 3 years ago

Nah, just checked the guide. It very clearly differentiates between "multiple inheritance" and "virtual inheritance": I simply blew it once again ;-) I'll see if I can fix it pretty soon. Thanks a lot for the bug report!

archfrog commented 3 years ago

Uh, I'm looking at it right now. I must admit that my C++ is very weak nowadays (I code mostly in PHP, Python, and C# these days)... I don't even recall which variant of _a that takes precedence; I'd guess that the last definition is the active definition but this makes everything go amok: There need to be some sort of offset table that is used to compute which data member, in Derived, that needs to be updated when.

My immediate reaction to this is that the non-virtual variant of multiple inheritance should be removed. Personally, I see the handling of non-virtual overloading of data members as both archaic and arcane: Not something that's really used a lot anymore. I think it is outside the scope of this document as it requires a lot of bookkeeping when generating LLVM IR.

What do you guys say? Are any of you willing to make the chapter work or should we simply delete it?

f0rki commented 3 years ago

I suggest merging the virtual and multiple inheritance chapters and discuss the various ways you can do it there. Also discuss the difference between the regular and virtual inheritance. I'd also add what clang is currently doing:

%class.Derived = type { %class.BaseA, %class.BaseB, i32 }
%class.BaseA = type { i32 }
%class.BaseB = type { %class.BaseA, i32 }
f0rki commented 3 years ago

Personally, I see the handling of non-virtual overloading of data members as both archaic and arcane

Agree. Doesn't seem so useful. But I think it might be good to discuss it and mention the caveats of doing this.

porcelijn commented 3 years ago

:+1: on archaic and arcane. That is: diamond shaped non-virtual inheritance (as in streams IO) may be well defined, but has long ago ceased to be an uncontroversial unique selling point of C++. Where paradigmatic inheritance was early on conceived as a long hierarchy of implementations, the contemporary paradigm is a shallow and wide hierarchy. Think Java interfaces, Rust traits and C++ abstract base classes. I think the chapter contents is still valuable and accurate enough to keep and improve, but maybe the current stuff should be pushed toward the back in favour of a more elaborate section on (multiple) interfaces?

porcelijn commented 3 years ago

https://github.com/f0rki/mapping-high-level-constructs-to-llvm-ir/issues/39#issuecomment-953700339

Uh, I'm looking at it right now. I must admit that my C++ is very weak nowadays (I code mostly in PHP, Python, and C# these days)... I don't even recall which variant of _a that takes precedence; [...]

A: fwiw - there is no precedence, the C++ compiler simply throws an error unless you specify which _a you mean: g++ 9.3 says:

warning: direct base ‘BaseA’ inaccessible in ‘Derived’ due to ambiguity
error: request for member ‘_a’ is ambiguous
note: candidates are: ‘int BaseA::_a’

clang++ 10:

warning: direct base 'BaseA' is inaccessible due to ambiguity:
error: non-static member '_a' found in multiple base-class subobjects of type 'BaseA':
note: member found by ambiguous name lookup
porcelijn commented 3 years ago

In think the interface chapter could be elaborated a bit on. The gotchas with multiple (possibly) overlapping base classes are probably in getting the vtable layout correct and making sure that narrowing casts as well as dynamic downcasts use the right offsets. (To be honest, I always get lost in the weeds in this part of compiler design.)

Here's a (not so) small example of multiple inheritance of abstract base classes (ie. Java implements interface).

#include <iostream>

struct Sender {
  virtual ~Sender() = default;
  virtual int send() = 0;
};

struct Receiver  {
  virtual ~Receiver() = default;
  virtual void receive(int) = 0;
};

struct Filter : public Sender, public Receiver {
  virtual ~Filter() = default;
  virtual bool is_empty() const = 0;
};

inline bool is_odd(int value)
{
  return (value % 2) != 0;
}

struct OddFilter : public Filter {
  OddFilter() : empty_(true) {}
  virtual ~OddFilter() = default;

  bool is_empty() const override {
    return empty_;
  }

  int send() override {
    // assert(!empty_);
    int result = data_;
    empty_ = true;
    return result;
  }

  void receive(int value) override {
    // assert(empty)
    if(is_odd(value))
    {
      data_ = value;
      empty_ = false;
    }
  }

private:
  bool empty_;
  int data_;
};

int main()
{
  Filter* f = new OddFilter();
  Sender* s = f;   // narrowing dynamic cast Filter* to Sender*
  Receiver* r = f; // narrowing dynamic cast Filter* to Receiver*

  for(int i = 0; i < 10; ++i)
  {
    // assert(f.is_empty());
    r->receive(i);
    if(!f->is_empty())
    {
      std::cout << "Received odd: " << s->send() << std::endl;
    }
  }

  delete f;
}

I admit that it's a contrived example, but maybe it can help as a basis? I like that the book so far uses very short to the point examples so please feel free to alter/skin/ignore this darling :flushed: