Fix issue #124: do not emit DIDerivedType dwarfAddressSpace if 0

kquick commented 1 year ago

The llvm project does not print this field if the value is zero, causing a difference in the output between this library and llvm, which causes the llvm-pretty-bc-parser round-trip testing to fail. This makes the pretty-printing in this library match the llvm behavior.

RyanGlScott commented 1 year ago

I'm a bit surprised by this, since my reading of the LLVM .ll pretty-printing code here is that if a DIDerivedType defines a dwarfAddressSpace, then it should always be printed. (LLVM's printInt function normally does skip printing 0 values, but they explicitly pass false here to override this behavior.)

In fact, I think the real bug lies in the way that we parse DIDerivedTypes in llvm-pretty-bc-parser. My understanding of the LLVM bitcode parsing code here is that the bitcode doesn't directly store the DWARF address space value, but rather the value plus one. That is, the bitcode would store 1 for address space 0, 2 for address space 1, and so on. Moreover, the bitcode stores a 0 when there is no address space at all, which we represent as Nothing on the llvm-pretty side.

Looking at the code llvm-pretty-bc-parser uses to parse DIDerivedTypes here, none of this is being accounted for. That feels like actual cause of the issue here, and if that were fixed, I suspect that the existing code in llvm-pretty would be perfectly suited to pretty-print DIDerivedTypes.

kquick commented 1 year ago

Here's a separate reference from the llvm-project: https://github.com/llvm/llvm-project/blob/bbe8cd13335300958b04db5318c31ff52714f96f/llvm/lib/Bitcode/Reader/MetadataLoader.cpp#L1544-L1548

I guess the question is: how do we want to represent this at the AST level. There are really three states involved here: not specified, explicitly no Dwarf Address Space, and a Dwarf Address Space with a value (I don't think we should overload the first two via Nothing). If we properly want to encompass all of that, we probably need to define a corresponding type with three constructors to place into the AST, and then handle that on both the printing and the parsing side. Currently, we represent that by allowing 0 to have special meaning, and expecting the user of the AST to know that and to know that an actual dwarf address space is (- 1). This PR fixes the pretty-printing for the latter case; I can add the triple-constructor datatype instead if you agree that is the right way to handle this, or do you think we should be overloading Nothing?

RyanGlScott commented 1 year ago

There are really three states involved here: not specified, explicitly no Dwarf Address Space, and a Dwarf Address Space with a value

I interpreted the comment that you linked to differently. By my reading, there are only two possible states: either a DIDerivedType has no DWARF address space, or it does. That is, 0 is a perfectly valid DWARF address space. The confusing part is how they're encoded in bitcode:

No DWARF address space (in source LLVM) -> 0 (in bitcode)
DWARF address space 0 (in source LLVM) -> 1 (in bitcode)
DWARF address space 1 (in source LLVM) -> 2 (in bitcode
...
DWARF address space N (in source LLVM) -> N+1 (in bitcode)

I suppose you could argue that prior to LLVM 5 (the version in which dwarfAddressSpace was introduced), the bitcode doesn't say one way or the other about whether a DIDerivedType has a DWARF address space. But I'd argue that it's fine to lump that in with the "no DWARF address space" case. That's because (1) the two cases would be handled identically in practically every conceivable way (including pretty-printing), and (2) I don't see any benefit in separating this out into a third case.

kquick commented 1 year ago

OK, canceling as superceded by https://github.com/GaloisInc/llvm-pretty-bc-parser/pull/253

GaloisInc / llvm-pretty

Fix issue #124: do not emit DIDerivedType dwarfAddressSpace if 0 #125