Open RyanGlScott opened 1 year ago
I briefly looked at this, and there appear to be two separate (but related) issues involved:
1. The metadata references after each use of !dbg are inlined. This happens because, unlike most other forms of metadata, !dbg locations are not handled when parsing a PartialMetadata value. Instead, they are handled when parsing a FUNC_CODE_DEBUG_LOC record.
2. The metadata arguments to @llvm.dbg.declare are inlined. Again, this happens because these arguments are handled in parseCallArgs (when parsing FUNC_CODE_INST_CALL), outside of the PartialMetadata parsing loop.
What both of these issues have in common is that both of these forms of metadata are parsed outside of Data.LLVM.BitCode.IR.Metadata, where PartialMetadata is handled. This is important because PartialMetadata contains a MetadataTable, and the MetadataTable data type is largely responsible for the bookkeeping required to de-duplicate metadata references. Because the two forms of metadata above are parsed in a place without access to a MetadataTable, they do not have a convenient way to determine if there is already a reference that points to them.
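To make the role of that bookkeeping concrete, here is a minimal Python sketch of the idea. It is not the parser's actual Haskell code: the MetadataTable class below is a hypothetical stand-in that only records which index each node was defined at, and Node, Ref, and render are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A metadata node such as DILocalVariable; fields is a tuple of (key, value) pairs."""
    kind: str
    fields: tuple


@dataclass(frozen=True)
class Ref:
    """A numbered reference such as !25."""
    index: int


@dataclass
class MetadataTable:
    """Hypothetical stand-in: remembers the index each node was defined at."""
    index_of: dict = field(default_factory=dict)

    def define(self, node: Node) -> Ref:
        # Reuse the existing index if this node was already defined.
        if node not in self.index_of:
            self.index_of[node] = len(self.index_of)
        return Ref(self.index_of[node])

    def resolve(self, node: Node):
        # With the table available, a known node collapses to a reference.
        idx = self.index_of.get(node)
        return Ref(idx) if idx is not None else node


def render(value) -> str:
    """Print a reference as !N and a bare node as an inline definition."""
    if isinstance(value, Ref):
        return f"!{value.index}"
    if isinstance(value, Node):
        args = ", ".join(f"{k}: {render(v)}" for k, v in value.fields)
        return f"!{value.kind}({args})"
    return str(value)


table = MetadataTable()
var_s = Node("DILocalVariable", (("name", '"s"'), ("line", 9)))
table.define(var_s)

with_table = render(table.resolve(var_s))  # code that can consult the table
without_table = render(var_s)              # code that cannot, so it inlines
```

Under these assumptions, the code path that can consult the table prints a short reference, while the code path that cannot has no choice but to print the node's full contents.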
It's not yet clear to me what the right way to fix this is. I think we will likely need to move some of the MetadataTable-related bookkeeping out of Data.LLVM.BitCode.IR.Metadata and into another place so that it is accessible when parsing things in Data.LLVM.BitCode.IR.Function. I tried looking at LLVM's source code for inspiration, but it appears to use a rather different approach to metadata deduplication than llvm-pretty-bc-parser does (see the getMetadataSlot() function).
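For comparison, a slot-based scheme could look roughly like the Python sketch below. This is only an illustration of numbering nodes at print time; it is not claimed to match what LLVM's getMetadataSlot() actually does, and SlotNumbering and the dictionary layout of the nodes are hypothetical.

```python
class SlotNumbering:
    """Hypothetical print-time numbering: each node object gets exactly one slot."""

    def __init__(self):
        self._slots = {}  # id(node) -> slot number
        self._nodes = []  # nodes in slot order (this also keeps them alive)

    def slot(self, node) -> int:
        key = id(node)
        if key not in self._slots:
            self._slots[key] = len(self._nodes)
            self._nodes.append(node)
        return self._slots[key]

    def operand(self, value) -> str:
        # Nodes are always printed as references, never inline.
        if isinstance(value, dict):
            return f"!{self.slot(value)}"
        return str(value)

    def definitions(self) -> list:
        lines = []
        i = 0
        # The list may grow while we iterate, as nested nodes are discovered.
        while i < len(self._nodes):
            node = self._nodes[i]
            args = ", ".join(
                f"{k}: {self.operand(v)}" for k, v in node["fields"].items()
            )
            prefix = "distinct " if node["distinct"] else ""
            lines.append(f"!{i} = {prefix}!{node['kind']}({args})")
            i += 1
        return lines


struct_h = {"kind": "DICompositeType", "distinct": True, "fields": {"name": '"h"'}}
var_s = {
    "kind": "DILocalVariable",
    "distinct": False,
    "fields": {"name": '"s"', "type": struct_h},
}

numbering = SlotNumbering()
call_operand = numbering.operand(var_s)
defs = numbering.definitions()
```

Because every node is hoisted to a top-level definition in this sketch, the distinct keyword always has a place to go, regardless of which part of the parser first encountered the node.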
@peterohanley found another occurrence of this issue in https://github.com/GaloisInc/llvm-pretty-bc-parser/pull/265#discussion_r1443609413:
struct h {
  bool a;
};
void g(struct h *s) {
}
void f() {
  struct h s;
}
Compare the .ll output produced by compiling directly with clang++ -emit-llvm -g -S against the .ll output after round-tripping through llvm-pretty-bc-parser. In the former .ll file, we have:
define dso_local void @_Z1fv() #0 !dbg !22 {
  %1 = alloca %struct.h, align 1
  call void @llvm.dbg.declare(metadata %struct.h* %1, metadata !25, metadata !DIExpression()), !dbg !26
  ret void, !dbg !27
}
!14 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "h", file: !1, line: 1, size: 8, flags: DIFlagTypePassByValue, elements: !15, identifier: "_ZTS1h")
!25 = !DILocalVariable(name: "s", scope: !22, file: !1, line: 9, type: !14)
Note how !25, the metadata node for the local s variable, is referenced from _Z1fv's call to llvm.dbg.declare. Moreover, !25's type is !14, a distinct node for the struct h type.
In the latter .ll file, we instead have:
define default void @_Z1fv() !dbg !20 {
; <label>: 0
  %1 = alloca %struct.h, align 1
  call void @llvm.dbg.declare(metadata %struct.h* %1,
      metadata !DILocalVariable(scope: !20, name: "s",
          file: !DIFile(filename: "p2.cpp",
              directory: "/home/ryanscott/Documents/Hacking/Haskell/llvm-pretty-bc-parser"),
          line: 9,
          type: !DICompositeType(tag: 19, name: "h",
              file: !2, line: 1,
              size: 8, align: 0,
              offset: 0,
              flags: 4194304,
              elements: !13,
              runtimeLang: 0,
              identifier: "_ZTS1h"),
          arg: 0, flags: 0, align: 0),
      metadata !DIExpression()), !dbg !DILocation(line: 9, column: 12,
          scope: !20)
  ret void, !dbg !DILocation(line: 10, column: 1, scope: !20)
}
!1 =
  distinct !DICompositeType(tag: 19, name: "h", file: !2, line: 1,
      size: 8, align: 0, offset: 0, flags: 4194304, elements: !13,
      runtimeLang: 0, identifier: "_ZTS1h")
!23 =
  !DILocalVariable(scope: !20, name: "s", file: !2, line: 9,
      type: !1, arg: 0, flags: 0, align: 0)
This time, !23 (the metadata node for the local s variable) is not referenced from _Z1fv's call to llvm.dbg.declare. Instead, the contents of !23 and !1 (the metadata node for the struct h type) are inlined into the llvm.dbg.declare call, similar to the original example in this issue. This creates a problem because !1 is marked as distinct, but the inlined version of !1 in the llvm.dbg.declare call is not distinct. llvm-pretty-bc-parser's test suite uses a diff that is sensitive to this difference, and as a result, the test suite fails on this example.
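As a rough aid for spotting this kind of inlining in printed output, the Python sketch below flags specialized metadata nodes that appear inline in a single instruction. It is a heuristic written for this discussion, not part of the test suite: inlined_metadata and ALLOWED_INLINE are hypothetical names, the roundtripped string is an abbreviated version of the call shown above, and DIExpression is skipped because it is normally printed inline.

```python
import re

# DIExpression is normally printed inline in LLVM assembly, so it is not flagged.
ALLOWED_INLINE = {"DIExpression"}


def inlined_metadata(instruction: str) -> list:
    """Return the kinds of specialized metadata nodes written inline in one instruction."""
    kinds = re.findall(r"!(DI\w+)\(", instruction)
    return [k for k in kinds if k not in ALLOWED_INLINE]


original = (
    "call void @llvm.dbg.declare(metadata %struct.h* %1, metadata !25, "
    "metadata !DIExpression()), !dbg !26"
)

# Abbreviated form of the round-tripped call; most fields are left out.
roundtripped = (
    "call void @llvm.dbg.declare(metadata %struct.h* %1, "
    'metadata !DILocalVariable(scope: !20, name: "s", line: 9, '
    'type: !DICompositeType(tag: 19, name: "h")), '
    "metadata !DIExpression()), "
    "!dbg !DILocation(line: 9, column: 12, scope: !20)"
)

flagged_original = inlined_metadata(original)
flagged_roundtripped = inlined_metadata(roundtripped)
```

The original call yields no flagged nodes, while the round-tripped call reports the inlined variable, its inlined type, and the inlined debug location.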
This p2.cpp example is perhaps not as severe as the test case in #258, which llvm-as outright rejects. llvm-as will still accept the .ll output of p2.cpp even with all of the questionable inlining; it's just llvm-pretty-bc-parser's test suite that is sensitive to the difference. Still, this suggests that we shouldn't be doing the questionable inlining in the first place, and this p2.cpp failure is a symptom of that.
If you compile a test program to LLVM assembly, the resulting test.ll file looks as expected. So far, so good. Now let's see what happens when we round-trip this through llvm-pretty-bc-parser. The part that I want to draw attention to is the call void @llvm.dbg.declare(...) statement. The original test.ll file prints references to !15 and !16, whereas the roundtripped code inlines the definitions of !15 and !16 entirely, resulting in much more verbose code.
Although strange, this is not wrong in this example, since both versions of the program are equivalent. This is not always the case, however. If you repeat this experiment with the test case in #258 (using Clang 17), the original .ll file references the node by number, whereas the roundtripped version inlines the expression metadata !DIAssignID. This is invalid, as LLVM requires that all DIAssignID nodes be distinct. This is the case in the original .ll file, but the roundtripped version drops the distinct keyword after inlining. By that point, it is too late, as it is not possible to attach distinct to inline metadata nodes. The only way to do so is by putting the node in the top-level list of metadata nodes (e.g., !24 in the original .ll file).
In order to make the test case from #258 work, we will need to prevent llvm-pretty-bc-parser from performing this gratuitous inlining. This issue tracks that task.