Quuxplusone / LLVMBugzillaTest

Implicit basic block labels are undocumented and confusing #16043

Open Quuxplusone opened 11 years ago

Quuxplusone commented 11 years ago
Bugzilla Link PR16043
Status NEW
Importance P normal
Reported by Paul Sokolovsky (pmiscml@gmail.com)
Reported on 2013-05-16 18:23:00 -0700
Last modified on 2013-05-27 15:12:00 -0700
Version trunk
Hardware PC Linux
CC llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
Running "clang -x c -emit-llvm -S -O2 test.c" on:

---------
int bar(int a);

int foo(int a, int b)
{
    if (bar(a))
        return a + b;
    else
        return a * b;
}
---------

Leads to the following LLVM asm output:

---------
define i32 @foo(i32 %a, i32 %b) nounwind {
  %1 = tail call i32 @bar(i32 %a) nounwind
  %2 = icmp eq i32 %1, 0
  br i1 %2, label %5, label %3

; <label>:3                                       ; preds = %0
  %4 = add nsw i32 %b, %a
  br label %7

; <label>:5                                       ; preds = %0
  %6 = mul nsw i32 %b, %a
  br label %7

; <label>:7                                       ; preds = %5, %3
  %.0 = phi i32 [ %4, %3 ], [ %6, %5 ]
  ret i32 %.0
}

declare i32 @bar(i32)
---------

The lack of explicit %3, %5, %7 labels looks rather confusing.
http://llvm.org/docs/LangRef.html gives only a very vague hint: "Each basic
block may optionally start with a label"; it doesn't describe how the case
where no label is given is handled. Googling for terms like "llvm implicit
basic block labels" and "llvm basic block without labels" didn't turn up any
useful hits.

After some consideration, and based on previous experience with LLVM internal
structures, I was able to work out how implicit labels are handled. The
algorithm for assigning names to unnamed entities within a function appears to
be:

1. For each function, an "unnamed entity" counter is initialized to 0.
2. Whenever an unnamed tmp var is seen, it is assigned the value of counter++
as its name.
3. Whenever an unlabeled block is seen, it is assigned the value of counter++
as its label.
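
The three steps above can be sketched in a few lines of Python. This is only a
toy model of the numbering as described in this report, not LLVM's actual
writer code: the Instr and Block classes and the assign_slots helper are
hypothetical names made up for illustration, and the model assumes that
instructions which define no value (br, ret) consume no number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    opcode: str                   # for readability only
    name: Optional[str] = None    # explicit name such as ".0", or None
    produces_value: bool = True   # br/ret define no value, so get no number


@dataclass
class Block:
    instrs: List[Instr]
    name: Optional[str] = None    # explicit label, or None


def assign_slots(blocks):
    """Number unnamed blocks and unnamed value-producing instructions."""
    counter = 0                   # step 1: per-function counter starts at 0
    block_slots, value_slots = [], []
    for block in blocks:
        if block.name is None:    # step 3: unlabeled block takes counter++
            block_slots.append(counter)
            counter += 1
        for instr in block.instrs:
            if instr.produces_value and instr.name is None:
                value_slots.append(counter)   # step 2: unnamed temporary
                counter += 1
    return block_slots, value_slots


# Hand-written model of @foo from the listing above. The arguments %a and %b
# are named, so they take no numbers in this example.
foo = [
    Block([Instr("call"), Instr("icmp"), Instr("br", produces_value=False)]),
    Block([Instr("add"), Instr("br", produces_value=False)]),
    Block([Instr("mul"), Instr("br", produces_value=False)]),
    Block([Instr("phi", name=".0"), Instr("ret", produces_value=False)]),
]

block_slots, value_slots = assign_slots(foo)
print("blocks:", block_slots)
print("values:", value_slots)
```

For this model the blocks come out as 0, 3, 5, 7 and the values as 1, 2, 4, 6,
which matches the listing: the unnamed entry block takes 0 (hence
"preds = %0"), while the named %.0 and the br/ret instructions take no number.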

However, all of this is bound to be very confusing for novices, and thus to
reduce adoption of LLVM technology.

Suggestions for resolution:

1. Definitely describe implicit label behavior in LangRef and, probably, the
FAQ.
2. Consider always outputting explicit labels, while still supporting implicit
labels for parsing.

Quuxplusone commented 11 years ago

Originally posted at http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-May/062136.html

Quuxplusone commented 11 years ago

These are basic blocks with no name, not basic blocks with a special (implicit) name. The situation is similar for instructions, which aren't required to have a name. When you see an instruction "%2 = XYZ" it doesn't mean that the instruction is called %2, in fact the instruction has no name.

So what is the %2 for? These numbers only exist in the human readable (textual) form of the IR. People regularly ask how they can get them via the API, but they can't because they don't exist. When writing out in textual form there needs to be a way to refer to nameless instructions, because otherwise you couldn't read the textual form back in and recreate the IR. The writer generates these numbers on the fly when writing the textual form out.

Quuxplusone commented 11 years ago
> These numbers only exist in the human readable (textual) form of the IR.

This is of course arguable, but the "human readable (textual) form of the IR"
should be the primary form of LLVM IR, in the sense that it should be clean,
well-documented and actually human-readable. Actually, I (and surely many
others) take this for granted: all compilers have internal representations,
but only LLVM has the "selling point" of its IR being a "first class" entity
with well-defined external syntax, semantics, etc.

> These are basic blocks with no name, not basic blocks with a special (implicit) name.

Let's simplify that by saying that those blocks *do* have labels, simply
because their labels are used by other instructions represented in the same
LLVM IR ASM function - see "br i1 %2, label %5, label %3" above for example.

> When writing out in textual form there needs to be a way to refer to nameless instructions

Exactly: names/labels in the textual form correspond to object pointers in the
internal in-memory structures. Those names/labels may be dummy (in the sense
that the user can't rely on them having specific content, but can still rely
on them being unique among themselves), but they do exist - they *are* in the
IR ASM. So the point of this bug report is: "Please make sure that all dummy
names/labels are rendered consistently, and document how dummy names are
created".

Quuxplusone commented 11 years ago

Real basic block labels turn up in the final assembler while these do not. This is one reason why it is a mistake to refer to them as labels. Call them something like "a basic block reference" instead.

That said, I agree that this (and similarly for instructions) often confuses people, it is a FAQ and could be documented better.

Quuxplusone commented 11 years ago
> Real basic block labels turn up in the final assembler while these do not.

Final assembler as in LLVM asm or target asm? It's a long way to target asm,
so I assume you mean LLVM asm, and them not turning up as labels is exactly
the (ultimate) topic of this bug. As I wrote in the mentioned mail, not
rendering those labels explicitly is as helpful as rendering instructions with
unnamed temporaries as:

 /* temporary 1 */ = tail call i32 @bar(i32 %a)

> Call them something like "a basic block reference" instead.

There's no "basic block reference" notion in http://llvm.org/docs/LangRef.html,
only labels and the blockaddress() function (again, taking a label as
argument). Why would the LLVM team want to introduce notions without
necessity, when a "basic block reference" hardly differs from the notion of a
"label" (except that the current rendering code doesn't output "references" as
"labels", which is the topic of this bug)? Why not create an easy to
understand IR language, which would truly become the lingua franca of compiler
representation and wouldn't take a pundit to interpret?

Quuxplusone commented 11 years ago

I meant target asm. As used in LLVM, "label" is a technical term that doesn't mean what you seem to think it means.

Since your comments look a lot like trolling, I am removing myself from this bug report.

Quuxplusone commented 11 years ago

> Since your comments look a lot like trolling, I am removing myself from this bug report.

Sorry, I didn't mean it to be like that. I'm just (pretty typically) a busy guy who hacks on LLVM as a side project, attracted by the promise of a documented and defined IR. And there are still lots of surprises. I'm trying to communicate them back, as I'm sure a lot of people just gave up on LLVM instead (there are many interesting ports lying around unfinished, apparently due to the still high complexity of LLVM, part of which is arguably avoidable).

I'll shut up now, and come back when I have patches.

Quuxplusone commented 11 years ago

Patch for LangRef submitted: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130513/174902.html

Quuxplusone commented 11 years ago

Patch for LangRef was committed in r182332.

I'd still appreciate it if this bug were left open: I'd still like to proceed with the FAQ update as suggested by Duncan, and then with actual code changes. It may take me some time though, and I'd like this bug to serve as a reference.