Inline ASM IR Considered Harmful

Quuxplusone commented 8 years ago


Bugzilla Link	PR28970
Status	NEW
Importance	P normal
Reported by	Mehdi Amini (joker.eph@gmail.com)
Reported on	2016-08-13 13:42:27 -0700
Last modified on	2018-07-26 06:19:27 -0700
Version	unspecified
Hardware	PC All
CC	ahatanak@gmail.com, chandlerc@gmail.com, compnerd@compnerd.org, dexonsmith@apple.com, froydnj@gmail.com, geek4civic@gmail.com, hfinkel@anl.gov, james@jamesmolloy.co.uk, llvm-bugs@lists.llvm.org, llvm-dev@ndave.org, marina.yatsina@intel.com, matze@braunis.de, peter@pcc.me.uk, rjmccall@apple.com, rnk@google.com, srhines@google.com, tejohnson@google.com, willdtz@gmail.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also	PR9295

TL/DR: I propose to change our inline ASM representations (both call to inline asm and module level inline asm) to enrich it with the list of symbol defined and referenced.

Our internal representation for inline ASM is an opaque blob till we reach the MC layer. This is harmful for any use of LLVM IR and Bitcode other than a straight compiler pipeline.

For example the following does not compile with LTO on Darwin at this time:

$ cat inline_asm.c void foo() {} int main() { asm("call _foo" : : ); } $ clang inline_asm.c $ clang inline_asm.c -flto ld: reference to symbol (which has not been assigned an address) _foo in '_main' from /var/folders/4z/k9mg8rls7wsfm6t6hbm0_bxw0000gn/T/cc-032652.o for architecture x86_64 clang: error: linker command failed with exit code 1 (use -v to see invocation)

The following does not compile (even without LTO) without forcing an attribute "used" on foo():

static void foo() {} int main() { asm("call _foo" : : ); }

The following defines a function bar():

asm(".text\n" ".globl _bar\n" "_bar:\n" "\tret\n" );

And will generate this IR:

module asm ".text" module asm ".globl _bar" module asm "_bar:" module asm "\09ret"

All tools that handles bitcode files where we usually have objects need to workaround this. Instead of hacking on these opaque string, we should embed in the IR and in the Bitcode the informations that are needed to handle these files on par with regular object files. The fact that tools like nm and ar have to link all the backends for the only purpose of supporting inline ASM properly seems like a good indication that we're missing a layer.

I think enriching Inline ASM IR construct with the list of symbols defined and referenced would solve this issue. The frontend would be responsible to have a target with ASM parser available to build this list from opaque blob.

However, this would likely be an issue for targets for which we don't have an integrated assembler. I don't have a good solution for this at that time.

Quuxplusone commented 8 years ago

To add to this: a problem we currently have is estimating the cost of inlining an inline asm call. We currently assume it costs the same as one IR instruction, but each can contain unlimited individual instructions. We end up inlining some pretty hefty semaphore code all over the place.

We'll be pushing target hooks for this that allow a target to estimate the number of instructions in an asm block, but it'll be stupid - counting number of newlines or something. It'd be better to have a more structured solution.

Quuxplusone commented 8 years ago

(In reply to comment #1)
> To add to this: a problem we currently have is estimating the cost of
> inlining an inline asm call. We currently assume it costs the same as one IR
> instruction, but each can contain unlimited individual instructions. We end
> up inlining some pretty hefty semaphore code all over the place.
>
> We'll be pushing target hooks for this that allow a target to estimate the
> number of instructions in an asm block, but it'll be stupid - counting
> number of newlines or something. It'd be better to have a more structured
> solution.

Given that our targets have assembly parsers, we ought to be able to do this in
a semi-intelligent way.

Quuxplusone commented 8 years ago

As prior art, I think the gold plugin already solves this problem by parsing the inline asm to find these symbol defs and uses.

It sounds like you're suggesting that we move this parsing and analysis responsibility up to the frontend, so that the IR just "knows" these things, rather than having to re-derive them. I think that's reasonable, clang already calls LLVM MC for parsing MS inline asm anyway. Clang supports being built without any targets and still generating IR, but in that case, maybe you just don't get this fancy symbol analysis.

If we change the inline asm IR, is it worth handling asm goto? (http://llvm.org/pr9295)

Quuxplusone commented 8 years ago

There is parsing of "module-level inline assembly" of this form:
http://llvm.org/docs/LangRef.html#module-level-inline-assembly.

But AFAIK no parsing of inline assembly calls, which rely on any used values
being added to the llvm.used global. This causes some correctness issues in LTO
mode. See the discussion here:
  http://lists.llvm.org/pipermail/llvm-dev/2016-April/098047.html
which I am working around in ThinLTO mode via r266877.

Quuxplusone commented 8 years ago

(In reply to comment #3)
> As prior art, I think the gold plugin already solves this problem by parsing
> the inline asm to find these symbol defs and uses.

libLTO does the same, but as Teresa mentioned, it is only for module level
assembly.
It is not that much annoying for libLTO to have to invoke the target, since it
is already linked in.
It is more visible when llvm-nm or llvm-ranlib embed all the backends.

> If we change the inline asm IR, is it worth handling asm goto?
> (http://llvm.org/pr9295)

Yeah I think it is worth looking at at the same time. Thanks for the pointer!

Quuxplusone commented 8 years ago

I personally like the notion of inline assembly being a block of text that is
just passed along to the assembler without being interpreted by the compiler
(Though I can certainly understand the opposing view). As that avoids
complexity and gives you a tool at hand to experiment with new
extensions/instructions that are only implemented in the assembler but no the
compiler yet.

As for the example, the gcc docu says: The gcc docu says: "Accessing data from
C programs without using input/output operands (such as by using global symbols
directly from the assembler template) may not work as expected." So the correct
way would be to pass the "foo" symbol in via a parameter with an immediate
constraint.
I've also seen other projects working around similar problems by adding
__attribute__((used)) to the called function.

Quuxplusone commented 8 years ago

But even if you pass "foo" through the constraint, how do you express this in
the IR?
Also how do you express the fact that some module level inline asm *defines* a
symbol?

Quuxplusone commented 8 years ago

Well in gcc you can do this:

void foo();

void bar() {
    __asm__("callq %P0" : : "i"(&foo));
}

clang fails on this example, saying "invalid operand for inline asm constraint
'i'" though IMO that is a clang bug. In the IR I would expect an asm block with
a Global used as parameter to it, no reason that shouldn't work.

As for defining new symbols in module level inline asm I am not sure. Is that a
thing that happen?

Quuxplusone commented 8 years ago

(In reply to comment #8)
> Well in gcc you can do this:
>
> void foo();
> void bar() {
>     __asm__("callq %P0" : : "i"(&foo));
> }
>
> clang fails on this example, saying "invalid operand for inline asm
> constraint 'i'" though IMO that is a clang bug.

Works for me, and here as well: https://godbolt.org/g/rhtk8I

Here is the IR I get:

>In the IR I would expect an
> asm block with a Global used as parameter to it, no reason that shouldn't
> work.

define i32 @main() #0 {
  call void asm sideeffect "callq ${0:P}", "i,~{dirflag},~{fpsr},~{flags}"(void (...)* bitcast (void ()* @foo to void (...)*)) #1, !srcloc !2
  ret i32 0
}

That's already working fine actually :)

> As for defining new symbols in module level inline asm I am not sure. Is
> that a thing that happen?

At least, Clang generates itself module level assembly to add
definition/reference to implement the fragile ObjC ABI.
Also Gold implements as well the parsing of module level ASM, so I guess it
happens enough in the wild?

Quuxplusone commented 8 years ago

(In reply to comment #6)
> I personally like the notion of inline assembly being a block of text that
> is just passed along to the assembler without being interpreted by the
> compiler (Though I can certainly understand the opposing view). ...

I'd like to agree, but I'm not sure that I do. The issue that James brought up
about the inlining costs is pernicious, and we should have something better
than our current approach. I figure that, since we need to parse the inline
assembly text at some point anyway, we might as well do it sooner and use the
information that can provide.

Quuxplusone commented 8 years ago

(In reply to comment #10)
> (In reply to comment #6)
> > I personally like the notion of inline assembly being a block of text that
> > is just passed along to the assembler without being interpreted by the
> > compiler (Though I can certainly understand the opposing view). ...
>
> I'd like to agree, but I'm not sure that I do. The issue that James brought
> up about the inlining costs is pernicious, and we should have something
> better than our current approach. I figure that, since we need to parse the
> inline assembly text at some point anyway, we might as well do it sooner and
> use the information that can provide.

I think it's fair to parse the assembly for an inline cost heuristic.

We should still debate whether the system *requires* us to parse assembly in
order to compile sourcecode or whether that is just an optional step necessary
for some optimizations. I think the question is whether it should be possible
to plug in a custom assembler with -fno-builtin-assembler and use extensions
that the clang builtin assembler does not understand (the inline cost heuristic
could simply warn and fail in case of unknown syntax).
From a conceptual view we are importing the whole asm language into the C/C++
language, which feels like bad design/layering...

Quuxplusone commented 8 years ago

What about making -fno-builtin-assembler incompatible with inline ASM?

Quuxplusone commented 8 years ago

(In reply to comment #12)
> What about making -fno-builtin-assembler incompatible with inline ASM?

That would be throwing the baby out with the bath water. What about documenting
and fixing some heuristics, like if you find "NAME:" and ".globa?l NAME" in
file scope asm consider it a function definition (i.e. only introduce a minimal
asm subset so we can implement an really simple independent parser for catching
that one case we miss today)...

Quuxplusone commented 8 years ago

(In reply to comment #13)
> (In reply to comment #12)
> > What about making -fno-builtin-assembler incompatible with inline ASM?
>
> That would be throwing the baby out with the bath water. What about
> documenting and fixing some heuristics, like if you find "NAME:" and
> ".globa?l NAME" in file scope asm consider it a function definition (i.e.
> only introduce a minimal asm subset so we can implement an really simple
> independent parser for catching that one case we miss today)...

Just for the record: In an ideal world I would force people to specify the list
of global exported symbols defined in file scope assembler in their source
language (force them to specify that list after the asm string). Though I can
imagine that pushing a C extension for this can be tricky hence I propose that
naive heuristic instead.

Quuxplusone commented 8 years ago

(In reply to comment #13)
> (In reply to comment #12)
> > What about making -fno-builtin-assembler incompatible with inline ASM?
>
> That would be throwing the baby out with the bath water. What about
> documenting and fixing some heuristics, like if you find "NAME:" and
> ".globa?l NAME" in file scope asm consider it a function definition (i.e.
> only introduce a minimal asm subset so we can implement an really simple
> independent parser for catching that one case we miss today)...

Assuming that'd be 100% reliable for symbols *defined* (being wrong here can
change the semantic of the program), can you achieve the same thing reliably
for symbol *referenced*?

Quuxplusone commented 8 years ago

(In reply to comment #15)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > What about making -fno-builtin-assembler incompatible with inline ASM?
> >
> > That would be throwing the baby out with the bath water. What about
> > documenting and fixing some heuristics, like if you find "NAME:" and
> > ".globa?l NAME" in file scope asm consider it a function definition (i.e.
> > only introduce a minimal asm subset so we can implement an really simple
> > independent parser for catching that one case we miss today)...
>
> Assuming that'd be 100% reliable for symbols *defined* (being wrong here can
> change the semantic of the program), can you achieve the same thing reliably
> for symbol *referenced*?

Yeah that is tricky, somehow file scope assembly feels misdesigned here and I
wonder why we never had such problems before... Anyway I think we agree that
adding lists of defined/referenced global objects to our .ll representation is
the right thing to do.

The question of of whether the frontend needs to parse the assembly to get
those lists or whether we should rather have a C extension is a discussion for
another day, where we should probably be pragmatic and use the target parser
even if it makes me a little sad...

Quuxplusone commented 8 years ago

(In reply to comment #16)
> Yeah that is tricky, somehow file scope assembly feels misdesigned here and
> I wonder why we never had such problems before...

Because we're actually parsing it whenever it matters (llvm-nm, llvm-ranlib,
libLTO, gold-plugin, ...)? What else were you thinking about?

Quuxplusone commented 8 years ago

FWIW, I actually strongly disagree with your position Matthias...

I think that inline assembly is a language-level construct and it is reasonable
for the language to demand rigorous semantics for it down to and including that
symbol references are declared. As one of *many* examples, we register allocate
around the clobbers listed, we schedule around the memory clobbers, etc.

I think extending this to symbols makes *total* sense.

I think that if users want to directly work with an assembler, they should
literally run an assembler over a .s file.

However, I also think it is reasonable for the C/C++ frontend to provide as a
convenience to its users support for parsing the inline asm up-front and
automatically populating these when possible. But in some cases it won't be,
and I think the burden should in that case fall upon the user. That could
happen when the target doesn't have an integrated assembler the frontend can
leverage for this, much as the MS-style inline assembly also won't work in that
case (by my vague understanding of how that works).

And finally, I do not believe we have any obligation to support inline assembly
inside of IR for targets without integrated assemblers. If there are ways we
*can* support it, great, but if we have to break auto-upgrade for those targets
I don't think that should hold up progress here.

Quuxplusone commented 8 years ago

For the record: I disagree with the notion that the compiler has to actually parse the inline assembly text to understand it. Making things explicit instead by providing information about the assembly in the form of clobbers, register constraints and extending this to symbols defined/used is the way to go!

I mainly complained because the discussion shifted in the direction of "let's just parse the inline assembly" instead of considering it an opaque blob. I realized during the discussion that this is more about doing the parsing as a stopgap measure in the frontend because we lack syntax to express these defined/used symbol lists for file-scope assembly blocks (which is pragmatically fine) rather than having a requirement that llvm must be able to parse every inline assembly blob (which is not fine IMO).

Quuxplusone commented 8 years ago

(In reply to comment #19)
> For the record: I disagree with the notion that the compiler has to actually
> parse the inline assembly text to understand it. Making things explicit
> instead by providing information about the assembly in the form of clobbers,
> register constraints and extending this to symbols defined/used is the way
> to go!
>
> I mainly complained because the discussion shifted in the direction of
> "let's just parse the inline assembly" instead of considering it an opaque
> blob. I realized during the discussion that this is more about doing the
> parsing as a stopgap measure in the frontend because we lack syntax to
> express these defined/used symbol lists for file-scope assembly blocks
> (which is pragmatically fine) rather than having a requirement that llvm
> must be able to parse every inline assembly blob (which is not fine IMO).

Ah, I see.

I think we're talking about it as *slightly* more than a stopgap at least at
the frontend level. I think we're talking about *allowing* a frontend to parse
the text and automatically provide this kind of information but not *requiring*
the frontend always be able to do so. However, if it cannot for whatever
reason, then it and LLVM get to assume that everything it could ever want to
know about the blob of text is specified in the constraints.

This sounds fairly close to what you're saying, but not precisely the same so I
wanted to check....

Quuxplusone commented 8 years ago

In my mind, it's clearly okay for the compiler to parse assembly to produce better code, provided it honors the stated constraints regardless of what it sees. (For example, if the asm states that it clobbers memory, we'd better treat it as clobbering memory, even if it doesn't contain any store instructions.)

GCC's specification for inline / file-level assembly requires the user to take responsibility for things like symbol references by using attribute((used)). Nobody here seems to be claiming that clang should or even could promise that that won't be necessary. So our basic semantic model is still the same as GCC's: the code isn't really correct if it isn't using attribute((used)). At best you're promising that that's not necessary for specific targets.

Having a list of globals used on an assembly constant seems appealing, but it's not clear to me that those uses can support replaceAllUsesWith unless you're going to rewrite the assembly, which sounds like a lot of compiler heroism.

Quuxplusone commented 8 years ago

> GCC's specification for inline / file-level assembly requires the user to
> take responsibility for things like symbol references by using
> __attribute__((used)).  Nobody here seems to be claiming that clang should
> or even could promise that that won't be necessary.  So our basic semantic
> model is still the same as GCC's: the code isn't really correct if it isn't
> using __attribute__((used)).  At best you're promising that that's not
> necessary for specific targets.

I don't think there is a syntax to expose the symbols *defined* by module-level
assembly though.

> Having a list of globals used on an assembly constant seems appealing, but
> it's not clear to me that those uses can support replaceAllUsesWith unless
> you're going to rewrite the assembly, which sounds like a lot of compiler
> heroism.

How do we handle llvm.used? I guess we just forbid RAUW on it?

Quuxplusone commented 8 years ago

(In reply to comment #22)
> > GCC's specification for inline / file-level assembly requires the user to
> > take responsibility for things like symbol references by using
> > __attribute__((used)).  Nobody here seems to be claiming that clang should
> > or even could promise that that won't be necessary.  So our basic semantic
> > model is still the same as GCC's: the code isn't really correct if it isn't
> > using __attribute__((used)).  At best you're promising that that's not
> > necessary for specific targets.
>
> I don't think there is a syntax to expose the symbols *defined* by
> module-level assembly though.

Just an ordinary extern declaration would work.  I'm not sure why it matters
where it's defined.  But if you want to take advantage of it being defined
within the linkage unit to e.g. use a more optimal access pattern, you're going
to have to understand object-file linkage semantics, not just know whether the
symbol is defined there or not.

> > Having a list of globals used on an assembly constant seems appealing, but
> > it's not clear to me that those uses can support replaceAllUsesWith unless
> > you're going to rewrite the assembly, which sounds like a lot of compiler
> > heroism.
>
> How do we handle llvm.used? I guess we just forbid RAUW on it?

It's interpreted as meaning that you have to provide a symbol with the right
visibility (internal / whatever); and since you can't analyze all the uses,
you're pretty limited about what sort of manipulations you can do.  A
conservative transform wouldn't be able to analyze a use by an assembly
constant, either, but it also wouldn't necessarily realize that it still needs
to define that symbol.

Quuxplusone commented 8 years ago

(In reply to comment #23)
> (In reply to comment #22)
> > > GCC's specification for inline / file-level assembly requires the user to
> > > take responsibility for things like symbol references by using
> > > __attribute__((used)).  Nobody here seems to be claiming that clang should
> > > or even could promise that that won't be necessary.  So our basic semantic
> > > model is still the same as GCC's: the code isn't really correct if it
isn't
> > > using __attribute__((used)).  At best you're promising that that's not
> > > necessary for specific targets.
> >
> > I don't think there is a syntax to expose the symbols *defined* by
> > module-level assembly though.
>
> Just an ordinary extern declaration would work.  I'm not sure why it matters
> where it's defined.  But if you want to take advantage of it being defined
> within the linkage unit to e.g. use a more optimal access pattern, you're
> going to have to understand object-file linkage semantics, not just know
> whether the symbol is defined there or not.

My issue is more practical: if symbol foo is defined in module level inline ASM
in file foo.c, and you compile foo.c with LTO and shove the bitcode file into a
static archive, you need to actually expose the definition for foo in the
symbol table for the archive, otherwise the linker won't load foo.o from the
archive on reference to the symbol foo.

Quuxplusone commented 8 years ago

Ah, yes, that does seem like something where you do just need to understand what symbols are defined. Hazards of not really being the canonical object-file format, I guess.

It does not make any sense for the frontend to be making this decision independently. I'd be happy to query some backend API, but that's obviously not going to work reliably unless every target promises to have an integrated assembler. It's not abstractly unreasonable for the compiler to only promise to support certain things on certain platforms; I think that's the best you're going to get.

Quuxplusone / LLVMBugzillaTest

Inline ASM IR Considered Harmful #28965