Closed samitolvanen closed 4 years ago
cc @jcai19
Edit: See below. This produces the same error, but for a different reason.
Stand-alone reproducer:
$ cat test.c
void a(void) {
asm(
".subsection 1 \n"
"1: \n"
"nop \n"
"2: \n"
".previous \n"
".org . + 2b-1b \n"
);
}
$ clang --target=aarch64-linux-gnu -integrated-as -c test.c
test.c:8:3: error: expected assembly-time absolute expression
".org . + 2b-1b \n"
^
<inline asm>:6:6: note: instantiated into assembly here
.org . + 2b-1b
^
test.c:8:3: error: expected assembly-time absolute expression
".org . + 2b-1b \n"
^
<inline asm>:6:6: note: instantiated into assembly here
.org . + 2b-1b
^
2 errors generated.
Not entirely sure why it prints out seemingly the same error twice though.
Thanks for the reproducer! A slightly simplified example:
$ cat foo.s 1: 2: .org . + 2b-1b
$ llvm-mc -filetype=obj -triple=arm64 foo.s -o foo.o foo.s:3:6: error: expected assembly-time absolute expression .org . + 2b-1b ^ foo.s:3:6: error: expected assembly-time absolute expression .org . + 2b-1b ^ foo.s:3:6: error: expected assembly-time absolute expression .org . + 2b-1b ^
if I put parentheses around 2b and 1b respectively as follows, then it assembled. $ cat foo.s 1: 2: .org . + (2b-1b)
Seems like IAS evaluates the operands in order and therefore fails to evaluate the first two operands. Will see if I can come up with a quick fix.
Adding parenthesis fixes the reproducer, but the kernel has them already, so I wonder if creduce found a different problem...:
#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled) \
".if "__stringify(cfg_enabled)" == 1\n" \
"661:\n\t" \
oldinstr "\n" \
"662:\n" \
".pushsection .altinstructions,\"a\"\n" \
ALTINSTR_ENTRY(feature) \
".popsection\n" \
".subsection 1\n" \
"663:\n\t" \
newinstr "\n" \
"664:\n\t" \
".previous\n\t" \
".org . - (664b-663b) + (662b-661b)\n\t" \
".org . - (662b-661b) + (664b-663b)\n" \
".endif\n"
I think kernel has parentheses around each operand in this case, i.e. .org - (Expr1) + (Expr2). I haven't verified but adding one more pair of parentheses should fix the problem, .org - ((Expr1) + (Expr2)).
I tested this change:
diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 12f0eb56a1cc..94c859e95f01 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -78,8 +78,8 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
newinstr "\n" \
"664:\n\t" \
".previous\n\t" \
- ".org . - (664b-663b) + (662b-661b)\n\t" \
- ".org . - (662b-661b) + (664b-663b)\n" \
+ ".org . - ((664b-663b) + (662b-661b))\n\t" \
+ ".org . - ((662b-661b) + (664b-663b))\n" \
".endif\n"
#define __ALTERNATIVE_CFG_CB(oldinstr, feature, cfg_enabled, cb) \
And now I'm getting different errors:
In file included from init/main.c:17:
In file included from ./include/linux/module.h:13:
In file included from ./include/linux/stat.h:6:
In file included from ./arch/arm64/include/asm/stat.h:12:
In file included from ./include/linux/time.h:6:
In file included from ./include/linux/seqlock.h:36:
In file included from ./include/linux/spinlock.h:54:
In file included from ./include/linux/irqflags.h:16:
./arch/arm64/include/asm/irqflags.h:56:15: error: invalid .org offset '3608' (at offset '3616')
asm volatile(ALTERNATIVE(
^
./arch/arm64/include/asm/alternative.h:296:2: note: expanded from macro 'ALTERNATIVE'
_ALTERNATIVE_CFG(oldinstr, newinstr, __VA_ARGS__, 1)
^
./arch/arm64/include/asm/alternative.h:98:2: note: expanded from macro '_ALTERNATIVE_CFG'
__ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg))
^
./arch/arm64/include/asm/alternative.h:80:14: note: expanded from macro '__ALTERNATIVE_CFG'
".previous\n\t" \
^
<inline asm>:26:7: note: instantiated into assembly here
.org . - ((664b-663b) + (662b-661b))
^
Thanks for the verification. Based on IAS's code the extra parentheses did work and made the assembling proceeded a little bit further. Will take a look at the new error.
Note that the commit that tripped this bug didn't actually touch the .org
statements, it just changed .pushsection
/ .popsection
to .subsection
/ .previous
.
Right, so is .subsection
handled differently than .pushsection
in LLVM IAS? What's the difference between those directives?
So if I replace .subsection and .previous with .pushsection and .popsection respectively in the reproducer, the issue still happens. I guess what I do not understand is how come the same issue was not exposed before the change? Did the change alter program semantics?
$ cat foo.s
.pushsection .altinstr_replacement, "a" 1: nop 2:
.popsection .org . + 2b-1b
$ llvm-mc -filetype=obj -triple=arm64 foo.s -o clang.o foo.s:8:6: error: expected assembly-time absolute expression .org . + 2b-1b ^ foo.s:8:6: error: expected assembly-time absolute expression .org . + 2b-1b ^ foo.s:8:6: error: expected assembly-time absolute expression .org . + 2b-1b
So if understand correctly, the original code placed the code within .pushsection/.popsection pair into a new section (.altinstr_replacement), while the newer version puts the same code into a new subsection of the current section. So it appears that somewhere IAS failed to update the offset within the section and caused the assertion failure. It however still does not explain why the old code worked. Anyway, will keep investigating.
My initial interestingness test for creduce was too simple. The reproducer fails with the same error, but for a different reason. Here's the full preprocessed arch/arm64/kernel/debug-monitors.c, which is one of the files that fails to build.
Thanks for the full reproducer. I just noticed the change should be
diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 12f0eb56a1cc..94c859e95f01 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -78,8 +78,8 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
newinstr "\n" \
"664:\n\t" \
".previous\n\t" \
- ".org . - (664b-663b) + (662b-661b)\n\t" \
- ".org . - (662b-661b) + (664b-663b)\n" \
+ ".org . - ((664b-663b) - (662b-661b))\n\t" \
+ ".org . - ((662b-661b) - (664b-663b))\n" \
".endif\n"
#define __ALTERNATIVE_CFG_CB(oldinstr, feature, cfg_enabled, cb) \
This probably explains the new error you saw, although it still failed and went back to the reported error. Still investigating.
Ah, good catch!
I think this might be related to the .arch_extension lse
directive. Here's another attempt at a reproducer:
$ cat test.c
void a(void) {
asm(
".subsection 1 \n"
"1: \n"
"nop \n"
"2: \n"
".previous \n"
"3: \n"
"nop \n"
"4: \n"
".subsection 1 \n"
"5: \n"
".arch_extension lse \n"
"nop \n"
"6: \n"
".previous \n"
".org . - (6b-5b) + (4b-3b) \n"
);
}
$ clang --target=aarch64-linux-gnu -integrated-as -c test.c
test.c:17:10: error: expected assembly-time absolute expression
".org . - (6b-5b) + (4b-3b) \n"
^
<inline asm>:15:6: note: instantiated into assembly here
.org . - (6b-5b) + (4b-3b)
^
1 error generated.
This builds if I remove .arch_extension lse
, or replace .subsection/previous
with .pushsection/popsection
:
$ cat test1.c
void a(void) {
asm(
".pushsection .mysection,\"a\" \n"
"1: \n"
"nop \n"
"2: \n"
".popsection \n"
"3: \n"
"nop \n"
"4: \n"
".pushsection .mysection,\"a\" \n"
"5: \n"
".arch_extension lse \n"
"nop \n"
"6: \n"
".popsection \n"
".org . - (6b-5b) + (4b-3b) \n"
);
}
$ clang --target=aarch64-linux-gnu -integrated-as -c test1.c
$ llvm-objdump -D test1.o
...
Disassembly of section .text:
0000000000000000 <a>:
0: 1f 20 03 d5 nop
4: c0 03 5f d6 ret
Disassembly of section .mysection:
0000000000000000 <$x.0>:
0: 1f 20 03 d5 nop
0000000000000004 <$x.1>:
4: 1f 20 03 d5 nop
...
$ cat test2.c
void a(void) {
asm(
".subsection 1 \n"
"1: \n"
"nop \n"
"2: \n"
".previous \n"
"3: \n"
"nop \n"
"4: \n"
".subsection 1 \n"
"5: \n"
"nop \n"
"6: \n"
".previous \n"
".org . - (6b-5b) + (4b-3b) \n"
);
}
$ clang --target=aarch64-linux-gnu -integrated-as -c test2.c
$ llvm-objdump -D test2.o
...
Disassembly of section .text:
0000000000000000 <a>:
0: 1f 20 03 d5 nop
4: c0 03 5f d6 ret
0000000000000008 <$x.0>:
8: 1f 20 03 d5 nop
c: 1f 20 03 d5 nop
...
Awesome. Thanks for reducing the reproducer. Just want to mention here for future references that I played with IAS a bit more and realized it had nothing to do with parentheses. The existing ones should be enough.
This actually reminders me of https://github.com/ClangBuiltLinux/linux/issues/742. IAS implements each section as a list of data blocks called fragments. Currently it can only resolve the differences of labels in the same fragment. .arch_extension creates a new fragment, and therefore causing IAS to fail. I have a LLVM patch ( https://reviews.llvm.org/D69411) from a while ago to add support to resolve the difference of labels if the fragments they belong to are separated by fix-sized fragments only (the size of some fragments may be changed, such as those including jump instructions). There were concerns that the list of fragments may be changed after the differences are resolved, so the patch was never submitted. I guess I will have to revisit the patch and see if there is anything we can do. Meanwhile, a possible workaround could be to move .arch_extension to the beginning of the code, if that's possible.
Meanwhile, a possible workaround could be to move .arch_extension to the beginning of the code, if that's possible.
Moving .arch_extension lse
to the beginning of the assembly block fixes the reproducer, but the kernel still fails to build. It looks like something is broken when we have multiple functions with the same assembly construct:
$ cat test3.c
void a(void)
{
asm(
".arch_extension lse \n"
"1: \n"
"nop \n"
"2: \n"
".subsection 1 \n"
"3: \n"
"nop \n"
"4: \n"
".previous \n"
".org . - (4b-3b) + (2b-1b) \n"
);
}
/* identical to a() */
void b(void)
{
asm(
".arch_extension lse \n"
"1: \n"
"nop \n"
"2: \n"
".subsection 1 \n"
"3: \n"
"nop \n"
"4: \n"
".previous \n"
".org . - (4b-3b) + (2b-1b) \n"
);
}
$ clang --target=aarch64-linux-gnu -integrated-as -c test3.c
test3.c:29:3: error: expected assembly-time absolute expression
".org . - (4b-3b) + (2b-1b) \n"
^
<inline asm>:10:6: note: instantiated into assembly here
.org . - (4b-3b) + (2b-1b)
^
1 error generated.
Removing one of the functions fixes the issue, and so does removing the .arch_extension
directives. I assume this is the same bug, but I'm not sure how we can work around this in the kernel.
Thanks. The given example assembles if I first compile it into an assembly file and then use llvm-mc to do the assembling. Don't know if the assembly was generated correctly, but if the answer is yes, then it seems the handling of inline assembly is problematic in this case.
$ clang --target=aarch64-linux-gnu -c test3.c -o test3.o
test3.c:29:10: error: expected assembly-time absolute expression
".org . - (14b-13b) + (12b-11b) \n"
^
<inline asm>:10:6: note: instantiated into assembly here
.org . - (14b-13b) + (12b-11b)
^
$ clang --target=aarch64-linux-gnu -S test3.c -o test3.s
$ cat test3.s
.text
.file "test3.c"
.globl a // -- Begin function a
.p2align 2
.type a,@function
a: // @a
// %bb.0: // %entry
//APP
.Ltmp0:
nop
.Ltmp1:
.text 1
.Ltmp2:
nop
.Ltmp3:
.text
.Ltmp4:
.org (.Ltmp4-(.Ltmp3-.Ltmp2))+(.Ltmp1-.Ltmp0), 0
//NO_APP
ret
.Lfunc_end0:
.size a, .Lfunc_end0-a
// -- End function
.globl b // -- Begin function b
.p2align 2
.type b,@function
b: // @b
// %bb.0: // %entry
//APP
.Ltmp5:
nop
.Ltmp6:
.text 1
.Ltmp7:
nop
.Ltmp8:
.text
.Ltmp9:
.org (.Ltmp9-(.Ltmp8-.Ltmp7))+(.Ltmp6-.Ltmp5), 0
//NO_APP
ret
.Lfunc_end1:
.size b, .Lfunc_end1-b
// -- End function
.ident "clang version 11.0.0 (https://github.com/llvm/llvm-project.git b925ca37a8f28851e110b8f0cd6e7f9a36a15d65)"
.section ".note.GNU-stack","",@progbits
.addrsig
$ llvm-mc -filetype=obj -triple=arm64 test3.s -o test3.o
$ llvm-objdump -d test3.o
test3.o: file format elf64-littleaarch64
Disassembly of section .text:
0000000000000000 <a>:
0: 1f 20 03 d5 nop
4: c0 03 5f d6 ret
0000000000000008 <b>:
8: 1f 20 03 d5 nop
c: c0 03 5f d6 ret
10: 1f 20 03 d5 nop
14: 1f 20 03 d5 nop
There are no .arch_extension lse
directives in the generated assembly code. Perhaps because there are actually no LSE instructions in the example?
I'd guess so. The generated assembly looks reasonable to me.
Removing either of the two directives the given example did not solve the problem. So I dug a little bit more and found that when IAS parsed .arch_extension, it created a new subtarget info object. However, when creating the fragment for the nop instruction between label 3 and 4 in function b, the old subtarget info object was used, and on detecting a different subtarget info created by .arch_extension, it created a new fragment, causing label 3 and 4 to be placed into two separate fragments. And because the subtarget info was associated with the assembly parser, this issue would happen with either .arch_extension. Still trying to locate the place that firstly passed the old subtarget info object.
cc @smithp35
I have identified the cause, and put my analysis in the upstream bug I reported. Not sure if there is an easy fix on LLVM though.
This is where I look at a combination of .arch multiple functions, subsections and org and start to cry.
To some extent this looks like a combination of things. The first is that in an assembler file .arch applies to all fragments created from that point on in the file. The compiler generates a subtarget per function and is unaware of the .arch in the inline assembler. I don't think the behaviour of .arch when used in inline assembly, when there are multiple functions in the same section is well specified, if specified at all.
One interesting experiment is that we can get this to assemble with -ffunction-sections although the meaning of subsection changes here as subsection 1 is different in both cases. It may be possible (I don't know the original context) to rewrite using attribute section. I'd have thought that code that assembles to semantically different output when -ffunction-sections is used is probably best avoided, but again this may be required for the original code to work.
I think that your original patch https://reviews.llvm.org/D69411 would have been able to evaluate the expression even with different subtargets. It may be worth restarting it. I seem to remember James Knight wanting to do some refactoring to make sure that it worked with subsections.
Apologies I haven't got a lot of spare time to look myself, only weekends and evenings. Will do my best if I can spot anything obvious.
One interesting experiment is that we can get this to assemble with -ffunction-sections although the meaning of subsection changes here as subsection 1 is different in both cases
Looks like -ffunction-sections
doesn't fix the kernel build though (using debug-monitors.i from an earlier comment):
$ clang --target=aarch64-linux-gnu -O2 -ffunction-sections -c debug-monitors.i
...
<inline asm>:27:7: note: instantiated into assembly here
.org . - (662b-661b) + (664b-663b)
^
Switching from subsections to actual sections is the only workaround I've found.
Is there a way to pass arguments to the integrated assembler? It should be fine to enable LSE for all inline assembly instead of sprinkling .arch_extension lse
directives in the code. However, -Wa,-march=...
etc. seem to be ignored.
As I understand it the integrated assembler uses the -march from the clang invocation, no need to use -Wa. An alternative would be to add attribute((target("lse"))) to every function in the file. This is notionally the same as .arch_extension lse in the inline assembly but may be a bit easier to manage.
As I understand it the integrated assembler uses the -march from the clang invocation, no need to use -Wa.
Unfortunately, the arm64 kernel binary must also run on devices without LSE, so we can't enable it everywhere. The kernel enables it only for specific inline assembly blocks and patches itself at runtime to use LSE only if there's hardware support.
An alternative would be to add attribute((target("lse"))) to every function in the file
Using an attribute might have been feasible if adding it only to the functions that actually need LSE would work, but we seem to run into the exact same issue described in the upstream bug when using the attribute.
Apologies, I haven't got any more ideas. I think it is realistic for different functions within a section to have different subtargets, I think LTO relies upon that. To me it sounds improving the expression evaluation code to handle arbitrary numbers of fragments (assuming their size is known at the point in assembly time that the expression needs to be evaluated). I think following up where D69411 got stalled is probably the best starting point https://reviews.llvm.org/D69411#1742896
Thanks Peter. Will see if there is anything we can do regarding https://reviews.llvm.org/D69411.
@samitolvanen I did a little bit more digging, and realized the reasoning I did earlier in the upstream bug reported at https://github.com/ClangBuiltLinux/linux/issues/1078#issuecomment-656960018 was not completely accurate, although https://reviews.llvm.org/D69411 should still fix the problem. Anyway, while label 3 and 4 in function b of https://github.com/ClangBuiltLinux/linux/issues/1078#issuecomment-656245060 were indeed placed in different fragments, IAS could have resolved the differences if the two labels were in the same section, and the offset of the fragments including them respectively were known. In the case of .subsection, however, IAS did not know the offset of some fragments before the labels (still need further investigation) and therefore could not determine the offsets of the labels. With .push_section, this issue was gone because IAS created a new section, therefore did not need to calculate the offset of the previous fragments of the labels.
Looks like the problematic patch has been backported to several stable trees, which will break some downstream trees with LTO.
With .push_section, this issue was gone because IAS created a new section
Doesn't .subsection
create a new section?
https://sourceware.org/binutils/docs/as/SubSection.html#SubSection
oh, interesting, how does a subsection differ from a section?
@nickdesaulniers
Doesn't
.subsection
create a new section? https://sourceware.org/binutils/docs/as/SubSection.html#SubSection
The link says "The current section is not changed" :). From what I understand so far, when IAS sees a .subsection with a name for the first time, it creates a data fragment and inserts it to the list of fragments of the current section, and updates the insert point of the list to be the new fragment. All the subsequent regular fragments (not in any subsections) of the current section are inserted before the insert point, which explains why IAS failed to calculate the differences of 3 and 4 in function b, because the .org directive that included the subtraction were placed in a fragment before the fragments that 3 and 4 were emitted to in subsection 1. When IAS wrote data out to the object file, it was supposed to know the offsets of the two fragments so it could resolve the difference, and it failed because the fragments had not been processed yet.
I am trying to decide what will happen when there are multiple subsections, and if all the regular fragments are always placed before fragments of subsections. If they are mixed, maybe we can refactor IAS' code to make sure they are separated, so that we can make https://reviews.llvm.org/D69411 work when two fragments are in the same subsection or neither in any subsection. This way we would have resolved the difference while parsing the code and avoided this problem later while writing object files.
These .org
statements don't actually change anything in the final binary, their purpose is to ensure that both alternative instruction sequences have the same length. We could work around this trivially by moving them inside the subsection, which allows IAS to figure out the relative locations again as the layout of all the relevant fragments has been finalized. Thoughts?
diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 12f0eb56a1cc..6c9eba8cc40b 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -77,9 +77,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
"663:\n\t" \
newinstr "\n" \
"664:\n\t" \
- ".previous\n\t" \
".org . - (664b-663b) + (662b-661b)\n\t" \
".org . - (662b-661b) + (664b-663b)\n" \
+ ".previous\n\t" \
".endif\n"
#define __ALTERNATIVE_CFG_CB(oldinstr, feature, cfg_enabled, cb) \
Nice! I am not familiar with kernel enough to make any suggestion, but this should prevent IAS from complaining.
https://git.kernel.org/arm64/c/966a0acce2fc
We should still make sure this also ends up in -stable trees.
We should still make sure this also ends up in -stable trees.
It's a lot easier to add
Cc: stable@vger.kernel.org
to the commit message than chase sending this once it lands in mainline, as per Documentation/process/stable-kernel-rules.rst.
Congrats on finding a relatively simple solution, and thanks @willdeacon for accepting the patch.
It's a lot easier to add
Which I had until Catalin instructed me to remove it, because it's apparently not needed. Who am I to argue.
I personally read Catalin's message as remove # 4.14+
from Cc: <stable@vger.kernel.org> # 4.14+
, replacing it with a Fixes:
tag, rather than getting rid of the stable tag altogether but that might have just been me.
In other words, Cc: <stable@vger.kernel.org>
plus Fixes:
is usually better than specifying an explicit version because the stable maintainers will look for Fixes:
tags when backporting commits so that:
All regression fixes for a patch are picked up (in case the commit being fixed is backported to an older version later like 4.4 or 4.9)
They know how far back it should go automatically, without needing to spam people with the "this commit does not apply to this tree" email.
Some maintainers think that just a Fixes:
tag is enough to get stuff backported, which is not the case per the documentation but realistically, Sasha's machine learning has been picking up commits of that nature.
Luckily, it is not like this is too much of a problem, we can just request that 966a0acce2fc be added to stable when it hits mainline due to a miscommunication around the stable tag :)
I personally read Catalin's message as remove
# 4.14+
fromCc: <stable@vger.kernel.org> # 4.14+
, replacing it with aFixes:
tag, rather than getting rid of the stable tag altogether but that might have just been me.
I was referring to the "If Will picks it up for 5.8, it doesn't even need a cc stable." part. But yes, I agree that this is not what the stable documentation says.
Which I had until Catalin instructed me to remove it, because it's apparently not needed. Who am I to argue.
LOL, right, sorry.
The fix is now in the stable queue for all relevant kernel versions, so closing the issue.
Fixed on LLVM upstream: https://reviews.llvm.org/rG415a4fbea7c1a39c780caa3cb7287fe09c5267d2.
Upstream commit f7b93d42945cc71e1346dd5ae07c59061d56745e ("arm64/alternatives: use subsections for replacement sequences") broke the integrated assembler, which is also needed for LTO: