samitolvanen commented 4 years ago

Upstream commit f7b93d42945cc71e1346dd5ae07c59061d56745e ("arm64/alternatives: use subsections for replacement sequences") broke the integrated assembler, which is also needed for LTO:

...
In file included from arch/arm64/kernel/debug-monitors.c:10:
In file included from ./include/linux/cpu.h:17:
In file included from ./include/linux/node.h:18:
In file included from ./include/linux/device.h:15:
In file included from ./include/linux/dev_printk.h:16:
In file included from ./include/linux/ratelimit.h:6:
In file included from ./include/linux/sched.h:15:
In file included from ./include/linux/sem.h:5:
In file included from ./include/uapi/linux/sem.h:5:
In file included from ./include/linux/ipc.h:7:
In file included from ./include/linux/rhashtable-types.h:15:
In file included from ./include/linux/workqueue.h:9:
In file included from ./include/linux/timer.h:6:
In file included from ./include/linux/ktime.h:24:
In file included from ./include/linux/time.h:74:
In file included from ./include/linux/time32.h:13:
In file included from ./include/linux/timex.h:65:
In file included from ./arch/arm64/include/asm/timex.h:8:
In file included from ./arch/arm64/include/asm/arch_timer.h:18:
In file included from ./include/linux/smp.h:82:
In file included from ./arch/arm64/include/asm/smp.h:28:
./arch/arm64/include/asm/percpu.h:115:1: error: expected assembly-time absolute expression
PERCPU_RET_OP(add, add, ldadd)
^
./arch/arm64/include/asm/percpu.h:105:2: note: expanded from macro 'PERCPU_RET_OP'
        __PERCPU_RET_OP_CASE(w,  , name, 32, op_llsc, op_lse)           \
        ^
./arch/arm64/include/asm/percpu.h:79:16: note: expanded from macro '__PERCPU_RET_OP_CASE'
        asm volatile (ARM64_LSE_ATOMIC_INSN(                            \
                      ^
./arch/arm64/include/asm/lse.h:37:2: note: expanded from macro 'ARM64_LSE_ATOMIC_INSN'
        ALTERNATIVE(llsc, __LSE_PREAMBLE lse, ARM64_HAS_LSE_ATOMICS)
        ^
./arch/arm64/include/asm/alternative.h:296:2: note: expanded from macro 'ALTERNATIVE'
        _ALTERNATIVE_CFG(oldinstr, newinstr, __VA_ARGS__, 1)
        ^
./arch/arm64/include/asm/alternative.h:98:2: note: expanded from macro '_ALTERNATIVE_CFG'
        __ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg))
        ^
./arch/arm64/include/asm/alternative.h:80:14: note: expanded from macro '__ALTERNATIVE_CFG'
        ".previous\n\t"                                                 \
                    ^
<inline asm>:26:7: note: instantiated into assembly here
        .org    . - (664b-663b) + (662b-661b)
                ^

nickdesaulniers commented 4 years ago

cc @jcai19

samitolvanen commented 4 years ago

Edit: See below. This produces the same error, but for a different reason.

Stand-alone reproducer:

$ cat test.c 
void a(void) {
        asm(
        ".subsection 1  \n"
        "1:             \n"
        "nop            \n"
        "2:             \n"
        ".previous      \n"
        ".org . + 2b-1b \n"
        );
}
$ clang --target=aarch64-linux-gnu -integrated-as -c test.c
test.c:8:3: error: expected assembly-time absolute expression
        ".org . + 2b-1b \n"
         ^
<inline asm>:6:6: note: instantiated into assembly here
.org . + 2b-1b  
     ^
test.c:8:3: error: expected assembly-time absolute expression
        ".org . + 2b-1b \n"
         ^
<inline asm>:6:6: note: instantiated into assembly here
.org . + 2b-1b  
     ^
2 errors generated.

Not entirely sure why it prints out seemingly the same error twice though.

jcai19 commented 4 years ago

Thanks for the reproducer! A slightly simplified example:

$ cat foo.s 1: 2: .org . + 2b-1b

$ llvm-mc -filetype=obj -triple=arm64 foo.s -o foo.o foo.s:3:6: error: expected assembly-time absolute expression .org . + 2b-1b ^ foo.s:3:6: error: expected assembly-time absolute expression .org . + 2b-1b ^ foo.s:3:6: error: expected assembly-time absolute expression .org . + 2b-1b ^

if I put parentheses around 2b and 1b respectively as follows, then it assembled. $ cat foo.s 1: 2: .org . + (2b-1b)

Seems like IAS evaluates the operands in order and therefore fails to evaluate the first two operands. Will see if I can come up with a quick fix.

samitolvanen commented 4 years ago

Adding parenthesis fixes the reproducer, but the kernel has them already, so I wonder if creduce found a different problem...:

#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled)     \
        ".if "__stringify(cfg_enabled)" == 1\n"                         \
        "661:\n\t"                                                      \
        oldinstr "\n"                                                   \
        "662:\n"                                                        \
        ".pushsection .altinstructions,\"a\"\n"                         \
        ALTINSTR_ENTRY(feature)                                         \
        ".popsection\n"                                                 \
        ".subsection 1\n"                                               \
        "663:\n\t"                                                      \
        newinstr "\n"                                                   \
        "664:\n\t"                                                      \
        ".previous\n\t"                                                 \
        ".org   . - (664b-663b) + (662b-661b)\n\t"                      \
        ".org   . - (662b-661b) + (664b-663b)\n"                        \
        ".endif\n"

jcai19 commented 4 years ago

I think kernel has parentheses around each operand in this case, i.e. .org - (Expr1) + (Expr2). I haven't verified but adding one more pair of parentheses should fix the problem, .org - ((Expr1) + (Expr2)).

samitolvanen commented 4 years ago

I tested this change:

diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 12f0eb56a1cc..94c859e95f01 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -78,8 +78,8 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
        newinstr "\n"                                                   \
        "664:\n\t"                                                      \
        ".previous\n\t"                                                 \
-       ".org   . - (664b-663b) + (662b-661b)\n\t"                      \
-       ".org   . - (662b-661b) + (664b-663b)\n"                        \
+       ".org   . - ((664b-663b) + (662b-661b))\n\t"                    \
+       ".org   . - ((662b-661b) + (664b-663b))\n"                      \
        ".endif\n"

 #define __ALTERNATIVE_CFG_CB(oldinstr, feature, cfg_enabled, cb)       \

And now I'm getting different errors:

In file included from init/main.c:17:
In file included from ./include/linux/module.h:13:
In file included from ./include/linux/stat.h:6:
In file included from ./arch/arm64/include/asm/stat.h:12:
In file included from ./include/linux/time.h:6:
In file included from ./include/linux/seqlock.h:36:
In file included from ./include/linux/spinlock.h:54:
In file included from ./include/linux/irqflags.h:16:
./arch/arm64/include/asm/irqflags.h:56:15: error: invalid .org offset '3608' (at offset '3616')
        asm volatile(ALTERNATIVE(
                     ^
./arch/arm64/include/asm/alternative.h:296:2: note: expanded from macro 'ALTERNATIVE'
        _ALTERNATIVE_CFG(oldinstr, newinstr, __VA_ARGS__, 1)
        ^
./arch/arm64/include/asm/alternative.h:98:2: note: expanded from macro '_ALTERNATIVE_CFG'
        __ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg))
        ^
./arch/arm64/include/asm/alternative.h:80:14: note: expanded from macro '__ALTERNATIVE_CFG'
        ".previous\n\t"                                                 \
                    ^
<inline asm>:26:7: note: instantiated into assembly here
        .org    . - ((664b-663b) + (662b-661b))
                ^

jcai19 commented 4 years ago

Thanks for the verification. Based on IAS's code the extra parentheses did work and made the assembling proceeded a little bit further. Will take a look at the new error.

samitolvanen commented 4 years ago

Note that the commit that tripped this bug didn't actually touch the .org statements, it just changed .pushsection / .popsection to .subsection / .previous.

nickdesaulniers commented 4 years ago

Right, so is .subsection handled differently than .pushsection in LLVM IAS? What's the difference between those directives?

jcai19 commented 4 years ago

So if I replace .subsection and .previous with .pushsection and .popsection respectively in the reproducer, the issue still happens. I guess what I do not understand is how come the same issue was not exposed before the change? Did the change alter program semantics?

$ cat foo.s

.subsection 1

.pushsection .altinstr_replacement, "a" 1: nop 2:

.previous

.popsection .org . + 2b-1b

$ llvm-mc -filetype=obj -triple=arm64 foo.s -o clang.o foo.s:8:6: error: expected assembly-time absolute expression .org . + 2b-1b ^ foo.s:8:6: error: expected assembly-time absolute expression .org . + 2b-1b ^ foo.s:8:6: error: expected assembly-time absolute expression .org . + 2b-1b

So if understand correctly, the original code placed the code within .pushsection/.popsection pair into a new section (.altinstr_replacement), while the newer version puts the same code into a new subsection of the current section. So it appears that somewhere IAS failed to update the offset within the section and caused the assertion failure. It however still does not explain why the old code worked. Anyway, will keep investigating.

samitolvanen commented 4 years ago

My initial interestingness test for creduce was too simple. The reproducer fails with the same error, but for a different reason. Here's the full preprocessed arch/arm64/kernel/debug-monitors.c, which is one of the files that fails to build.

jcai19 commented 4 years ago

Thanks for the full reproducer. I just noticed the change should be

diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 12f0eb56a1cc..94c859e95f01 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -78,8 +78,8 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
        newinstr "\n"                                                   \
        "664:\n\t"                                                      \
        ".previous\n\t"                                                 \
-       ".org   . - (664b-663b) + (662b-661b)\n\t"                      \
-       ".org   . - (662b-661b) + (664b-663b)\n"                        \
+       ".org   . - ((664b-663b) - (662b-661b))\n\t"                    \
+       ".org   . - ((662b-661b) - (664b-663b))\n"                      \
        ".endif\n"

 #define __ALTERNATIVE_CFG_CB(oldinstr, feature, cfg_enabled, cb)       \

This probably explains the new error you saw, although it still failed and went back to the reported error. Still investigating.

samitolvanen commented 4 years ago

Ah, good catch!

I think this might be related to the .arch_extension lse directive. Here's another attempt at a reproducer:

$ cat test.c
void a(void) {
        asm(
        ".subsection 1                  \n"
        "1:                             \n"
        "nop                            \n"
        "2:                             \n"
        ".previous                      \n"
        "3:                             \n"
        "nop                            \n"
        "4:                             \n"
        ".subsection 1                  \n"
        "5:                             \n"
        ".arch_extension lse            \n"
        "nop                            \n"
        "6:                             \n"
        ".previous                      \n"
        ".org . - (6b-5b) + (4b-3b)     \n"
        );
}
$ clang --target=aarch64-linux-gnu -integrated-as -c test.c
test.c:17:10: error: expected assembly-time absolute expression
        ".org . - (6b-5b) + (4b-3b)     \n"
         ^
<inline asm>:15:6: note: instantiated into assembly here
.org . - (6b-5b) + (4b-3b)      
     ^
1 error generated.

This builds if I remove .arch_extension lse, or replace .subsection/previous with .pushsection/popsection:

$ cat test1.c 
void a(void) {
        asm(
        ".pushsection .mysection,\"a\"  \n"
        "1:                             \n"
        "nop                            \n"
        "2:                             \n"
        ".popsection                    \n"
        "3:                             \n"
        "nop                            \n"
        "4:                             \n"
        ".pushsection .mysection,\"a\"  \n"
        "5:                             \n"
        ".arch_extension lse            \n"
        "nop                            \n"
        "6:                             \n"
        ".popsection                    \n"
        ".org . - (6b-5b) + (4b-3b)     \n"
        );
}
$ clang --target=aarch64-linux-gnu -integrated-as -c test1.c
$ llvm-objdump -D test1.o
...
Disassembly of section .text:

0000000000000000 <a>:
       0: 1f 20 03 d5   nop
       4: c0 03 5f d6   ret

Disassembly of section .mysection:

0000000000000000 <$x.0>:
       0: 1f 20 03 d5   nop

0000000000000004 <$x.1>:
       4: 1f 20 03 d5   nop
...
$ cat test2.c
void a(void) {
        asm(
        ".subsection 1                  \n"
        "1:                             \n"
        "nop                            \n"
        "2:                             \n"
        ".previous                      \n"
        "3:                             \n"
        "nop                            \n"
        "4:                             \n"
        ".subsection 1                  \n"
        "5:                             \n"
        "nop                            \n"
        "6:                             \n"
        ".previous                      \n"
        ".org . - (6b-5b) + (4b-3b)     \n"
        );
}
$ clang --target=aarch64-linux-gnu -integrated-as -c test2.c
$ llvm-objdump -D test2.o
...
Disassembly of section .text:

0000000000000000 <a>:
       0: 1f 20 03 d5   nop
       4: c0 03 5f d6   ret

0000000000000008 <$x.0>:
       8: 1f 20 03 d5   nop
       c: 1f 20 03 d5   nop
...

jcai19 commented 4 years ago

Awesome. Thanks for reducing the reproducer. Just want to mention here for future references that I played with IAS a bit more and realized it had nothing to do with parentheses. The existing ones should be enough.

jcai19 commented 4 years ago

This actually reminders me of https://github.com/ClangBuiltLinux/linux/issues/742. IAS implements each section as a list of data blocks called fragments. Currently it can only resolve the differences of labels in the same fragment. .arch_extension creates a new fragment, and therefore causing IAS to fail. I have a LLVM patch ( https://reviews.llvm.org/D69411) from a while ago to add support to resolve the difference of labels if the fragments they belong to are separated by fix-sized fragments only (the size of some fragments may be changed, such as those including jump instructions). There were concerns that the list of fragments may be changed after the differences are resolved, so the patch was never submitted. I guess I will have to revisit the patch and see if there is anything we can do. Meanwhile, a possible workaround could be to move .arch_extension to the beginning of the code, if that's possible.

samitolvanen commented 4 years ago

Meanwhile, a possible workaround could be to move .arch_extension to the beginning of the code, if that's possible.

Moving .arch_extension lse to the beginning of the assembly block fixes the reproducer, but the kernel still fails to build. It looks like something is broken when we have multiple functions with the same assembly construct:

$ cat test3.c
void a(void)
{ 
        asm(
        ".arch_extension lse            \n"
        "1:                             \n"
        "nop                            \n"
        "2:                             \n"
        ".subsection 1                  \n"
        "3:                             \n"
        "nop                            \n"
        "4:                             \n"
        ".previous                      \n"
        ".org . - (4b-3b) + (2b-1b)     \n"
        );
}
/* identical to a() */
void b(void)
{
        asm(
        ".arch_extension lse            \n"
        "1:                             \n"
        "nop                            \n"
        "2:                             \n"
        ".subsection 1                  \n"
        "3:                             \n"
        "nop                            \n"
        "4:                             \n"
        ".previous                      \n"
        ".org . - (4b-3b) + (2b-1b)     \n"
        );
}
$ clang --target=aarch64-linux-gnu -integrated-as -c test3.c
test3.c:29:3: error: expected assembly-time absolute expression
        ".org . - (4b-3b) + (2b-1b)     \n"
         ^
<inline asm>:10:6: note: instantiated into assembly here
.org . - (4b-3b) + (2b-1b)      
     ^
1 error generated.

Removing one of the functions fixes the issue, and so does removing the .arch_extension directives. I assume this is the same bug, but I'm not sure how we can work around this in the kernel.

jcai19 commented 4 years ago

Thanks. The given example assembles if I first compile it into an assembly file and then use llvm-mc to do the assembling. Don't know if the assembly was generated correctly, but if the answer is yes, then it seems the handling of inline assembly is problematic in this case.

$ clang --target=aarch64-linux-gnu -c test3.c -o test3.o
test3.c:29:10: error: expected assembly-time absolute expression
        ".org . - (14b-13b) + (12b-11b)     \n"
         ^
<inline asm>:10:6: note: instantiated into assembly here
.org . - (14b-13b) + (12b-11b)     
     ^

$ clang --target=aarch64-linux-gnu -S test3.c -o test3.s
$ cat test3.s 
    .text
    .file   "test3.c"
    .globl  a                       // -- Begin function a
    .p2align    2
    .type   a,@function
a:                                      // @a
// %bb.0:                               // %entry
    //APP
.Ltmp0:
    nop
.Ltmp1:
    .text   1
.Ltmp2:
    nop
.Ltmp3:
    .text

.Ltmp4:
.org (.Ltmp4-(.Ltmp3-.Ltmp2))+(.Ltmp1-.Ltmp0), 0

    //NO_APP
    ret
.Lfunc_end0:
    .size   a, .Lfunc_end0-a
                                        // -- End function
    .globl  b                       // -- Begin function b
    .p2align    2
    .type   b,@function
b:                                      // @b
// %bb.0:                               // %entry
    //APP
.Ltmp5:
    nop
.Ltmp6:
    .text   1
.Ltmp7:
    nop
.Ltmp8:
    .text

.Ltmp9:
.org (.Ltmp9-(.Ltmp8-.Ltmp7))+(.Ltmp6-.Ltmp5), 0

    //NO_APP
    ret
.Lfunc_end1:
    .size   b, .Lfunc_end1-b
                                        // -- End function
    .ident  "clang version 11.0.0 (https://github.com/llvm/llvm-project.git b925ca37a8f28851e110b8f0cd6e7f9a36a15d65)"
    .section    ".note.GNU-stack","",@progbits
    .addrsig
$ llvm-mc -filetype=obj -triple=arm64 test3.s -o test3.o
$ llvm-objdump -d test3.o

test3.o:    file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <a>:
       0: 1f 20 03 d5   nop
       4: c0 03 5f d6   ret

0000000000000008 <b>:
       8: 1f 20 03 d5   nop
       c: c0 03 5f d6   ret
      10: 1f 20 03 d5   nop
      14: 1f 20 03 d5   nop

samitolvanen commented 4 years ago

There are no .arch_extension lse directives in the generated assembly code. Perhaps because there are actually no LSE instructions in the example?

jcai19 commented 4 years ago

I'd guess so. The generated assembly looks reasonable to me.

jcai19 commented 4 years ago

Removing either of the two directives the given example did not solve the problem. So I dug a little bit more and found that when IAS parsed .arch_extension, it created a new subtarget info object. However, when creating the fragment for the nop instruction between label 3 and 4 in function b, the old subtarget info object was used, and on detecting a different subtarget info created by .arch_extension, it created a new fragment, causing label 3 and 4 to be placed into two separate fragments. And because the subtarget info was associated with the assembly parser, this issue would happen with either .arch_extension. Still trying to locate the place that firstly passed the old subtarget info object.

jcai19 commented 4 years ago

cc @smithp35

I have identified the cause, and put my analysis in the upstream bug I reported. Not sure if there is an easy fix on LLVM though.

smithp35 commented 4 years ago

This is where I look at a combination of .arch multiple functions, subsections and org and start to cry.

To some extent this looks like a combination of things. The first is that in an assembler file .arch applies to all fragments created from that point on in the file. The compiler generates a subtarget per function and is unaware of the .arch in the inline assembler. I don't think the behaviour of .arch when used in inline assembly, when there are multiple functions in the same section is well specified, if specified at all.

One interesting experiment is that we can get this to assemble with -ffunction-sections although the meaning of subsection changes here as subsection 1 is different in both cases. It may be possible (I don't know the original context) to rewrite using attribute section. I'd have thought that code that assembles to semantically different output when -ffunction-sections is used is probably best avoided, but again this may be required for the original code to work.

I think that your original patch https://reviews.llvm.org/D69411 would have been able to evaluate the expression even with different subtargets. It may be worth restarting it. I seem to remember James Knight wanting to do some refactoring to make sure that it worked with subsections.

Apologies I haven't got a lot of spare time to look myself, only weekends and evenings. Will do my best if I can spot anything obvious.

samitolvanen commented 4 years ago

One interesting experiment is that we can get this to assemble with -ffunction-sections although the meaning of subsection changes here as subsection 1 is different in both cases

Looks like -ffunction-sections doesn't fix the kernel build though (using debug-monitors.i from an earlier comment):

$ clang --target=aarch64-linux-gnu -O2 -ffunction-sections -c debug-monitors.i
...
<inline asm>:27:7: note: instantiated into assembly here
        .org    . - (662b-661b) + (664b-663b)
                ^

Switching from subsections to actual sections is the only workaround I've found.

Is there a way to pass arguments to the integrated assembler? It should be fine to enable LSE for all inline assembly instead of sprinkling .arch_extension lse directives in the code. However, -Wa,-march=... etc. seem to be ignored.

smithp35 commented 4 years ago

As I understand it the integrated assembler uses the -march from the clang invocation, no need to use -Wa. An alternative would be to add attribute((target("lse"))) to every function in the file. This is notionally the same as .arch_extension lse in the inline assembly but may be a bit easier to manage.

samitolvanen commented 4 years ago

As I understand it the integrated assembler uses the -march from the clang invocation, no need to use -Wa.

Unfortunately, the arm64 kernel binary must also run on devices without LSE, so we can't enable it everywhere. The kernel enables it only for specific inline assembly blocks and patches itself at runtime to use LSE only if there's hardware support.

An alternative would be to add attribute((target("lse"))) to every function in the file

Using an attribute might have been feasible if adding it only to the functions that actually need LSE would work, but we seem to run into the exact same issue described in the upstream bug when using the attribute.

smithp35 commented 4 years ago

Apologies, I haven't got any more ideas. I think it is realistic for different functions within a section to have different subtargets, I think LTO relies upon that. To me it sounds improving the expression evaluation code to handle arbitrary numbers of fragments (assuming their size is known at the point in assembly time that the expression needs to be evaluated). I think following up where D69411 got stalled is probably the best starting point https://reviews.llvm.org/D69411#1742896

jcai19 commented 4 years ago

Thanks Peter. Will see if there is anything we can do regarding https://reviews.llvm.org/D69411.

jcai19 commented 4 years ago

@samitolvanen I did a little bit more digging, and realized the reasoning I did earlier in the upstream bug reported at https://github.com/ClangBuiltLinux/linux/issues/1078#issuecomment-656960018 was not completely accurate, although https://reviews.llvm.org/D69411 should still fix the problem. Anyway, while label 3 and 4 in function b of https://github.com/ClangBuiltLinux/linux/issues/1078#issuecomment-656245060 were indeed placed in different fragments, IAS could have resolved the differences if the two labels were in the same section, and the offset of the fragments including them respectively were known. In the case of .subsection, however, IAS did not know the offset of some fragments before the labels (still need further investigation) and therefore could not determine the offsets of the labels. With .push_section, this issue was gone because IAS created a new section, therefore did not need to calculate the offset of the previous fragments of the labels.

nathanchance commented 4 years ago

Looks like the problematic patch has been backported to several stable trees, which will break some downstream trees with LTO.

nickdesaulniers commented 4 years ago

With .push_section, this issue was gone because IAS created a new section

Doesn't .subsection create a new section? https://sourceware.org/binutils/docs/as/SubSection.html#SubSection oh, interesting, how does a subsection differ from a section?

jcai19 commented 4 years ago

@nickdesaulniers

Doesn't .subsection create a new section? https://sourceware.org/binutils/docs/as/SubSection.html#SubSection

The link says "The current section is not changed" :). From what I understand so far, when IAS sees a .subsection with a name for the first time, it creates a data fragment and inserts it to the list of fragments of the current section, and updates the insert point of the list to be the new fragment. All the subsequent regular fragments (not in any subsections) of the current section are inserted before the insert point, which explains why IAS failed to calculate the differences of 3 and 4 in function b, because the .org directive that included the subtraction were placed in a fragment before the fragments that 3 and 4 were emitted to in subsection 1. When IAS wrote data out to the object file, it was supposed to know the offsets of the two fragments so it could resolve the difference, and it failed because the fragments had not been processed yet.

I am trying to decide what will happen when there are multiple subsections, and if all the regular fragments are always placed before fragments of subsections. If they are mixed, maybe we can refactor IAS' code to make sure they are separated, so that we can make https://reviews.llvm.org/D69411 work when two fragments are in the same subsection or neither in any subsection. This way we would have resolved the difference while parsing the code and avoided this problem later while writing object files.

samitolvanen commented 4 years ago

These .org statements don't actually change anything in the final binary, their purpose is to ensure that both alternative instruction sequences have the same length. We could work around this trivially by moving them inside the subsection, which allows IAS to figure out the relative locations again as the layout of all the relevant fragments has been finalized. Thoughts?

diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 12f0eb56a1cc..6c9eba8cc40b 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -77,9 +77,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
        "663:\n\t"                                                      \
        newinstr "\n"                                                   \
        "664:\n\t"                                                      \
-       ".previous\n\t"                                                 \
        ".org   . - (664b-663b) + (662b-661b)\n\t"                      \
        ".org   . - (662b-661b) + (664b-663b)\n"                        \
+       ".previous\n\t"                                                 \
        ".endif\n"

 #define __ALTERNATIVE_CFG_CB(oldinstr, feature, cfg_enabled, cb)       \

jcai19 commented 4 years ago

Nice! I am not familiar with kernel enough to make any suggestion, but this should prevent IAS from complaining.

samitolvanen commented 4 years ago

https://lore.kernel.org/lkml/20200729215152.662225-1-samitolvanen@google.com/

samitolvanen commented 4 years ago

https://git.kernel.org/arm64/c/966a0acce2fc

We should still make sure this also ends up in -stable trees.

nickdesaulniers commented 4 years ago

We should still make sure this also ends up in -stable trees.

It's a lot easier to add

Cc: stable@vger.kernel.org

to the commit message than chase sending this once it lands in mainline, as per Documentation/process/stable-kernel-rules.rst.

Congrats on finding a relatively simple solution, and thanks @willdeacon for accepting the patch.

samitolvanen commented 4 years ago

It's a lot easier to add

Cc: stable@vger.kernel.org

Which I had until Catalin instructed me to remove it, because it's apparently not needed. Who am I to argue.

nathanchance commented 4 years ago

I personally read Catalin's message as remove # 4.14+ from Cc: <stable@vger.kernel.org> # 4.14+, replacing it with a Fixes: tag, rather than getting rid of the stable tag altogether but that might have just been me.

In other words, Cc: <stable@vger.kernel.org> plus Fixes: is usually better than specifying an explicit version because the stable maintainers will look for Fixes: tags when backporting commits so that:

All regression fixes for a patch are picked up (in case the commit being fixed is backported to an older version later like 4.4 or 4.9)
They know how far back it should go automatically, without needing to spam people with the "this commit does not apply to this tree" email.

Some maintainers think that just a Fixes: tag is enough to get stuff backported, which is not the case per the documentation but realistically, Sasha's machine learning has been picking up commits of that nature.

Luckily, it is not like this is too much of a problem, we can just request that 966a0acce2fc be added to stable when it hits mainline due to a miscommunication around the stable tag :)

samitolvanen commented 4 years ago

I personally read Catalin's message as remove # 4.14+ from Cc: <stable@vger.kernel.org> # 4.14+, replacing it with a Fixes: tag, rather than getting rid of the stable tag altogether but that might have just been me.

I was referring to the "If Will picks it up for 5.8, it doesn't even need a cc stable." part. But yes, I agree that this is not what the stable documentation says.

nickdesaulniers commented 4 years ago

Which I had until Catalin instructed me to remove it, because it's apparently not needed. Who am I to argue.

LOL, right, sorry.

samitolvanen commented 4 years ago

The fix is now in the stable queue for all relevant kernel versions, so closing the issue.

jcai19 commented 4 years ago

Fixed on LLVM upstream: https://reviews.llvm.org/rG415a4fbea7c1a39c780caa3cb7287fe09c5267d2.

ClangBuiltLinux / linux

arm64: integrated assembler fails with "error: expected assembly-time absolute expression" #1078

.subsection 1

.previous