Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

section flag 'o' (SHF_LINK_ORDER) is incorrectly handled #43745

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR44775
Status CONFIRMED
Importance P normal
Reported by H.J. Lu (hjl.tools@gmail.com)
Reported on 2020-02-04 11:10:17 -0800
Last modified on 2020-02-05 10:41:59 -0800
Version 9.0
Hardware PC Linux
CC bd1976llvm@gmail.com, htmldeveloper@gmail.com, i@maskray.me, llvm-bugs@lists.llvm.org, smithp352@googlemail.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
[hjl@gnu-cfl-1 lld-1]$ cat x.s
    .section .text,"ax",%progbits,unique,1
    .globl _start
_start:
    .dc.a   xxx

    .section .text,"ax",%progbits,unique,2
xxx:
    .byte 0

    .section .text,"ax",%progbits,unique,3
yyy:
    .byte 0

    .section .foo,"ao",%progbits,yyy
    .dc.a   0
    .section .foo,"ao",%progbits,xxx
    .dc.a   0
[hjl@gnu-cfl-1 lld-1]$ make llvm.o
llvm-mc -filetype=obj -triple=x86_64 -o llvm.o x.s
[hjl@gnu-cfl-1 lld-1]$ readelf -SW llvm.o
There are 9 section headers, starting at offset 0x120:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          0000000000000000 0000f0 000030 00      0   0  1
  [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
  [ 3] .text             PROGBITS        0000000000000000 000040 000008 00  AX  0   0  1
  [ 4] .rela.text        RELA            0000000000000000 0000d8 000018 18      8   3  8
  [ 5] .text             PROGBITS        0000000000000000 000048 000001 00  AX  0   0  1
  [ 6] .text             PROGBITS        0000000000000000 000049 000001 00  AX  0   0  1
  [ 7] .foo              PROGBITS        0000000000000000 00004a 000010 00  AL  6   0  1
  [ 8] .symtab           SYMTAB          0000000000000000 000060 000078 18      1   4  8
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
[hjl@gnu-cfl-1 lld-1]$

Only one section foo was generated and I got

[hjl@gnu-cfl-1 lld-1]$ make y.lld
ld.lld --gc-sections -o y.lld llvm.o
[hjl@gnu-cfl-1 lld-1]$ readelf -SW y.lld
There are 6 section headers, starting at offset 0x2098:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000201000 001000 000009 00  AX  0   0  1
  [ 2] .comment          PROGBITS        0000000000000000 002000 000012 01  MS  0   0  1
  [ 3] .symtab           SYMTAB          0000000000000000 002018 000048 18      5   2  8
  [ 4] .shstrtab         STRTAB          0000000000000000 002060 00002a 00      0   0  1
  [ 5] .strtab           STRTAB          0000000000000000 00208a 00000c 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
[hjl@gnu-cfl-1 lld-1]$

Section foo is missing.  I am expecting:

[hjl@gnu-cfl-1 lld-1]$ make x.o
as  -o x.o x.s
[hjl@gnu-cfl-1 lld-1]$ readelf -SW x.o
There are 14 section headers, starting at offset 0x240:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  1
  [ 2] .data             PROGBITS        0000000000000000 000040 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          0000000000000000 000040 000000 00  WA  0   0  1
  [ 4] .text             PROGBITS        0000000000000000 000040 000008 00  AX  0   0  1
  [ 5] .rela.text        RELA            0000000000000000 0001d8 000018 18   I 11   4  8
  [ 6] .text             PROGBITS        0000000000000000 000048 000001 00  AX  0   0  1
  [ 7] .text             PROGBITS        0000000000000000 000049 000001 00  AX  0   0  1
  [ 8] .foo              PROGBITS        0000000000000000 00004a 000008 00  AL  7   0  1
  [ 9] .foo              PROGBITS        0000000000000000 000052 000008 00  AL  6   0  1
  [10] .note.gnu.property NOTE            0000000000000000 000060 000030 00   A  0   0  8
  [11] .symtab           SYMTAB          0000000000000000 000090 000138 18     12  12  8
  [12] .strtab           STRTAB          0000000000000000 0001c8 000010 00      0   0  1
  [13] .shstrtab         STRTAB          0000000000000000 0001f0 000049 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
[hjl@gnu-cfl-1 lld-1]$ make x.lld
ld.lld --gc-sections -o x.lld x.o
[hjl@gnu-cfl-1 lld-1]$ readelf -SW x.lld
There are 7 section headers, starting at offset 0x20a0:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .foo              PROGBITS        0000000000200120 000120 000008 00  AL  2   0  1
  [ 2] .text             PROGBITS        0000000000201000 001000 000009 00  AX  0   0  1
  [ 3] .comment          PROGBITS        0000000000000000 002000 000012 01  MS  0   0  1
  [ 4] .symtab           SYMTAB          0000000000000000 002018 000048 18      6   2  8
  [ 5] .shstrtab         STRTAB          0000000000000000 002060 00002f 00      0   0  1
  [ 6] .strtab           STRTAB          0000000000000000 00208f 00000c 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
[hjl@gnu-cfl-1 lld-1]$
Quuxplusone commented 4 years ago
cat > a.s <<e
.section .text,"ax",%progbits
.byte 0
.section .text,"awx",%progbits
.byte 1
.section .text,"awx",%progbits
.byte 2
e

% as a.s
a.s: Assembler messages:
a.s:3: Warning: ignoring changed section attributes for .text
a.s:5: Warning: ignoring changed section attributes for .text

I think llvm-mc should emit a warning.

For
    .section .foo,"ao",%progbits,yyy
    .dc.a   0
    .section .foo,"ao",%progbits,xxx
    .dc.a   0

gas/config/obj-elf.c:obj_elf_change_section should also emit a warning that
"o"+xxx is ignored.
Quuxplusone commented 4 years ago
.section .text,"axG",%progbits,foo,comdat
foo:
.byte 0
.section .text,"axG",%progbits,bar,comdat
bar:
.byte 1

Both GNU as and llvm-mc create two ".text" sections.
Both section name and group name are considered.

static bfd_boolean
get_section (bfd *abfd ATTRIBUTE_UNUSED, asection *sec, void *inf)
{
  struct elf_section_match *match = (struct elf_section_match *) inf;
  const char *gname = match->group_name;
  const char *group_name = elf_group_name (sec);
  unsigned int info = elf_section_data (sec)->this_hdr.sh_info;

  return (info == match->info
      && ((bfd_section_flags (sec) & SEC_ASSEMBLER_SECTION_ID)
           == (match->flags & SEC_ASSEMBLER_SECTION_ID))
      && sec->section_id == match->section_id
      && (group_name == gname
          || (group_name != NULL
          && gname != NULL
          && strcmp (group_name, gname) == 0)));
}

It seems llvm-mc should take sh_link into consideration as well. I will work on
a patch.
Quuxplusone commented 4 years ago
(In reply to Fangrui Song from comment #2)
> .section .text,"axG",%progbits,foo,comdat
> foo:
> .byte 0
> .section .text,"axG",%progbits,bar,comdat
> bar:
> .byte 1
>
> Both GNU as and llvm-mc create two ".text" sections.
> Both section name and group name are considered.
>
> static bfd_boolean
> get_section (bfd *abfd ATTRIBUTE_UNUSED, asection *sec, void *inf)
> {
>   struct elf_section_match *match = (struct elf_section_match *) inf;
>   const char *gname = match->group_name;
>   const char *group_name = elf_group_name (sec);
>   unsigned int info = elf_section_data (sec)->this_hdr.sh_info;
>
>   return (info == match->info
>     && ((bfd_section_flags (sec) & SEC_ASSEMBLER_SECTION_ID)
>          == (match->flags & SEC_ASSEMBLER_SECTION_ID))
>     && sec->section_id == match->section_id
>     && (group_name == gname
>         || (group_name != NULL
>         && gname != NULL
>         && strcmp (group_name, gname) == 0)));
> }
>
> It seems llvm-mc should take sh_link into consideration as well. I will work
> on a patch.

My patch gas changed it to

static bfd_boolean
get_section (bfd *abfd ATTRIBUTE_UNUSED, asection *sec, void *inf)
{
  struct elf_section_match *match = (struct elf_section_match *) inf;
  const char *gname = match->group_name;
  const char *group_name = elf_group_name (sec);
  const char *linked_to_symbol_name
    = sec->map_head.linked_to_symbol_name;
  unsigned int info = elf_section_data (sec)->this_hdr.sh_info;

  return (info == match->info
          && ((bfd_section_flags (sec) & SEC_ASSEMBLER_SECTION_ID)
               == (match->flags & SEC_ASSEMBLER_SECTION_ID))
          && sec->section_id == match->section_id
          && (group_name == gname
              || (group_name != NULL
                  && gname != NULL
                  && strcmp (group_name, gname) == 0))
          && (linked_to_symbol_name == match->linked_to_symbol_name
              || (linked_to_symbol_name != NULL
                  && match->linked_to_symbol_name != NULL
                  && strcmp (linked_to_symbol_name,
                             match->linked_to_symbol_name) == 0)));
}
Quuxplusone commented 4 years ago
(In reply to Fangrui Song from comment #1)
>
> For
>   .section .foo,"ao",%progbits,yyy
>   .dc.a   0
>   .section .foo,"ao",%progbits,xxx
>   .dc.a   0
>
> gas/config/obj-elf.c:obj_elf_change_section should also emit a warning that
> "o"+xxx is ignored.

It is an error for gas:

[hjl@gnu-cfl-1 tmp]$ cat b.s
    .section .foo,"ao",%progbits,yyy
    .dc.a   0
    .section .foo,"ao",%progbits,xxx
    .dc.a   0
[hjl@gnu-cfl-1 tmp]$ as b.s
b.s: Assembler messages:
b.s: Error: undefined linked-to symbol `yyy' on section `.foo'
b.s: Error: undefined linked-to symbol `xxx' on section `.foo'
[hjl@gnu-cfl-1 tmp]$
Quuxplusone commented 4 years ago
Speaking too quickly...

SHF_LINK_ORDER + symbol is a bit different from SHF_GROUP+signature name.

A group signature is a string literal. It can easily used as a key
(differentiator). The symbol used by "o" needs resolving. For the example
below, shall we consider the second .section the same section?

.data
foo:
bar:
.section .text,"axo",%progbits,foo
.section .text,"axo",%progbits,bar

My current feeling is that we should use explicit syntax to make it clear that
the sections are different.

.section .text,"axo",%progbits,foo,unique,0
.section .text,"axo",%progbits,bar,unique,1
Quuxplusone commented 4 years ago
(In reply to Fangrui Song from comment #5)
> Speaking too quickly...
>
> SHF_LINK_ORDER + symbol is a bit different from SHF_GROUP+signature name.
>
> A group signature is a string literal. It can easily used as a key
> (differentiator). The symbol used by "o" needs resolving. For the example
> below, shall we consider the second .section the same section?
>
>
> .data
> foo:
> bar:
> .section .text,"axo",%progbits,foo
> .section .text,"axo",%progbits,bar

These may be 2 different sections, depending on where foo and bar
are defined. Gas generates 2 different sections.  Linker can handle it
correctly.

> My current feeling is that we should use explicit syntax to make it clear
> that the sections are different.
>
> .section .text,"axo",%progbits,foo,unique,0
> .section .text,"axo",%progbits,bar,unique,1

This isn't required.
Quuxplusone commented 4 years ago
From https://sourceware.org/ml/binutils/2020-02/msg00028.html

-         && strcmp (group_name, gname) == 0)));
+         && strcmp (group_name, gname) == 0))
+     && (linked_to_symbol_name == match->linked_to_symbol_name
+         || (linked_to_symbol_name != NULL
+         && match->linked_to_symbol_name != NULL
+         && strcmp (linked_to_symbol_name,
+                match->linked_to_symbol_name) == 0)));

It seems the patch just considered the symbol (linked_to_symbol) name, not the
referenced section.

> .data
> foo:
> bar:
> .section .text,"axo",%progbits,foo
> .section .text,"axo",%progbits,bar

So in the example, the two .section directives define different sections. I
think the semantic is fine.

> These may be 2 different sections, depending on where foo and bar
are defined. Gas generates 2 different sections.

I hope we can change "may" to "must"...

Different symbols -> different sections, even if their section is the same.
Same symbols -> same section
Quuxplusone commented 4 years ago
(In reply to Fangrui Song from comment #7)
> From https://sourceware.org/ml/binutils/2020-02/msg00028.html
>
> -       && strcmp (group_name, gname) == 0)));
> +       && strcmp (group_name, gname) == 0))
> +   && (linked_to_symbol_name == match->linked_to_symbol_name
> +       || (linked_to_symbol_name != NULL
> +       && match->linked_to_symbol_name != NULL
> +       && strcmp (linked_to_symbol_name,
> +              match->linked_to_symbol_name) == 0)));
>
> It seems the patch just considered the symbol (linked_to_symbol) name, not
> the referenced section.

The section is resolved at the very end:

  if (sec->map_head.linked_to_symbol_name)
    {
      symbolS *linked_to_sym;
      linked_to_sym = symbol_find (sec->map_head.linked_to_symbol_name);
      if (!linked_to_sym || !S_IS_DEFINED (linked_to_sym))
        as_bad (_("undefined linked-to symbol `%s' on section `%s'"),
                sec->map_head.linked_to_symbol_name,
                bfd_section_name (sec));
      else
        elf_linked_to_section (sec) = S_GET_SEGMENT (linked_to_sym);
    }

> > .data
> > foo:
> > bar:
> > .section .text,"axo",%progbits,foo
> > .section .text,"axo",%progbits,bar
>
> So in the example, the two .section directives define different sections. I
> think the semantic is fine.
>
> > These may be 2 different sections, depending on where foo and bar
> are defined. Gas generates 2 different sections.
>
> I hope we can change "may" to "must"...

Yes, this is what I implemented.

> Different symbols -> different sections, even if their section is the same.
> Same symbols -> same section

Yes.
Quuxplusone commented 4 years ago
> > > .data
> > > foo:
> > > bar:
> > > .section .text,"axo",%progbits,foo
> > > .section .text,"axo",%progbits,bar
> >
> > So in the example, the two .section directives define different sections. I
> > think the semantic is fine.
> >
> > > These may be 2 different sections, depending on where foo and bar
> > are defined. Gas generates 2 different sections.
> >
> > I hope we can change "may" to "must"...
>
> Yes, this is what I implemented.

> > Different symbols -> different sections, even if their section is the same.
> > Same symbols -> same section
>
>
> Yes.

Great! We are on the same page. Created https://reviews.llvm.org/D74006

For the missing diagnostics, created https://reviews.llvm.org/D73999

Though I did not the join the design of SHF_LINK_ORDER "o" "unique" etc,
I find them very useful.
The lack of the features on binutils side always made me a bit worried about
using such metadata sections more heavily.

Thanks a lot for taking the lead!

(Nick sent me an email that he did not forget --gc-sections feature requests
last November but it seems that he forgets it again...)
Quuxplusone commented 4 years ago
Hi H.J. Lu and Fangrui Song,

Thanks for your work on the unique sections extension.

I have been trying to use this extension to solve a problem in LLVM when
symbols with incompatible sh_entsizes are explicitly assigned to the same
section name, see https://bugs.llvm.org/show_bug.cgi?id=43457 and
https://reviews.llvm.org/D72194.

My question is why not simply use a new section each time we encounter a
section directive with a new set of section attributes. Using the example code
from comment 1.

cat > a.s <<e
.section .text,"ax",%progbits
.byte 0
.section .text,"awx",%progbits
.byte 1
.section .text,"awx",%progbits
.byte 2
e

GAS emits:

% as a.s
a.s: Assembler messages:
a.s:3: Warning: ignoring changed section attributes for .text
a.s:5: Warning: ignoring changed section attributes for .text

But, why not create 2 .text sections instead? This would simplify the fix for
https://bugs.llvm.org/show_bug.cgi?id=43457. It is also more intuitive
behaviour (IMO).
Quuxplusone commented 4 years ago
(In reply to Ben Dunbobbin from comment #10)
> Hi H.J. Lu and Fangrui Song,
>
> Thanks for your work on the unique sections extension.
>
> I have been trying to use this extension to solve a problem in LLVM when
> symbols with incompatible sh_entsizes are explicitly assigned to the same
> section name, see https://bugs.llvm.org/show_bug.cgi?id=43457 and
> https://reviews.llvm.org/D72194.
>
> My question is why not simply use a new section each time we encounter a
> section directive with a new set of section attributes. Using the example
> code from comment 1.
>
> cat > a.s <<e
> .section .text,"ax",%progbits
> .byte 0
> .section .text,"awx",%progbits
> .byte 1
> .section .text,"awx",%progbits
> .byte 2
> e
>
> GAS emits:
>
> % as a.s
> a.s: Assembler messages:
> a.s:3: Warning: ignoring changed section attributes for .text
> a.s:5: Warning: ignoring changed section attributes for .text
>

My comments are specific to .text.  .text is a special section and
must be "ax":

http://www.sco.com/developers/gabi/latest/ch4.sheader.html#special_sections
Quuxplusone commented 4 years ago
(In reply to H.J. Lu from comment #11)
> (In reply to Ben Dunbobbin from comment #10)
> > Hi H.J. Lu and Fangrui Song,
> >
> My comments are specific to .text.  .text is a special section and
> must be "ax":
>
> http://www.sco.com/developers/gabi/latest/ch4.sheader.html#special_sections

Right, sorry shouldn't have chosen a default section, bad example.

New example:

--- asm ----
.section .bob,"ax",%progbits
.byte 0
.section .bob,"awx",%progbits
.byte 1

--- disassembly ---
.bob:
 add    BYTE PTR [rcx],al

--- GAS output ----
./example.asm: Assembler messages:
./example.asm:3: Warning: ignoring changed section attributes for .bob
Compiler returned: 0

So why not create 2 sections instead here?
Quuxplusone commented 4 years ago
(In reply to Ben Dunbobbin from comment #12)
> (In reply to H.J. Lu from comment #11)
> > (In reply to Ben Dunbobbin from comment #10)
> > > Hi H.J. Lu and Fangrui Song,
> > >
> > My comments are specific to .text.  .text is a special section and
> > must be "ax":
> >
> > http://www.sco.com/developers/gabi/latest/ch4.sheader.html#special_sections
>
> Right, sorry shouldn't have chosen a default section, bad example.
>
> New example:
>
> --- asm ----
> .section .bob,"ax",%progbits
> .byte 0
> .section .bob,"awx",%progbits
> .byte 1
>
> --- disassembly ---
> .bob:
>  add    BYTE PTR [rcx],al
>
> --- GAS output ----
> ./example.asm: Assembler messages:
> ./example.asm:3: Warning: ignoring changed section attributes for .bob
> Compiler returned: 0
>
> So why not create 2 sections instead here?

GNU BFD linker places the input sections of the same section name
into the same output section of the same name. I made some changes
a few years back in this area.  I will see what I can do.  But I need
to get this patch of mine:

https://sourceware.org/ml/binutils/2020-02/msg00038.html

into GNU binutils first.
Quuxplusone commented 4 years ago
(In reply to H.J. Lu from comment #13)
> (In reply to Ben Dunbobbin from comment #12)
> > (In reply to H.J. Lu from comment #11)
> > > (In reply to Ben Dunbobbin from comment #10)
> > > > Hi H.J. Lu and Fangrui Song,
>
> GNU BFD linker places the input sections of the same section name
> into the same output section of the same name...

Thanks! So, it's to protect the user... but it's not that useful. If I split
the example up then I don't get a warning:

> --- a.s ----
> .section .bob,"ax",%progbits
> .byte 0

> --- b.s ----
> .section .bob,"awx",%progbits
> .byte 1

So, I think even with the current behavior of GCC it makes sense to generate
multiple sections. I think that the linker is the best place to emit a
compatibility warning when combining sections. However, it would be possible to
emit an assembler warning that multiple output sections have been created with
the same name and the linker may combine these.

Maybe the assembler should not emit multiple sections with the same name under
any circumstances, even for COMDATs or SHF_LINK_ORDER, then it is up to the
user to choose a unique name. That is probably the easiest behavior for users
to understand.

> ... I made some changes
> a few years back in this area.  I will see what I can do.  But I need
> to get this patch of mine:
>
> https://sourceware.org/ml/binutils/2020-02/msg00038.html
>
> into GNU binutils first.

Of course. Thanks for taking the time to reply. I'm not sure there is anything
to do yet. I just wanted to discuss the design of the assembler with an expert.