SNSystems / llvm-project-prepo

Fork of LLVM with modifications to support a program repository
26 stars 0 forks source link

undefined reference to `TLS init function for pstore::details::log_destinations' #139

Closed paulhuggett closed 3 years ago

paulhuggett commented 3 years ago

Following the instructions to build llvm-project-prepo in the docker image. I see the following link error:

[3405/3449] Linking CXX executable tools/pstore/unittests/http/pstore-http-unit-tests
FAILED: tools/pstore/unittests/http/pstore-http-unit-tests 
: && /home/prepo/llvm-project-prepo/build/../llvm/utils/repo/link.py -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections  -O3 -fno-exceptions -fno-rtti  -Wl,-allow-shlib-undefined    -Wl,-O3 -Wl,--gc-sections tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/buffered_reader_mocks.cpp.o tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/test_buffered_reader.cpp.o tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/test_headers.cpp.o tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/test_media_type.cpp.o tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/test_query_to_kvp.cpp.o tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/test_request.cpp.o tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/test_serve_dynamic_content.cpp.o tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/test_serve_static_content.cpp.o tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/test_wskey.cpp.o tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/test_ws_server.cpp.o -o tools/pstore/unittests/http/pstore-http-unit-tests  lib/libLLVMSupport.a -lpthread lib/libgtest_main.a lib/libgtest.a -lpthread lib/libpstore-http.a lib/libLLVMSupport.a -lrt -ldl -lm lib/libLLVMDemangle.a -lpthread lib/libpstore-brokerface.a lib/libpstore-json-lib.a lib/libpstore-romfs.a lib/libpstore-os.a lib/libpstore-adt.a lib/libpstore-support.a -lpthread && :
ERROR:wrap_tool:tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/test_ws_server.cpp.o.elf: In function `WsServer_Ping_Test::TestBody()':
(.text._ZN18WsServer_Ping_Test8TestBodyEv+0xf72): undefined reference to `TLS init function for pstore::details::log_destinations'
(.text._ZN18WsServer_Ping_Test8TestBodyEv+0xf7a): undefined reference to `TLS init function for pstore::details::log_destinations'
(.text._ZN18WsServer_Ping_Test8TestBodyEv+0xf99): undefined reference to `TLS init function for pstore::details::log_destinations'
(.text._ZN18WsServer_Ping_Test8TestBodyEv+0xfa1): undefined reference to `TLS init function for pstore::details::log_destinations'
(.text._ZN18WsServer_Ping_Test8TestBodyEv+0x105f): undefined reference to `TLS init function for pstore::details::log_destinations'
tools/pstore/unittests/http/CMakeFiles/pstore-http-unit-tests.dir/test_ws_server.cpp.o.elf:(.text._ZN18WsServer_Ping_Test8TestBodyEv+0x1067): more undefined references to `TLS init function for pstore::details::log_destinations' follow
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
paulhuggett commented 3 years ago

Start with something small:

thread_local int a;
int f () {
    return a;
}

Compile and dump the ELF file:

prepo@7eb9e4a94cdb:/test/tls$ clang -c -std=c++14 main.cpp -o main.o && readelf -s main.o

Symbol table '.symtab' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.cpp
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    2 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     4: 0000000000000000    14 FUNC    GLOBAL DEFAULT    2 _Z1fv
     5: 0000000000000000    22 FUNC    WEAK   HIDDEN     5 _ZTW1a
     6: 0000000000000000     4 TLS     GLOBAL DEFAULT    7 a

What’s the _ZTW1a function?

prepo@7eb9e4a94cdb:/test/tls$ c++filt _ZTW1a
TLS wrapper function for a

That’s the function missing in the failing case.

paulhuggett commented 3 years ago

Now try it using the repo target.

prepo@7eb9e4a94cdb:/test/tls$ clang -c  -target x86_64-pc-linux-gnu-repo -std=c++14 main.cpp -o main.o
prepo@7eb9e4a94cdb:/test/tls$ repo2obj -o main.elf main.o
prepo@7eb9e4a94cdb:/test/tls$ readelf -s main.elf

Symbol table '.symtab' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000    22 FUNC    WEAK   HIDDEN     5 _ZTW1a
     2: 0000000000000000    14 FUNC    GLOBAL DEFAULT    3 _Z1fv
     3: 0000000000000000     4 TLS     GLOBAL DEFAULT    7 a

Hmm. To my surprise, the _ZTW1a symbol is present here as well.

Something else is at foot.

MaggieYingYi commented 3 years ago

I managed to reproduce the issue in the Ubuntu. The pstore::details::log_destinations is defined in the pstore\lib\os\logging.cpp. I compared the logging.cpp.o.elf and logging.cpp.o. The logging.cpp.o.elf is generated by 2 steps: 1) compiled the logging.cpp targeted on Repo and generated the ELF by using the repo2obj. The logging.cpp.o is directly targeted on ELF. I found the group section is missing in the logging.cpp.o.elf.

The logging.cpp.o.elf file only contains:

Relocation section '.rela.text._ZTWN6pstore7details16log_destinationsE' at offset 0x3804 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000005  005d00000013 R_X86_64_TLSGD    0000000000000000 _ZN6pstore7details16lo - 4
00000000000d  006600000004 R_X86_64_PLT32    0000000000000000 __tls_get_addr - 4

The correct logging.cpp.o file contains:

COMDAT group section [   63] `.group' [_ZTWN6pstore7details16log_destinationsE] contains 2 sections:
   [Index]    Name
   [   64]   .text._ZTWN6pstore7details16log_destinationsE
   [   65]   .rela.text._ZTWN6pstore7details16log_destinationsE

Relocation section '.rela.text._ZTWN6pstore7details16log_destinationsE' at offset 0x4030 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000005  005700000013 R_X86_64_TLSGD    0000000000000000 _ZN6pstore7details16lo - 4
00000000000d  007c00000004 R_X86_64_PLT32    0000000000000000 __tls_get_addr - 4

@paulhuggett, is this issue related with repo2obj tool?

MaggieYingYi commented 3 years ago

To make it more clear, the .text._ZTWN6pstore7details16log_destinationsE section doesn't exist in the logging.cpp.o.elf file.

paulhuggett commented 3 years ago

I managed to reproduce the issue in the Ubuntu

FTR, the docker image (currently) uses Ubuntu 18.04 as its base.

is this issue related with repo2obj tool?

I don't think there's yet enough information here to make that determination.

MaggieYingYi commented 3 years ago

To make it more clear, the .text._ZTWN6pstore7details16log_destinationsE section doesn't exist in the logging.cpp.o.elf file.

When I created the dump file using readelf, all of the names have been truncated. The llvm-readelf doesn't truncate the name. I checked it using llvm-readelf and the .text._ZTWN6pstore7details16log_destinationsE section did exist in the logging.cpp.o.elf file.

MaggieYingYi commented 3 years ago

We could reproduce the linking error when spliting the variable a and function f into two files.

test.cpp:

thread_local int a;

main.cpp

extern thread_local int a;
int main() {
  return a;
}

Compile and link to generate the ELF file:

$ clang --target=x86_64-pc-linux-gnu-repo -std=c++14 -c test.cpp -o test.repo.o
$ clang --target=x86_64-pc-linux-gnu-repo -std=c++14 -c main.cpp -o main.repo.o
$ repo2obj test.repo.o -o test.o.elf
$ repo2obj main.repo.o -o main.o.elf
$ ld test.o.elf main.o.elf -o main

ld: main.o.elf: in function `TLS wrapper function for a':
(.text._ZTW1a[_ZTW1a]+0x5): undefined reference to `TLS init function for a'
ld: (.text._ZTW1a[_ZTW1a]+0x18): undefined reference to `TLS init function for a'

Dump the symbol table in `main.o.elf:

$ readelf -s main.o.elf

Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 TLS     GLOBAL DEFAULT  UND a
     2: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZTH1a
     3: 0000000000000000    52 FUNC    WEAK   HIDDEN     4 _ZTW1a
     4: 0000000000000000    28 FUNC    GLOBAL DEFAULT    6 main

$ c++filt _ZTH1a
TLS init function for a

Therefore, the symbol _ZTH1a is missing.

MaggieYingYi commented 3 years ago

Compile and dump the ELF file:

$ clang --target=x86_64-pc-linux-gnu-elf -std=c++14 -c main.cpp -o main.o
$ readelf -s main.o

Symbol table '.symtab' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.cpp
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ZTH1a
     5: 0000000000000000    52 FUNC    WEAK   HIDDEN     5 _ZTW1a
     6: 0000000000000000     0 TLS     GLOBAL DEFAULT  UND a
     7: 0000000000000000    28 FUNC    GLOBAL DEFAULT    2 main

We could see the _ZTH1a Bind is GLOBAL in the main.o.elf, however, in the main.o, it is a WEAK bind.

Currently, repo2obj creates the undefined symbol with GLOBAL bind. I changed the repo2obj to change the _ZTH1a bind from GLOBAL to WEAK, the linking error is fixed.

paulhuggett commented 3 years ago

Thank you. I can reproduce the problem with this small example.

I thought that it would be useful to document the result from the perspective of the database contents since the output from repo2obj is derived from this (and is one step removed from our single source of truth).

(In the pstore-dump output below I’ve stripped away information that’s not relevant.)

$ pstore-dump --compilation=$(repo-ticket-dump main.o),$(repo-ticket-dump test.o) clang.db
---
- compilations : 
      - digest      : 832c1ca123da314429a6be08e30719a2
        compilation : 
            members : 
                - digest     : 7299f7d7548b68f0860d571f35f66a9c
                  name       : main
                  linkage    : external
                  visibility : default
                - digest     : fc32513c880c2310fc7a5c3659dcb979
                  name       : _ZTW1a
                  linkage    : link_once_odr
                  visibility : hidden
      - digest      : a8c11784f87225adb0eb05e49ca96f80
        compilation : 
            members : 
                - digest     : 6d3a5c15e159a5fc95f6793b2fa7f79d
                  name       : _ZTW1a
                  linkage    : weak_odr
                  visibility : hidden
                - digest     : cb075f27af9a10e4225673bb7ddead7b
                  name       : a
                  linkage    : external
                  visibility : default
...

We’re getting an error from main.o.elf, so let’s see what the external fixups there are:

$ pstore-dump --fragment=$(repo-fragments --comma --digest-only main.o) clang.db
---
- fragments : 
      - digest   : 7299f7d7548b68f0860d571f35f66a9c
        fragment : 
            - type     : text
              contents : 
                  xfixups : 
                      - name   : _ZTW1a
      - digest   : fc32513c880c2310fc7a5c3659dcb979
        fragment : 
            - type     : text
              contents : 
                  xfixups : 
                      - name   : _ZTH1a
                      - name   : _ZTH1a
                      - name   : a
...

We can see that the compiler did not generate a definition of _ZTH1a.

(Also intriguing is that we have two definitions of _ZTW1a — one is link_once_odr and the other weak_odr — with different fragment digests implying that the compiler’s definition is different in the two instances.)

paulhuggett commented 3 years ago

Comparing the names defined in the database to the names defined by the compiler’s ELF output.

MaggieYingYi commented 3 years ago

Thank you for the detailed comparison between ELF and Repo.

I went through the lld and found the report linking error is related with the binding of _ZTH1a. If the _ZTH1a binding is WEAK, it would not report the linking error. On the other side, the lld reports an undefined linking error if the _ZTH1a binding is GLOBAL.

This is implemented in the function maybeReportUndefined, which is defined in lld\ELF\Relocations.cpp file.

// Report an undefined symbol if necessary.
// Returns true if the undefined symbol will produce an error message.
static bool maybeReportUndefined(Symbol &sym, InputSectionBase &sec,
                                 uint64_t offset) {
  if (!sym.isUndefined() || sym.isWeak())
    return false;
  :
  :
}
MaggieYingYi commented 3 years ago

Next question is how _ZTH1a is set to WEAK in the main.o [ELF].

Firstly, compile the main.cpp to generate the IR code.

$ clang++ --target=x86_64-pc-linux-gnu-elf -std=c++14 -S -emit-llvm main.cpp -o main.ll

The _ZTH1a is declared in the main.ll.

declare extern_weak void @_ZTH1a()

We could see _ZTH1a has an extern_weak linkage type.

Generate the main.o using llc:

$ llc -filetype=obj main.ll -o main.o

When I debug the llc, I found that _ZTH1a symbol is set to WEAK binding in the AsmPrinter::doFinalization(Module &M) function of llvm-project-prepo\llvm\lib\CodeGen\AsmPrinter\AsmPrinter.cpp file.

bool AsmPrinter::doFinalization(Module &M) {
    :
    for (const auto &GO : M.global_objects()) {
      if (!GO.hasExternalWeakLinkage())
        continue;
      OutStreamer->emitSymbolAttribute(getSymbol(&GO), MCSA_WeakReference);
    }
    :
}

From the above code, we could see that if GO has an extern_weak linkage type, the GO's corresponding symbol is set to WEAK binding.

MaggieYingYi commented 3 years ago

When targeted on the repo, the declared global objects are not stored in the compilation in the database. Should we add a new member (e.g. extern_weak_symbol) in the compilation? The new member stores all declared symbol names if these symbols have the extern_weak linkage type.

paulhuggett commented 3 years ago

When targeted on the repo, the declared global objects are not stored in the compilation in the database.

You’re right, they’re not. The compilation record in the database contains the names defined in that compilation, not the declarations.

It’s important that I should not that this lack of support for “weak” was a deliberate choice aimed at reducing complexity in the tools: particularly the linker. The first half of this aim was undone when a previous commit exposed the complete set of LLVM linkage types to the repo including weak definitions (requiring a significant reworking of the rld front-end). Adding support for weak references as well means that we have work to do across the board, including repo2obj and rld.

Should we add a new member (e.g. extern_weak_symbol) in the compilation?

We could, and this is approximately how it’s done in ELF. ELF uses undefined symbols to describe all of its references and the symbol’s binding to establish whether the reference is weak.

Our model is a little different, though. The compilation record contains a collection of definitions; this collection does not include the referenced symbols. This is instead done by having the external-fixups simply refer to a name. It’s then up to the linker to match this name to the corresponding definition following the established rules.

Given that a name is stored as an Indirect String (basically a pointer to a pointer) and that pointers (or rather addresses) in pstore are eight byte aligned, could we steal one of those zero bits in an external fixup to indicate whether the reference is weak or strong?

This implementation would have a relatively small impact on rld: reporting undefined symbols is a little more complex: undefined and weakly referenced is fine whereas undefined and strongly referenced is not. repo2obj may be a little more complex because a single complication could contain both strong and weak references requiring two symbols.

Does this make sense to you?

MaggieYingYi commented 3 years ago

Does this make sense to you?

Good point. Yes, it makes sense to me.

Several implementation level questions want to be discussed here to make sure my understanding correctly.

Given that a name is stored as an Indirect String (basically a pointer to a pointer) and that pointers (or rather addresses) in pstore are eight byte aligned, could we steal one of those zero bits in an external fixup to indicate whether the reference is weak or strong?

Currently, the struct external_fixup is:

    struct external_fixup {
        :
        typed_address<indirect_string> name;
        relocation_type type;
        // TODO: much padding here.
        std::uint8_t padding1 = 0;
        std::uint16_t padding2 = 0;
        std::uint32_t padding3 = 0;
        std::uint64_t offset;
        std::uint64_t addend;
    };

Did you mean that we can use these paddings (For example, padding1) to indicate the refernce is weak?

    struct external_fixup {
        :
        typed_address<indirect_string> name;
        relocation_type type;
        // TODO: much padding here.
        bool IsWeak = false;              <---- By default, it is strong binding.
        std::uint16_t padding1 = 0;
        std::uint32_t padding2 = 0;
        std::uint64_t offset;
        std::uint64_t addend;
    };

repo2obj may be a little more complex because a single complication could contain both strong and weak references requiring two symbols.

It should be fine in this case since two symbols have own individaul XFixup, which hold either strong or weak reference. Am I misunderstanding anything?

In additional, we don't store declared functions in the LLVM IR code. To keep the extern_weak declaration GOs in the repo IR code, could we add a module level metadata to store the extern_weak symbol names, e.g., repo.extern.weak?

!repo.extern.weak = !{!"_ZTH1a"}

This repo.extern.weak could be used in the backend ObjectWriter to store the weak reference in XFixups.

paulhuggett commented 3 years ago

Currently, the struct external_fixup is: [snip] Did you mean that we can use these paddings [snip] to indicate the refernce is weak?

Not quite. The current definition of this struct wastes a lot of space. In the short term we can just add an 8-bit bool field in here and remove 8 bits of padding, but a “proper” implementation should remove all of the unused space including the unused bits in the typed_address<> field. The latter is currently occupying 64 bits but in fact only requires 44 of them.

repo2obj may be a little more complex because a single complication could contain both strong and weak references requiring two symbols.

It should be fine in this case since two symbols have own individaul XFixup, which hold either strong or weak reference. Am I misunderstanding anything?

I don’t think so. I’m just noting that repo2obj support for this isn’t trivial.

In additional, we don't store declared functions in the LLVM IR code. To keep the extern_weak declaration GOs in the repo IR code, could we add a module level metadata to store the extern_weak symbol names, e.g., repo.extern.weak?

Do we we need to do this? (The ELF output code manages to get the binding right; that suggests to me that the back-end already has access to the necessary information.)

paulhuggett commented 3 years ago

The first part of a fix to this problem — adding support for weak references to pstore — is posted as pstore PR#85.

paulhuggett commented 3 years ago

The first part of a fix to this problem — adding support for weak references to pstore — is posted as pstore PR#85.

Submitted as commit 3169675.

MaggieYingYi commented 3 years ago

Do we we need to do this? (The ELF output code manages to get the binding right; that suggests to me that the back-end already has access to the necessary information.)

You are right. We don't need this.

MaggieYingYi commented 3 years ago

The second part of a fix to this problem — adding support for weak references to MCSymbolRepo and repo2obj — is posted as llvm-project-prepo PR#140.

MaggieYingYi commented 3 years ago

The second part of a fix to this problem — adding support for weak references to MCSymbolRepo and repo2obj — is posted as llvm-project-prepo PR#140.

Submitted as commit 0d1ee388.

paulhuggett commented 3 years ago

Submitted as commit 0d1ee388cfa7e7748a855967f4035688e7347c47.

The build (and ”check-all”) of llvm-project-prepo in the docker container now completes successfully. Note that work to implement weak references in the rld is still under way.