Closed paulhuggett closed 3 years ago
Start with something small:
thread_local int a;
int f () {
return a;
}
Compile and dump the ELF file:
prepo@7eb9e4a94cdb:/test/tls$ clang -c -std=c++14 main.cpp -o main.o && readelf -s main.o
Symbol table '.symtab' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS main.cpp
2: 0000000000000000 0 SECTION LOCAL DEFAULT 2
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5
4: 0000000000000000 14 FUNC GLOBAL DEFAULT 2 _Z1fv
5: 0000000000000000 22 FUNC WEAK HIDDEN 5 _ZTW1a
6: 0000000000000000 4 TLS GLOBAL DEFAULT 7 a
What’s the _ZTW1a function?
prepo@7eb9e4a94cdb:/test/tls$ c++filt _ZTW1a
TLS wrapper function for a
That’s the function missing in the failing case.
Now try it using the repo target.
prepo@7eb9e4a94cdb:/test/tls$ clang -c -target x86_64-pc-linux-gnu-repo -std=c++14 main.cpp -o main.o
prepo@7eb9e4a94cdb:/test/tls$ repo2obj -o main.elf main.o
prepo@7eb9e4a94cdb:/test/tls$ readelf -s main.elf
Symbol table '.symtab' contains 4 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 22 FUNC WEAK HIDDEN 5 _ZTW1a
2: 0000000000000000 14 FUNC GLOBAL DEFAULT 3 _Z1fv
3: 0000000000000000 4 TLS GLOBAL DEFAULT 7 a
Hmm. To my surprise, the _ZTW1a symbol is present here as well.
Something else is at foot.
I managed to reproduce the issue in the Ubuntu. The pstore::details::log_destinations
is defined in the pstore\lib\os\logging.cpp
. I compared the logging.cpp.o.elf and logging.cpp.o. The logging.cpp.o.elf is generated by 2 steps: 1) compiled the logging.cpp targeted on Repo and generated the ELF by using the repo2obj. The logging.cpp.o is directly targeted on ELF. I found the group section is missing in the logging.cpp.o.elf.
The logging.cpp.o.elf file only contains:
Relocation section '.rela.text._ZTWN6pstore7details16log_destinationsE' at offset 0x3804 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000005 005d00000013 R_X86_64_TLSGD 0000000000000000 _ZN6pstore7details16lo - 4
00000000000d 006600000004 R_X86_64_PLT32 0000000000000000 __tls_get_addr - 4
The correct logging.cpp.o file contains:
COMDAT group section [ 63] `.group' [_ZTWN6pstore7details16log_destinationsE] contains 2 sections:
[Index] Name
[ 64] .text._ZTWN6pstore7details16log_destinationsE
[ 65] .rela.text._ZTWN6pstore7details16log_destinationsE
Relocation section '.rela.text._ZTWN6pstore7details16log_destinationsE' at offset 0x4030 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000005 005700000013 R_X86_64_TLSGD 0000000000000000 _ZN6pstore7details16lo - 4
00000000000d 007c00000004 R_X86_64_PLT32 0000000000000000 __tls_get_addr - 4
@paulhuggett, is this issue related with repo2obj tool?
To make it more clear, the .text._ZTWN6pstore7details16log_destinationsE
section doesn't exist in the logging.cpp.o.elf file.
I managed to reproduce the issue in the Ubuntu
FTR, the docker image (currently) uses Ubuntu 18.04 as its base.
is this issue related with repo2obj tool?
I don't think there's yet enough information here to make that determination.
To make it more clear, the
.text._ZTWN6pstore7details16log_destinationsE
section doesn't exist in the logging.cpp.o.elf file.
When I created the dump file using readelf, all of the names have been truncated. The llvm-readelf doesn't truncate the name. I checked it using llvm-readelf and the .text._ZTWN6pstore7details16log_destinationsE
section did exist in the logging.cpp.o.elf file.
We could reproduce the linking error when spliting the variable a
and function f
into two files.
test.cpp:
thread_local int a;
main.cpp
extern thread_local int a;
int main() {
return a;
}
Compile and link to generate the ELF file:
$ clang --target=x86_64-pc-linux-gnu-repo -std=c++14 -c test.cpp -o test.repo.o
$ clang --target=x86_64-pc-linux-gnu-repo -std=c++14 -c main.cpp -o main.repo.o
$ repo2obj test.repo.o -o test.o.elf
$ repo2obj main.repo.o -o main.o.elf
$ ld test.o.elf main.o.elf -o main
ld: main.o.elf: in function `TLS wrapper function for a':
(.text._ZTW1a[_ZTW1a]+0x5): undefined reference to `TLS init function for a'
ld: (.text._ZTW1a[_ZTW1a]+0x18): undefined reference to `TLS init function for a'
Dump the symbol table in `main.o.elf
:
$ readelf -s main.o.elf
Symbol table '.symtab' contains 5 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 TLS GLOBAL DEFAULT UND a
2: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZTH1a
3: 0000000000000000 52 FUNC WEAK HIDDEN 4 _ZTW1a
4: 0000000000000000 28 FUNC GLOBAL DEFAULT 6 main
$ c++filt _ZTH1a
TLS init function for a
Therefore, the symbol _ZTH1a
is missing.
Compile and dump the ELF file:
$ clang --target=x86_64-pc-linux-gnu-elf -std=c++14 -c main.cpp -o main.o
$ readelf -s main.o
Symbol table '.symtab' contains 8 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS main.cpp
2: 0000000000000000 0 SECTION LOCAL DEFAULT 2
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5
4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ZTH1a
5: 0000000000000000 52 FUNC WEAK HIDDEN 5 _ZTW1a
6: 0000000000000000 0 TLS GLOBAL DEFAULT UND a
7: 0000000000000000 28 FUNC GLOBAL DEFAULT 2 main
We could see the _ZTH1a
Bind is GLOBAL
in the main.o.elf, however, in the main.o, it is a WEAK
bind.
Currently, repo2obj creates the undefined symbol with GLOBAL bind. I changed the repo2obj to change the _ZTH1a bind from GLOBAL to WEAK, the linking error is fixed.
Thank you. I can reproduce the problem with this small example.
I thought that it would be useful to document the result from the perspective of the database contents since the output from repo2obj is derived from this (and is one step removed from our single source of truth).
(In the pstore-dump output below I’ve stripped away information that’s not relevant.)
$ pstore-dump --compilation=$(repo-ticket-dump main.o),$(repo-ticket-dump test.o) clang.db
---
- compilations :
- digest : 832c1ca123da314429a6be08e30719a2
compilation :
members :
- digest : 7299f7d7548b68f0860d571f35f66a9c
name : main
linkage : external
visibility : default
- digest : fc32513c880c2310fc7a5c3659dcb979
name : _ZTW1a
linkage : link_once_odr
visibility : hidden
- digest : a8c11784f87225adb0eb05e49ca96f80
compilation :
members :
- digest : 6d3a5c15e159a5fc95f6793b2fa7f79d
name : _ZTW1a
linkage : weak_odr
visibility : hidden
- digest : cb075f27af9a10e4225673bb7ddead7b
name : a
linkage : external
visibility : default
...
We’re getting an error from main.o.elf, so let’s see what the external fixups there are:
$ pstore-dump --fragment=$(repo-fragments --comma --digest-only main.o) clang.db
---
- fragments :
- digest : 7299f7d7548b68f0860d571f35f66a9c
fragment :
- type : text
contents :
xfixups :
- name : _ZTW1a
- digest : fc32513c880c2310fc7a5c3659dcb979
fragment :
- type : text
contents :
xfixups :
- name : _ZTH1a
- name : _ZTH1a
- name : a
...
We can see that the compiler did not generate a definition of _ZTH1a.
(Also intriguing is that we have two definitions of _ZTW1a — one is link_once_odr and the other weak_odr — with different fragment digests implying that the compiler’s definition is different in the two instances.)
Comparing the names defined in the database to the names defined by the compiler’s ELF output.
main.cpp
From this comment we can see that the main.o ticket (compilation 832c1ca123da314429a6be08e30719a2) defines main
and _ZTW1a
. From this comment main.o.elf contains definitions of the same two names (after stripping out the FILE, SECTION, and undef symbols).
test.cpp
The compilation associated with test.o (a8c11784f87225adb0eb05e49ca96f80) defines _ZTW1a
and a
.
$ clang++ -c -o test.obj -std=c++14 test.cpp
$ readelf -s test.obj
Symbol table '.symtab' contains 5 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS test.cpp
2: 0000000000000000 0 SECTION LOCAL DEFAULT 4
3: 0000000000000000 22 FUNC WEAK HIDDEN 4 _ZTW1a
4: 0000000000000000 4 TLS GLOBAL DEFAULT 6 a
$
Here we similarly have definitions of _ZTW1a
and a
.
Thank you for the detailed comparison between ELF and Repo.
I went through the lld
and found the report linking error is related with the binding of _ZTH1a
. If the _ZTH1a
binding is WEAK
, it would not report the linking error. On the other side, the lld
reports an undefined linking error if the _ZTH1a
binding is GLOBAL
.
This is implemented in the function maybeReportUndefined
, which is defined in lld\ELF\Relocations.cpp
file.
// Report an undefined symbol if necessary.
// Returns true if the undefined symbol will produce an error message.
static bool maybeReportUndefined(Symbol &sym, InputSectionBase &sec,
uint64_t offset) {
if (!sym.isUndefined() || sym.isWeak())
return false;
:
:
}
Next question is how _ZTH1a
is set to WEAK
in the main.o [ELF].
Firstly, compile the main.cpp to generate the IR code.
$ clang++ --target=x86_64-pc-linux-gnu-elf -std=c++14 -S -emit-llvm main.cpp -o main.ll
The _ZTH1a
is declared in the main.ll.
declare extern_weak void @_ZTH1a()
We could see _ZTH1a
has an extern_weak
linkage type.
Generate the main.o using llc
:
$ llc -filetype=obj main.ll -o main.o
When I debug the llc
, I found that _ZTH1a
symbol is set to WEAK
binding in the AsmPrinter::doFinalization(Module &M)
function of llvm-project-prepo\llvm\lib\CodeGen\AsmPrinter\AsmPrinter.cpp
file.
bool AsmPrinter::doFinalization(Module &M) {
:
for (const auto &GO : M.global_objects()) {
if (!GO.hasExternalWeakLinkage())
continue;
OutStreamer->emitSymbolAttribute(getSymbol(&GO), MCSA_WeakReference);
}
:
}
From the above code, we could see that if GO has an extern_weak
linkage type, the GO's corresponding symbol is set to WEAK
binding.
When targeted on the repo, the declared global objects are not stored in the compilation in the database. Should we add a new member (e.g. extern_weak_symbol
) in the compilation? The new member stores all declared symbol names if these symbols have the extern_weak
linkage type.
When targeted on the repo, the declared global objects are not stored in the compilation in the database.
You’re right, they’re not. The compilation record in the database contains the names defined in that compilation, not the declarations.
It’s important that I should not that this lack of support for “weak” was a deliberate choice aimed at reducing complexity in the tools: particularly the linker. The first half of this aim was undone when a previous commit exposed the complete set of LLVM linkage types to the repo including weak definitions (requiring a significant reworking of the rld front-end). Adding support for weak references as well means that we have work to do across the board, including repo2obj and rld.
Should we add a new member (e.g. extern_weak_symbol) in the compilation?
We could, and this is approximately how it’s done in ELF. ELF uses undefined symbols to describe all of its references and the symbol’s binding to establish whether the reference is weak.
Our model is a little different, though. The compilation record contains a collection of definitions; this collection does not include the referenced symbols. This is instead done by having the external-fixups simply refer to a name. It’s then up to the linker to match this name to the corresponding definition following the established rules.
Given that a name is stored as an Indirect String (basically a pointer to a pointer) and that pointers (or rather addresses) in pstore are eight byte aligned, could we steal one of those zero bits in an external fixup to indicate whether the reference is weak or strong?
This implementation would have a relatively small impact on rld: reporting undefined symbols is a little more complex: undefined and weakly referenced is fine whereas undefined and strongly referenced is not. repo2obj may be a little more complex because a single complication could contain both strong and weak references requiring two symbols.
Does this make sense to you?
Does this make sense to you?
Good point. Yes, it makes sense to me.
Several implementation level questions want to be discussed here to make sure my understanding correctly.
Given that a name is stored as an Indirect String (basically a pointer to a pointer) and that pointers (or rather addresses) in pstore are eight byte aligned, could we steal one of those zero bits in an external fixup to indicate whether the reference is weak or strong?
Currently, the struct external_fixup is:
struct external_fixup {
:
typed_address<indirect_string> name;
relocation_type type;
// TODO: much padding here.
std::uint8_t padding1 = 0;
std::uint16_t padding2 = 0;
std::uint32_t padding3 = 0;
std::uint64_t offset;
std::uint64_t addend;
};
Did you mean that we can use these paddings (For example, padding1
) to indicate the refernce is weak?
struct external_fixup {
:
typed_address<indirect_string> name;
relocation_type type;
// TODO: much padding here.
bool IsWeak = false; <---- By default, it is strong binding.
std::uint16_t padding1 = 0;
std::uint32_t padding2 = 0;
std::uint64_t offset;
std::uint64_t addend;
};
repo2obj may be a little more complex because a single complication could contain both strong and weak references requiring two symbols.
It should be fine in this case since two symbols have own individaul XFixup, which hold either strong or weak reference. Am I misunderstanding anything?
In additional, we don't store declared functions in the LLVM IR code. To keep the extern_weak
declaration GOs in the repo IR code, could we add a module level metadata to store the extern_weak symbol names, e.g., repo.extern.weak?
!repo.extern.weak = !{!"_ZTH1a"}
This repo.extern.weak
could be used in the backend ObjectWriter
to store the weak reference in XFixups.
Currently, the struct external_fixup is: [snip] Did you mean that we can use these paddings [snip] to indicate the refernce is weak?
Not quite. The current definition of this struct wastes a lot of space. In the short term we can just add an 8-bit bool field in here and remove 8 bits of padding, but a “proper” implementation should remove all of the unused space including the unused bits in the typed_address<> field. The latter is currently occupying 64 bits but in fact only requires 44 of them.
repo2obj may be a little more complex because a single complication could contain both strong and weak references requiring two symbols.
It should be fine in this case since two symbols have own individaul XFixup, which hold either strong or weak reference. Am I misunderstanding anything?
I don’t think so. I’m just noting that repo2obj support for this isn’t trivial.
In additional, we don't store declared functions in the LLVM IR code. To keep the
extern_weak
declaration GOs in the repo IR code, could we add a module level metadata to store the extern_weak symbol names, e.g., repo.extern.weak?
Do we we need to do this? (The ELF output code manages to get the binding right; that suggests to me that the back-end already has access to the necessary information.)
The first part of a fix to this problem — adding support for weak references to pstore — is posted as pstore PR#85.
The first part of a fix to this problem — adding support for weak references to pstore — is posted as pstore PR#85.
Submitted as commit 3169675.
Do we we need to do this? (The ELF output code manages to get the binding right; that suggests to me that the back-end already has access to the necessary information.)
You are right. We don't need this.
The second part of a fix to this problem — adding support for weak references to MCSymbolRepo and repo2obj — is posted as llvm-project-prepo PR#140.
Submitted as commit 0d1ee388cfa7e7748a855967f4035688e7347c47.
The build (and ”check-all”) of llvm-project-prepo in the docker container now completes successfully. Note that work to implement weak references in the rld is still under way.
Following the instructions to build llvm-project-prepo in the docker image. I see the following link error: