Open atrosinenko opened 10 months ago
I have no clear solution for this right now, below are some observations.
As far as I understand the DWARF 5 specification, section 1.3.7, it should be preferable to describe signed pointer values explicitly:
> 1.3.7 Explicit Rather Than Implicit Description
>
> DWARF describes the source to object translation explicitly rather than using
> common practice or convention as an implicit understanding between producer
> and consumer. For example, where other debugging formats assume that a
> debugger knows how to virtually unwind the stack, moving from one stack
> frame to the next using implicit knowledge about the architecture or operating
> system, DWARF makes this explicit in the Call Frame Information description.
According to the aadwarf64 document, there are several AArch64 platform-specific DWARF extensions to support PAuth. While the `DW_CFA_AARCH64_negate_ra_state` call frame instruction is already used by LLVM, I did not find any mention of `DW_SUB_OP_AARCH64_sign` in current LLVM (and anyway, I seem to have the reverse problem: explaining that the value fetched from the target process is already signed and should be XPAC-ed before use in some contexts).
To the extent I currently understand DWARF, it looks workable to introduce something like a `DW_SUB_OP_AARCH64_xpac` operation and make LLVM generate debug information like "ptr := (xpaci (reg_value X1))". On the other hand, in the original example, LLDB prints `ptr` as `0x003caaaaaaab04f0 (actual=0x0000aaaaaaab04f0 ...)`. It definitely seems useful to see the value both as a signed pointer and as a plain VMA. Thus, maybe this should be encoded not as an expression ("how to obtain the value") but as a type ("what we have obtained").
My initial thought was to look at what is generated for TBI on AArch64, as it is another case of placing something loosely related into the higher bits of an address value. This can be achieved by compiling an example program with HWASan enabled. Unfortunately, I did not find anything interesting in the DWARF description generated for code that uses TBI. That seems reasonable, because TBI by definition makes all 256 pointer values (differing only in the 8-bit tag) valid w.r.t. address translation.
I tried patching the existing debug information by adding a `DW_AT_bit_size (48)` attribute to the base `DW_TAG_pointer_type` abbreviation, but I have not yet managed to affect the debugger in any observable way.
Just in case: it is possible to adjust the debug information manually after it has been produced by the code generator. The `*.s` file produced by clang contains lots of `.byte` directives (with meaningful DWARF names in comments). It may be easier to change an existing abbreviation (or a copy of it), as adding/removing anything in the `.debug_info` section may require adjusting byte offsets.
On the other hand, maybe trying to convey the fact that the pointer is signed via DWARF sections is overengineering. If so, should lldb-server on AArch64 just unconditionally clear the top-most 16 bits before dereferencing a pointer (by data load/store or control flow) in JITted code, or should such behavior be enabled somehow for the particular target process as a whole?
Tagging @kbeyls and @smithp35
Also tagging @DavidSpickett
> If so, should lldb-server on AArch64 just unconditionally clear top-most 16 bits before dereferencing the pointer (by data load/store or control flow)
Well, this might have some interesting consequences. For example, if for some reason the signature is wrong, we will be unable to debug such an issue: the debugger will just strip the signature and everything will work, while at normal runtime we would end up dereferencing an invalid pointer.
I think we should be explicit that things are signed, and IMO it is a property of the type (after all, `void*` and `__ptrauth void*` are different types from the compiler's perspective, and conversion between them is not a no-op). We should also be able to describe the particular signing scheme used for a particular pointer (the discriminator used, the key, whether address discrimination is used, etc.).
Also, things should be organized on a fine-grained basis; e.g. we can easily have a mixture of signed and unsigned pointers in some stub code.
An orthogonal question is expression JIT: there we need to generate properly signed pointers, as far as I can see, since they could escape.
> Well, this might have some interesting consequences. For example, if for some reason the signature will be wrong, then we will be unable to debug such issue. Debugger will just strip the signature and everything will work, while in the normal runtime we'll end dereferencing invalid pointer.
This is what happens already (mostly in lldb, though a little bit in lldb-server); it was deemed a decent compromise until we had other signals to go on to tell whether a pointer was signed.
See https://www.linaro.org/blog/lldb-15-and-the-mystery-of-the-non-address-bits/ "Corrupted Pointer or Non-Address Bits?"
If we know that we're in this ABI and that there is this annotation, we can use that to be more strict.
There is also the issue of whether you want to be able to give signed pointers to commands like `memory read`. It may be useful to pass it a pointer that just faulted, to see what it would have read had it been valid.
Another command example: do you expect `memory region` to remove non-address bits for you, or to require the user to do so if it is a signed pointer? I'd say the user shouldn't have to wait for the program to authenticate it just to find out what memory region it points to. So it's a command-by-command thing, I think, and needs some real usage to decide what's best.
My vague thought here is that you would make the AArch64 ABI plugin aware of whether the PAuth ABI is being used, and it would change its `Fix..Address` methods accordingly. Special cases like the exception printer will need some way to always clear the signature bits for the `actual:...` part. If there is a program-file-level attribute for the PAuth ABI, this could be passed to the plugin to achieve this.
So:
A couple of thoughts on how the signing schema might be encoded in DWARF.
The most straightforward way that comes to mind is adding a new attribute, say `DW_AT_signing_schema`, which would store a combination of the key, the discriminator, and the address-diversity flag inside an integer. If a `DW_TAG_*` entity is meant to be signed, it should have `DW_AT_signing_schema` set correspondingly. For example, for a signed vtable pointer of a polymorphic class, the `DW_TAG_member` with `DW_AT_name = ("_vptr$classname")` would have `DW_AT_signing_schema` set.
The place where we obtain the context for a user expression is `lldb_private::plugin::dwarf::SymbolFileDWARF::ParseDeclsForContext` (see the full stack under the spoiler below). We can, for example, change `DWARFASTParser::GetTypeForDIE` so that it takes the signing scheme into account when obtaining the type. As a result, the JIT compiler would see a type with explicit ptrauth attributes.
As far as I can see now, such a `DW_AT_signing_schema` attribute should be enough. Please let me know whether this looks reasonable, and whether there are cases where we might need DWARF expressions with operations like the existing `DW_SUB_OP_AARCH64_sign`, but for stripping/authenticating.
Update. It turns out that we already have `DW_TAG_LLVM_ptrauth_type` with the following attributes:
- `DW_AT_LLVM_ptrauth_key`
- `DW_AT_LLVM_ptrauth_address_discriminated`
- `DW_AT_LLVM_ptrauth_extra_discriminator`
- `DW_AT_LLVM_ptrauth_isa_pointer`
- `DW_AT_LLVM_ptrauth_authenticates_null_values`

It could be used instead of the `DW_AT_signing_schema` attribute proposed above. The idea remains the same: attach a signing schema to the `DW_TAG_*` entities meant to be signed. We even have some related code already - see https://github.com/access-softek/llvm-project/blob/elf-pauth/lldb/source/Plugins/SymbolFile/DWARF/DWARFDIE.cpp#L310.
Make it possible to transparently use signed pointers fetched from the target process in LLDB expressions. Presently, LLDB does not take into account that pointers may be signed when dereferencing them (resulting in SIGSEGV or some other sort of "invalid pointer" error).
Let's consider the following code:
indirect-call-for-lldb.c:
Compile it using our toolchain (commit 62ce88f0f2a77665529947f20d276d640c37f76f) with the following command:
Inside QEMU, a segmentation fault is observed when trying to perform the indirect call.
_Note: I use the `-cpu max,pauth=on,pauth-impdef=on` QEMU options to get reasonable simulation performance, so PACs usually don't look very random (`0x003c` in the example below)._