Quuxplusone / LLVMBugzillaTest

Allow --disassemble-functions to take demangled names #40878

Closed Quuxplusone closed 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR41908
Status RESOLVED FIXED
Importance P enhancement
Reported by James Henderson (jh7370.2008@my.bristol.ac.uk)
Reported on 2019-05-16 06:59:09 -0700
Last modified on 2019-06-24 09:18:56 -0700
Version trunk
Hardware PC Windows NT
CC francisvm@yahoo.com, grimar@accesssoftek.com, i@maskray.me, jakehehrlich@google.com, llvm-bugs@lists.llvm.org, rupprecht@google.com, yuanfang.chen@sony.com

It might be nice for --disassemble-functions to be able to accept demangled names. This is because they are often shorter and in some situations might be easier to copy. I'm not exactly sure how this might work in practice. Simply passing the string and then comparing against the demangled strings of the symbols might be enough (though there's a performance risk here). Additionally, we have to consider how to avoid ambiguity as to whether mangled or demangled names are passed in. One option might be to only allow in demangled names if the --demangle switch is also specified. There may also be other options.
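
A minimal sketch of the "compare against the demangled strings of the symbols" idea, not the actual llvm-objdump implementation. It assumes the Itanium ABI `abi::__cxa_demangle` from `<cxxabi.h>` rather than LLVM's own demangler, and the helper names `demangleOrSelf` and `findByDemangledName` are hypothetical:

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: return the demangled form of an Itanium-mangled
// symbol, or the name unchanged if it is not mangled (e.g. a C symbol).
std::string demangleOrSelf(const std::string &Name) {
  // Skip names without the "_Z" prefix so they are never parsed as types.
  if (Name.compare(0, 2, "_Z") != 0)
    return Name;
  int Status = 0;
  char *Buf = abi::__cxa_demangle(Name.c_str(), nullptr, nullptr, &Status);
  if (Status != 0 || Buf == nullptr)
    return Name;
  std::string Result(Buf);
  std::free(Buf); // __cxa_demangle allocates the result with malloc.
  return Result;
}

// Hypothetical helper: collect every symbol whose demangled form equals
// Wanted. Each symbol is demangled on every call, which is where the
// performance risk comes from.
std::vector<std::string>
findByDemangledName(const std::vector<std::string> &Symbols,
                    const std::string &Wanted) {
  std::vector<std::string> Matches;
  for (const std::string &Sym : Symbols)
    if (demangleOrSelf(Sym) == Wanted)
      Matches.push_back(Sym);
  return Matches;
}
```

Returning a list rather than a single symbol means that several symbols sharing one demangled form would all be selected.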

Quuxplusone commented 5 years ago
Could we do both
--disassemble-functions
--disassemble-functions-demangled
with the latter taking demangled names?

Quuxplusone commented 5 years ago

I've CC'ed a few extra people who might have their own thoughts on this. As this is a completely new feature, we need to make sure that whatever we do is sensible from a user's perspective.

Having two separate switches would allow users to specify a mixture of the two, I guess.

Quuxplusone commented 5 years ago

Having two separate switches sounds good to me. That would allow us to demangle all symbols only when the --disassemble-functions-demangled option is used, and would probably keep the whole logic simpler.

Quuxplusone commented 5 years ago
1) Parens would make it pretty annoying to use, e.g. you'd have to run:
$ llvm-objdump '--disassemble-functions=foo()' foo.o
to avoid the shell messing with (), or
$ llvm-objdump '--disassemble-functions=foo(int, int, int)' foo.o
to avoid the shell misinterpreting spaces.
In the interest of making it easier to use, are you considering some kind of
prefix/substring/fuzzy matching, e.g. being able to run:
$ llvm-objdump --disassemble-functions=foo
and get disassembly for _Z3foov? (And how would that handle overloads?)

2) If we can make some reasonable attempt at figuring out whether the requested
symbol is mangled or demangled, the UX may be a lot better if we just had one
flag, so that the user doesn't have to remember to use the -demangled variant.

Quuxplusone commented 5 years ago

I anticipate that this will be used by taking the demangled form of a symbol produced by another tool (e.g. the name in a symbol table dump) and pasting it in, so 'foo' would only match a symbol 'foo' without any mangling (e.g. a C symbol). This does raise the question of what to do if there is ambiguity (IIRC some destructors/constructors produce the same demangled form). In this case, I think we should disassemble all matching symbols. Personally, I don't think people are that averse to quoting arguments when necessary.

There's already precedent in some of the LLVM tools for guessing whether a symbol is mangled or not, namely by looking for an _Z prefix. However, I don't think that helps us here. If we get a C-symbol 'foo' (i.e. there is no mangling involved), then the heuristic would fail and the demangler might try to demangle it as a type. Basically, I don't think we can distinguish the cases sufficiently to avoid running them through the demangler always, so I think the behaviour needs to be configured by a switch. Alternatively, we need an interface in the demangler that refuses to demangle anything except a full name (i.e. no type demangling). This latter approach would be my preferred option, if we went down the single switch route.
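
To illustrate the "full names only" interface, here is a rough sketch that wraps `abi::__cxa_demangle` (an assumption; the LLVM tools have their own demangler) behind a `_Z` prefix check. The names `hasItaniumPrefix` and `demangleFullName` are hypothetical. The point of the guard is that a bare demangler call may accept type encodings (for example, `i` is the Itanium encoding for `int`), so a short C symbol could be misread as a type:

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical helper: the "_Z" prefix heuristic described above.
bool hasItaniumPrefix(const std::string &Name) {
  return Name.compare(0, 2, "_Z") == 0;
}

// Hypothetical strict wrapper: demangle complete symbol names only.
// Returns false, leaving Out untouched, for anything that lacks the "_Z"
// prefix or that the demangler rejects, so type encodings are never
// demangled.
bool demangleFullName(const std::string &Name, std::string &Out) {
  if (!hasItaniumPrefix(Name))
    return false;
  int Status = 0;
  char *Buf = abi::__cxa_demangle(Name.c_str(), nullptr, nullptr, &Status);
  if (Status != 0 || Buf == nullptr)
    return false;
  Out.assign(Buf);
  std::free(Buf); // __cxa_demangle allocates the result with malloc.
  return true;
}
```

With this wrapper, a C symbol such as 'foo' is simply reported as "not a mangled name". The wrapper alone still cannot tell whether a user-supplied 'foo' was meant as a C symbol or as a demangled C++ name.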

I'm inclined to think we should just base the behaviour on the presence of the --demangle switch. If it is present, try to treat the input as a demangled name, if possible. If not, don't. This would then mean that input names are consistent with those in the output.

Quuxplusone commented 5 years ago

I think it's totally intuitive for --demangle to decide between mangled and demangled names for both input and output function names. But it seems to break one current behaviour (I'm not sure to what extent we care about this, but it seems a reasonable use case):

'--demangle --disassemble-functions=_Z3foov'

currently means (mangled input / demangled output).

Simply letting --disassemble-functions take both mangled and demangled names, possibly at the same time (without depending on --demangle), may cause performance issues.

Having two switches does not lose any flexibility (the user decides between mangled and demangled input/output) or performance. The only burden is remembering one more flag.
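
A rough sketch of how the two-switch variant could avoid the performance cost, under the assumption that each switch fills its own set of names. The function `selectSymbols`, its parameters, and the helper `demangleItanium` are hypothetical, and `abi::__cxa_demangle` stands in for whatever demangler the tool really uses:

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: demangle "_Z"-prefixed names, pass others through.
std::string demangleItanium(const std::string &Name) {
  if (Name.compare(0, 2, "_Z") != 0)
    return Name;
  int Status = 0;
  char *Buf = abi::__cxa_demangle(Name.c_str(), nullptr, nullptr, &Status);
  if (Status != 0 || Buf == nullptr)
    return Name;
  std::string Result(Buf);
  std::free(Buf);
  return Result;
}

// MangledNames would come from --disassemble-functions, DemangledNames
// from the proposed --disassemble-functions-demangled.
std::vector<std::string>
selectSymbols(const std::vector<std::string> &Symbols,
              const std::set<std::string> &MangledNames,
              const std::set<std::string> &DemangledNames) {
  std::vector<std::string> Selected;
  for (const std::string &Sym : Symbols) {
    if (MangledNames.count(Sym)) {
      Selected.push_back(Sym);
      continue;
    }
    // Demangle only when the demangled-name switch was actually used.
    if (!DemangledNames.empty() && DemangledNames.count(demangleItanium(Sym)))
      Selected.push_back(Sym);
  }
  return Selected;
}
```

When only mangled names are requested, no symbol is ever demangled, and a mixture of the two kinds of name works in a single pass.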

Quuxplusone commented 5 years ago
Fixed in
llvm-svn: 364121