Quuxplusone / LLVMBugzillaTest

With 100k+ mangled names, llvm-c++filt fails on a few percent, and disagrees with c++filt for about 15% #43398

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR44428
Status NEW
Importance P enhancement
Reported by Simon Hardy-Francis (simonhf@gmail.com)
Reported on 2020-01-01 13:38:59 -0800
Last modified on 2020-01-30 19:09:37 -0800
Version 9.0
Hardware PC Linux
CC dblaikie@gmail.com, llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk

Mangled names from a working open source project and steps to reproduce can be found here [1].

P.S. Why is the tool called llvm-cxxfilt in real life, but llvm-c++filt in this bug system?

[1] https://gist.github.com/simonhf/0d60bb94f2d90c1b32e4786b2d1062ad
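
A comparison of this kind can be sketched as follows. This is a minimal sketch, not the script from the gist: the helper names `demangle_all` and `classify` are hypothetical, and it assumes that both tools read one mangled name per line on stdin and echo a name back unchanged when they cannot demangle it.

```python
import subprocess
from collections import Counter


def demangle_all(tool, names):
    """Pipe mangled names through a demangler, one per line (assumed CLI behavior)."""
    result = subprocess.run(
        [tool],
        input="\n".join(names) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def classify(names, gnu_out, llvm_out):
    """Count agreement, disagreement, and failures between two demanglers.

    Assumption: a tool has failed when it echoes the mangled name unchanged.
    """
    counts = Counter()
    for mangled, gnu, llvm in zip(names, gnu_out, llvm_out):
        gnu_failed = gnu == mangled
        llvm_failed = llvm == mangled
        if gnu_failed and llvm_failed:
            counts["both_fail"] += 1
        elif llvm_failed:
            counts["llvm_only_fails"] += 1
        elif gnu_failed:
            counts["gnu_only_fails"] += 1
        elif gnu == llvm:
            counts["same"] += 1
        else:
            counts["differ"] += 1
    return counts
```

With a list of mangled names in `names`, something like `classify(names, demangle_all("c++filt", names), demangle_all("llvm-cxxfilt", names))` would give the rough percentages quoted in the title.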

Quuxplusone commented 4 years ago
Different demanglings are not a bug per se; the goal of both tools is to
produce human-readable names, and there's more than one way to do that. (We
might want to look at the cases where llvm-cxxfilt produces longer demanglings,
though.)

The differing behavior of demangling

_ZN3caf12config_value3setINS_3uriEEENSt9enable_ifIXsrNS_6detail9is_one_ofIT_JdNS_10atom_valueENSt6chrono8durationIlSt5ratioILl1ELl1000000000EEEES2_NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt6vectorIS0_SaIS0_EENS_10dictionaryIS0_EEEEE5valueEvE4typeES6_

is definitely a bug in one of the two demanglers. They are registering
different substitutions as they demangle, as follows:

For c++filt:

S_: caf
S0_: caf::config_value
S1_: caf::config_value::set
S2_: caf::uri
S3_: std::enable_if
// ---
S4_: caf::detail
S5_: caf::detail::is_one_of
// ---
S6_: caf::uri
S7_: caf::atom_value
S8_: std::chrono
S9_: std::chrono::duration
SA_: std::ratio
SB_: std::ratio<1l, 1000000000l>
SC_: std::chrono::duration<long, std::ratio<1l, 1000000000l> >
SD_: std::__cxx11
SE_: std::__cxx11::basic_string
SF_: std::char_traits
SG_: std::char_traits<char>
SH_: std::allocator<char>
SI_: std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >
SJ_: std::vector
SK_: std::allocator<caf::config_value>
SL_: std::vector<caf::config_value, std::allocator<caf::config_value> >
SM_: caf::dictionary
SN_: caf::dictionary<caf::config_value>
// ---
SO_: caf::detail::is_one_of<caf::uri, double, caf::atom_value,
std::chrono::duration<long, std::ratio<1l, 1000000000l> >, caf::uri,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
>, std::vector<caf::config_value, std::allocator<caf::config_value> >,
caf::dictionary<caf::config_value> >
// ---
SP_: std::enable_if<caf::detail::is_one_of<caf::uri, double, caf::atom_value,
std::chrono::duration<long, std::ratio<1l, 1000000000l> >, caf::uri,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
>, std::vector<caf::config_value, std::allocator<caf::config_value> >,
caf::dictionary<caf::config_value> >::value, void>
SQ_: std::enable_if<caf::detail::is_one_of<caf::uri, double, caf::atom_value,
std::chrono::duration<long, std::ratio<1l, 1000000000l> >, caf::uri,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
>, std::vector<caf::config_value, std::allocator<caf::config_value> >,
caf::dictionary<caf::config_value> >::value, void>::type

For llvm-cxxfilt:

S_: caf
S0_: caf::config_value
S1_: caf::config_value::set
S2_: caf::uri
S3_: std::enable_if
// ---
S4_: caf::uri
S5_: caf::atom_value
S6_: std::chrono
S7_: std::chrono::duration
S8_: std::ratio
S9_: std::ratio<1l, 1000000000l>
SA_: std::chrono::duration<long, std::ratio<1l, 1000000000l> >
SB_: std::__cxx11
SC_: std::__cxx11::basic_string
SD_: std::char_traits
SE_: std::char_traits<char>
SF_: std::allocator<char>
SG_: std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >
SH_: std::vector
SI_: std::allocator<caf::config_value>
SJ_: std::vector<caf::config_value, std::allocator<caf::config_value> >
SK_: caf::dictionary
SL_: caf::dictionary<caf::config_value>
// ---
SM_: std::enable_if<caf::detail::is_one_of<caf::uri, double, caf::atom_value,
std::chrono::duration<long, std::ratio<1l, 1000000000l> >, caf::uri,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
>, std::vector<caf::config_value, std::allocator<caf::config_value> >,
caf::dictionary<caf::config_value> >::value, void>
SN_: std::enable_if<caf::detail::is_one_of<caf::uri, double, caf::atom_value,
std::chrono::duration<long, std::ratio<1l, 1000000000l> >, caf::uri,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
>, std::vector<caf::config_value, std::allocator<caf::config_value> >,
caf::dictionary<caf::config_value> >::value, void>::type

So, c++filt is registering 3 additional substitutions for
srNS_6detail9is_one_ofIT_...:

 * One for NS_6detail
 * One for NS_6detail9is_one_of
 * One for the entire name complete with template arguments
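
To see why three extra entries matter, here is a minimal sketch of how a substitution token is resolved against the table. It assumes the Itanium ABI numbering (`S_` is the first entry, `S0_` the second, with the sequence id in base 36 using digits then upper-case letters); the helper names and the truncated tables are only for illustration, copied from the first entries of the two lists above.

```python
import string

# Base-36 digits used by a sequence id: 0-9, then A-Z.
DIGITS = string.digits + string.ascii_uppercase


def substitution_index(token):
    """Map a token such as 'S_', 'S0_' or 'SA_' to a 0-based table index."""
    if not (token.startswith("S") and token.endswith("_")):
        raise ValueError(f"not a substitution: {token!r}")
    seq_id = token[1:-1]
    if not seq_id:
        return 0  # 'S_' is the first entry
    value = 0
    for ch in seq_id:
        value = value * 36 + DIGITS.index(ch)
    return value + 1  # 'S0_' is the second entry


# First eight entries of each table, as listed in this comment.
CXXFILT_TABLE = [
    "caf", "caf::config_value", "caf::config_value::set", "caf::uri",
    "std::enable_if", "caf::detail", "caf::detail::is_one_of", "caf::uri",
]
LLVM_TABLE = [
    "caf", "caf::config_value", "caf::config_value::set", "caf::uri",
    "std::enable_if", "caf::uri", "caf::atom_value", "std::chrono",
]


def resolve(token, table):
    """Look up what a substitution token refers to in a given table."""
    return table[substitution_index(token)]
```

The mangled name ends in `S6_`, so `resolve("S6_", CXXFILT_TABLE)` and `resolve("S6_", LLVM_TABLE)` name different entities, while `S2_`, which is registered before the 'sr' construct, resolves identically in both.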

Per the mangling specification
(http://itanium-cxx-abi.github.io/cxx-abi/abi.html), we have:

  <unresolved-name> ::= srN <unresolved-type> <unresolved-qualifier-level>+ E <base-unresolved-name>
  <unresolved-qualifier-level> ::= <simple-id>
  <base-unresolved-name> ::= <simple-id>
  <simple-id> ::= <source-name> [ <template-args> ]

That is, the names in this 'sr' construct are *not* substitutable. (This is
intentional: they are not resolved names, so don't correspond to symbol table
entries.)
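
As a rough illustration of that rule, the sketch below parses a run of length-prefixed source names, which is how each qualifier level begins, and deliberately has no substitution table to append to. It is a simplified illustration that ignores template arguments, not how either demangler is implemented; the function names are hypothetical.

```python
ASCII_DIGITS = "0123456789"


def parse_source_name(text, pos=0):
    """Parse one source name: a decimal length followed by that many characters."""
    end = pos
    while end < len(text) and text[end] in ASCII_DIGITS:
        end += 1
    if end == pos:
        raise ValueError("expected a length prefix")
    length = int(text[pos:end])
    name = text[end:end + length]
    if len(name) != length:
        raise ValueError("truncated identifier")
    return name, end + length


def parse_qualifier_levels(text, pos=0):
    """Parse consecutive source names; nothing here registers a substitution."""
    names = []
    while pos < len(text) and text[pos] in ASCII_DIGITS:
        name, pos = parse_source_name(text, pos)
        names.append(name)
    return names, pos
```

For the fragment `6detail9is_one_ofIT_`, this yields the names `detail` and `is_one_of` and stops at the `I` that opens the template arguments.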

So this is a bug in the GCC mangler and a matching bug in the c++filt
demangler; llvm-cxxfilt's demangling is correct.

llvm-cxxfilt's refusal to demangle

_ZN3caf12actor_system10spawn_implIN6broker6detail11flare_actorELNS_13spawn_optionsE0EJEEENS_23infer_handle_from_classIT_XsrSt10is_base_ofINS_14abstract_actorES7_E5valueEE4typeERNS_12actor_configEDpOT1_

appears to be the same issue -- that is not a valid mangling, again because the
'sr...' construct does not introduce substitutions.

Quuxplusone commented 4 years ago

Thanks for the detailed explanation. I opened a bug here [1] to hopefully address the incorrect behavior of the g++ mangler.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93475

Quuxplusone commented 4 years ago

Thanks.

Regarding the remaining cases in your gist: I think a big factor in the "much larger in llvm-cxxfilt" cases might be that llvm-cxxfilt produces a more verbose demangling for lambda-expressions. It'd be interesting to know whether most of the "much larger" cases have "lambda" in their demangling.
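
One quick way to check that could look like the sketch below. The function name `lambda_share` and the 2x length threshold for "much larger" are assumptions picked for illustration, not figures from the gist.

```python
def lambda_share(pairs, ratio=2.0):
    """Count how many 'much larger' llvm-cxxfilt demanglings mention a lambda.

    pairs: iterable of (cxxfilt_demangling, llvm_cxxfilt_demangling) strings.
    ratio: assumed threshold; llvm output longer than ratio * cxxfilt output
           counts as "much larger".
    Returns (number_with_lambda, number_much_larger).
    """
    much_larger = [llvm for gnu, llvm in pairs if len(llvm) > ratio * len(gnu)]
    with_lambda = [text for text in much_larger if "lambda" in text]
    return len(with_lambda), len(much_larger)
```

If the first number is close to the second, lambdas account for most of the size difference.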