cms-gem-daq-project / xhal

XHAL interface library
0 stars 10 forks source link

Exception handling in template RPC calls #125

Closed lmoureaux closed 5 years ago

lmoureaux commented 5 years ago

Description

Handle exceptions thrown by RPC methods and the Wisconsin RPC library. Turn them into two exceptions types at the calling site. For exceptions thrown on the remote side, a stack trace can be recorded and sent back (optional).

This PR adds a .cpp file that should be compiled into its a library. It is required on all architectures. The build system has not been modified to include it.

Example exception info

I triggered an exception by doing .at(-1) on a std::vector. The message is:

remote error: std::out_of_range: vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 10)

[feature was dropped] The backtrace as retrieved from the thrown exception is (relevant frames are 5-7):

Stack trace (most recent call last):
#17   Object "[0xffffffffffffffff]", at 0xffffffffffffffff, in 
#16   Object "wiscrpcsvc-server.pc/src/wiscrpcsvc-server.pc-build/rpcsvc", at 0x55b4f5f22249, in _start
#15   Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f01aa1fdb96, in __libc_start_main
#14   Object "wiscrpcsvc-server.pc/src/wiscrpcsvc-server.pc-build/rpcsvc", at 0x55b4f5f228c9, in main
#13   Object "wiscrpcsvc-server.pc/src/wiscrpcsvc-server.pc-build/rpcsvc", at 0x55b4f5f22b25, in handle_client_connection(int, sockaddr_in&, unsigned int, int&)
#12   Object "wiscrpcsvc-server.pc/src/wiscrpcsvc-server.pc-build/rpcsvc", at 0x55b4f5f23779, in run_client(int)
#11   Object "wiscrpcsvc-server.pc/src/wiscrpcsvc-server.pc-build/rpcsvc", at 0x55b4f5f25fb5, in ModuleManager::invoke_method(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, wisc::RPCMsg*, wisc::RPCMsg*)
#10   Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a5212, in void xhal::rpc::invoke<Memory::Read, 0>(wisc::RPCMsg const*, wisc::RPCMsg*)
#9    Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a5aad, in xhal::rpc::compat::void_holder<std::vector<unsigned int, std::allocator<unsigned int> > > xhal::rpc::compat::tuple_apply<std::vector<unsigned int, std::allocator<unsigned int> >, Memory::Read, unsigned int, unsigned int, 0>(Memory::Read&&, std::tuple<unsigned int, unsigned int> const&)
#8    Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a674f, in decltype (((forward<Memory::Read>)({parm#1}))((get<0ul>)({parm#2}), (get<1ul>)({parm#2}))) xhal::rpc::compat::tuple_apply_impl<Memory::Read, unsigned int, unsigned int, 0ul, 1ul>(Memory::Read&&, std::tuple<unsigned int, unsigned int> const&, xhal::rpc::compat::index_sequence<0ul, 1ul>)
#7    Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a34ed, in Memory::Read::operator()(unsigned int, unsigned int) const
#6    Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a4a22, in std::vector<unsigned int, std::allocator<unsigned int> >::at(unsigned long)
#5    Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a50d3, in std::vector<unsigned int, std::allocator<unsigned int> >::_M_range_check(unsigned long) const
#4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7f01aa873854, in 
#3    Object "../install/pc/usr/lib/libxhal_rpc.so", at 0x7f01ab4b7a3e, in __cxa_throw
#2    Object "../install/pc/usr/lib/libxhal_rpc.so", at 0x7f01ab4b789d, in xhal::rpc::helper::recordStackTrace()
#1    Object "../install/pc/usr/lib/libxhal_rpc.so", at 0x7f01ab4b834d, in backward::StackTraceImpl<backward::system_tag::linux_tag>::load_here(unsigned long)
#0    Object "../install/pc/usr/lib/libxhal_rpc.so", at 0x7f01ab4b9d15, in unsigned long backward::details::unwind<backward::StackTraceImpl<backward::system_tag::linux_tag>::callback>(backward::StackTraceImpl<backward::system_tag::linux_tag>::callback, unsigned long)

Line numbers and even code snippets could be obtained by linking against libdwarf

How it works

Exceptions are caught on the server side (CTP7, in invoke) and an error message is generated there. The exact type of the exception is determined using a low-level C++ API (defined by the Itanium ABI). This information is sent back to the caller using the error and type keys. The call function checks for the keys and throws an exception with the relevant information.

In addition, code in the call function is now protected against exceptions. Any error coming from the Wisconsin RPC library results in an exception being thrown (a single type is used).

[feature was dropped] The optional stack trace functionality works by inserting a hook in the low-level API used by throw statements. More specifically, it overrides __cxa_throw (also from the Itanium ABI) to record a stack trace every time an exception is thrown. When an exception is caught in invoke, the stack trace is queried and sent back to the caller.

Since recording a stack trace for every exception adds some overhead, it is made optional using a preprocessor macro. If NDEBUG is set when register.h is included, no trace will be recorded. (I'm not sure whether this applies at the level of a module, a compilation unit or even the part of it after the #include. This would need to be tested.)

CTP7 warning

[feature was dropped, irrelevant] The version of libstdcxx on the CTP7 has a bug that prevents getting a stack trace. It also fools gdb. The best option seems to ask Wisconsin for an image based on a more recent Petalinux. We could also provide an updated version of libstdcxx to LD_PRELOAD when starting wiscrpcsvc, or a tiny library that would only patch the buggy function.

The code was tested primarily on x86, with some testing on a more recent ARM Linux.

Types of changes

Motivation and Context

We want to catch exceptions thrown by RPC modules, and provide devs with as much debugging information as possible.

How Has This Been Tested?

All prototyping was done on x86. Testing on ARM (CTP7 and another chip) was performed to check for architecture-dependent issues. This polished version was tested on x86.

The following client was used with an RPC-ified memory:

#include "memory.h"

#include <iomanip>
#include <iostream>

#include "xhal/rpc/call.h"

int main(int argc, char **argv)
{
    try
    {
        wisc::RPCSvc conn;
        conn.connect("localhost");

        conn.load_module("memory", "memory v1.0.1");

        auto mem = xhal::rpc::call<Memory::Read>(conn, 0, 10);
        std::cout << std::hex;
        for (std::uint32_t word : mem) {
            std::cout << " " << word;
        }
        std::cout << std::endl;
    }
    catch(const xhal::rpc::RemoteException &e)
    {
        std::cerr << "Remote call failed: " << e.what() << std::endl;
        if (e.hasTrace()) {
            std::cerr << e.trace() << std::endl;
        }
        return 1;
    }
    catch(const xhal::rpc::MessageException &e)
    {
        std::cerr << "Remote call failed: " << e.what() << std::endl;
        return 1;
    }

    return 0;
}

Compilation was tested on gem904daq01 with the Xilinx SDK 2016.2.

Checklist:

mexanick commented 5 years ago

Considering asking UW to bake a new PetaLinux distributive: we can ask, but I'm quite sceptical whether it will ever be done... Also, shall we extend the exceptions propagation to include warnings alongside the errors?

jsturdy commented 5 years ago

Considering asking UW to bake a new PetaLinux distributive: we can ask, but I'm quite sceptical whether it will ever be done...

Indeed, but if we outline our strict requirements, we might get some support (or we can push to basically be allowed to control our own destiny viz the linux core, and blow a cc7 image if we so desire (as other groups are contemplating), and we should sell our GE1/1 experience as feedback for our GE2/1 system, where we need a more fluid and symbiotic relationship with this side of things

Also, shall we extend the exceptions propagation to include warnings alongside the errors?

I am really not in favour of using exceptions as anything other than exceptions... Can you give an example of something you would consider a "warning" that you'd like to expose in this way?

jsturdy commented 5 years ago

One concern/question I have is why do we need something that seemingly has a lot of added code and overhead, when at a minimum, I think what we need is just something that can: 1) catch the exception in the invoke method running on the CTP7 1) translate this message into an error key 1) throw the exception (in the event of the presence of the error key) on the receiving end in the call method

At this point, the code that executed the call to the remote function simply handles the exception as it normally would.

I'd even be in favour of adding some exception classe(s) here (more than the two you have added) since this will be included by both ctp7_modules and cmsgemos), which can then easily allow the transferring of error cause and intent between contexts.

What is the use case for the complicated design pattern?

jsturdy commented 5 years ago

I'll also rebase this on top of the recently merged #123, unless @lmoureaux is back from holiday, as I'd have to force push to your github

lmoureaux commented 5 years ago

Rebased on top of feature/templated-rpc-methods; implemented Jared's comment; fixed an issue with exceptions from the standard library.

Will look into build system integration next.

lmoureaux commented 5 years ago

Also added an example backtrace to the summary above.

lpetre-ulb commented 5 years ago

Considering asking UW to bake a new PetaLinux distributive: we can ask, but I'm quite sceptical whether it will ever be done...

Indeed, but if we outline our strict requirements, we might get some support (or we can push to basically be allowed to control our own destiny viz the linux core, and blow a cc7 image if we so desire (as other groups are contemplating), and we should sell our GE1/1 experience as feedback for our GE2/1 system, where we need a more fluid and symbiotic relationship with this side of things

As told in the first post, the issue lies in libstdcxx and even prevents gdb (via gdbserver) to get full backtraces. The issue is documented and fixed upstream. Our minimal requirement would then be to upgrade GCC from version 4.9.2 to version 4.9.3, even though a more recent version would definitively be better.

About CC7, do you know if an ARM release is provided? If yes, is it fully compatible with the kernel provided by Xilinx. If no, a more recent PetaLinux releases with more mainstream technology (such as overlay filesystems) would help.

I'd even be in favour of adding some exception classe(s) here (more than the two you have added) since this will be included by both ctp7_modules and cmsgemos), which can then easily allow the transferring of error cause and intent between contexts.

I fear defining more exception types would defeat the abstraction implemented in the templated RPC framework. If they were defined here, it would increase the difficulty to create standalone tools and if they were defined in the ctp7_modules, the templated RPC would have to follow all changes in the ctp7_modules. The only abstracted way I currently see is to register the exception classes via a template mechanism.

jsturdy commented 5 years ago

Considering asking UW to bake a new PetaLinux distributive: we can ask, but I'm quite sceptical whether it will ever be done...

Indeed, but if we outline our strict requirements, we might get some support (or we can push to basically be allowed to control our own destiny viz the linux core, and blow a cc7 image if we so desire (as other groups are contemplating), and we should sell our GE1/1 experience as feedback for our GE2/1 system, where we need a more fluid and symbiotic relationship with this side of things

As told in the first post, the issue lies in libstdcxx and even prevents gdb (via gdbserver) to get full backtraces. The issue is documented and fixed upstream. Our minimal requirement would then be to upgrade GCC from version 4.9.2 to version 4.9.3, even though a more recent version would definitively be better.

This is probably possible, but the primary issue might be that UW are not really supporting CTP7s, and have moved resources to future boards, so they really only provide some limited support. Again, it can't hurt to ask, and for something like this, I'd be optimistic (unlike say for the inclusion of a version with overlayfs...)

About CC7, do you know if an ARM release is provided? If yes, is it fully compatible with the kernel provided by Xilinx. If no, a more recent PetaLinux releases with more mainstream technology (such as overlay filesystems) would help.

I only know that there are other groups working on LHC projects who are putting centos variants on things like the Zynq. I believe that there is also some work being done to determine if an officially supported CERN release would be feasible.

I'd even be in favour of adding some exception classe(s) here (more than the two you have added) since this will be included by both ctp7_modules and cmsgemos), which can then easily allow the transferring of error cause and intent between contexts.

I fear defining more exception types would defeat the abstraction implemented in the templated RPC framework. If they were defined here, it would increase the difficulty to create standalone tools and if they were defined in the ctp7_modules, the templated RPC would have to follow all changes in the ctp7_modules. The only abstracted way I currently see is to register the exception classes via a template mechanism.

To be clear, I'm not saying we should have arbitrary exceptions, rather, specific named exceptions inheriting from these two basic types (even on-the-fly exception creation macros, a la xcept::Exception from xdaq), which is potentially interesting from the handling point of view; however, if any remote exception is going to trigger some error condition, this may be a moot point...

jsturdy commented 5 years ago

Rebased on top of feature/templated-rpc-methods; implemented Jared's comment; fixed an issue with exceptions from the standard library.

Will look into build system integration next.

@lmoureaux, you've rebased on top of the wrong branch:

* b006432        (louis/feature/templated-rpc-methods-exceptions) Drop boolean fields in RemoteException
* 02836fd        Add code to record the stack trace on throw
* a60eccb        Update \author info
* 82b6b08        Add support for sending back remote stack traces
* 69ce392        Handle RPCMsg and RPCSvc exceptions in call()
* eb5ea95        Handle the error key in call()
* 198386d        Catch exceptions thrown by RPC methods
| * 1c95eb5  (HEAD -> feature/templated-rpc-methods-exceptions) Add code to record the stack trace on throw
| * 4407b02  Update \author info
| * f25ec24  Add support for sending back remote stack traces
| * c690806  Handle RPCMsg and RPCSvc exceptions in call()
| * b190de8  Handle the error key in call()
| * dc54e5d  Catch exceptions thrown by RPC methods
| *   a4d188c        (gemdaq/feature/templated-rpc-methods, feature/templated-rpc-methods) Merge pull request #123 from lpetre-ulb/feature/templated-rpc-methods
| |\
| |/
|/|
* | dba680f  (louis/feature/templated-rpc-methods, laurent/feature/templated-rpc-methods) Relax namespace constraints for custom type serializers
* | 946f5e5  Add support for custom types
* | e259268  Add support for `std::map<std:string, T>`
* | 66fbbbf  Add support of `std::map<std::uint32_t, T>`
* | 2bd6734  Add support for `std::array` of integral types
| |
|/
*   32b6094  Merge pull request #121 from lpetre-ulb/feature/templated-rpc-methods
lmoureaux commented 5 years ago

Still no build system integration.

jsturdy commented 5 years ago

Still no build system integration.

Blocker

lmoureaux commented 5 years ago

Added build system integration. Symbols are added to libxhal.so. Proof:

[lmoureau@gem904daq01 xhal]$ nm -a xhalcore/lib/libxhal.so | c++filt | grep ::getExc
00000000000343fb T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::TypeException const&)
00000000000343ce T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::BadKeyException const&)
0000000000034462 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::BufferTooSmallException const&)
00000000000344c9 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::CorruptMessageException const&)
00000000000344f6 T xhal::rpc::helper::getExceptionMessage(wisc::RPCSvc::RPCException const&)
0000000000034351 T xhal::rpc::helper::getExceptionMessage(std::exception const&)
[lmoureau@gem904daq01 xhal]$ nm -a xhalarm/lib/libxhal.so | c++filt | grep ::getExc
00024580 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::TypeException const&)
00024548 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::BadKeyException const&)
000245e8 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::BufferTooSmallException const&)
00024650 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::CorruptMessageException const&)
00024688 T xhal::rpc::helper::getExceptionMessage(wisc::RPCSvc::RPCException const&)
000244d0 T xhal::rpc::helper::getExceptionMessage(std::exception const&)