PlatformLab / grpc_homa

Allows Homa to be used as a transport with gRPC.

Attempting to create a Homa server results in a Segmentation Fault #15

Open chwebb02 opened 1 month ago

chwebb02 commented 1 month ago

Running test_server results in a segmentation fault inside the call to BuildAndStart(). Below is the stack backtrace produced by GDB.

(gdb) bt
#0  0x00007ffff7da4d74 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator std::basic_string_view<char, std::char_traits<char> >() const ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x0000555555a4a193 in std::shared_ptr<grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Node> grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Get<std::basic_string_view<char, std::char_traits<char> > >(std::shared_ptr<grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Node> const&, std::basic_string_view<char, std::char_traits<char> > const&) ()
#2  0x0000555555a48ae5 in grpc_core::ChannelArgs::Value const* grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Lookup<std::basic_string_view<char, std::char_traits<char> > >(std::basic_string_view<char, std::char_traits<char> > const&) const ()
#3  0x0000555555a4385b in grpc_core::ChannelArgs::Get(std::basic_string_view<char, std::char_traits<char> >) const ()
#4  0x0000555555a44d19 in grpc_core::ChannelArgs::GetString(std::basic_string_view<char, std::char_traits<char> >) const ()
#5  0x0000555555b3ce19 in grpc_core::Channel::Create(char const*, grpc_core::ChannelArgs, grpc_channel_stack_type, grpc_transport*) ()
#6  0x0000555555b5020e in grpc_core::Server::SetupTransport(grpc_transport*, grpc_pollset*, grpc_core::ChannelArgs const&, grpc_core::RefCountedPtr<grpc_core::channelz::SocketNode> const&)
    ()
#7  0x0000555555624814 in HomaListener::Transport::start (this=0x555556aded90, server=0x555556aea8f0, pollsets=0x555556aea940) at ../grpc/src/core/lib/surface/server.h:130
#8  0x0000555555b500f4 in grpc_core::Server::Start() ()
#9  0x0000555555b55ae0 in grpc_server_start ()
#10 0x00005555558dec1c in grpc::Server::Start(grpc::ServerCompletionQueue**, unsigned long) ()
#11 0x00005555558cede0 in grpc::ServerBuilder::BuildAndStart() ()
#12 0x00005555555ff0a8 in main (argc=<optimized out>, argv=<optimized out>) at test_server.cc:127

johnousterhout commented 1 month ago

Sorry for my slow response. What version of gRPC are you working with?

chwebb02 commented 1 month ago

No worries, thank you for helping! I am using gRPC 1.57.0 with the most recent version of the Homa Kernel Module running on top of Ubuntu 22.04 LTS with Linux kernel version 6.1.38.

johnousterhout commented 1 month ago

I just tried running test_server on my sources (same versions of everything as you) and it seems to start up fine for me.

Could you compile with debugging (set "DEBUG := yes" in the Makefile), run test_server under gdb, then when it crashes type "where" and send me the gdb output?

chwebb02 commented 1 month ago

After setting DEBUG to yes in the Makefile, there was no segmentation fault. However, the problem persists when compiling with DEBUG set to no.

johnousterhout commented 1 month ago

Well that's a bummer! This may be hard to debug...

Could you run it under gdb when compiled with DEBUG off and send me a stack trace of the crash? Perhaps that will yield some clues.

chwebb02 commented 3 weeks ago

Sorry for the delay. Here is the stack trace from gdb:


#0  0x00007ffff7da4d74 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator std::basic_string_view<char, std::char_traits<char> >() const ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x0000555555a4a193 in std::shared_ptr<grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Node> grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Get<std::basic_string_view<char, std::char_traits<char> > >(std::shared_ptr<grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Node> const&, std::basic_string_view<char, std::char_traits<char> > const&) ()
#2  0x0000555555a48ae5 in grpc_core::ChannelArgs::Value const* grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Lookup<std::basic_string_view<char, std::char_traits<char> > >(std::basic_string_view<char, std::char_traits<char> > const&) const ()
#3  0x0000555555a4385b in grpc_core::ChannelArgs::Get(std::basic_string_view<char, std::char_traits<char> >) const ()
#4  0x0000555555a44d19 in grpc_core::ChannelArgs::GetString(std::basic_string_view<char, std::char_traits<char> >) const ()
#5  0x0000555555b3ce19 in grpc_core::Channel::Create(char const*, grpc_core::ChannelArgs, grpc_channel_stack_type, grpc_transport*) ()
#6  0x0000555555b5020e in grpc_core::Server::SetupTransport(grpc_transport*, grpc_pollset*, grpc_core::ChannelArgs const&, grpc_core::RefCountedPtr<grpc_core::channelz::SocketNode> const&)
    ()
#7  0x0000555555624814 in HomaListener::Transport::start(grpc_core::Server*, std::vector<grpc_pollset*, std::allocator<grpc_pollset*> > const*) ()
#8  0x0000555555b500f4 in grpc_core::Server::Start() ()
#9  0x0000555555b55ae0 in grpc_server_start ()
#10 0x00005555558dec1c in grpc::Server::Start(grpc::ServerCompletionQueue**, unsigned long) ()
#11 0x00005555558cede0 in grpc::ServerBuilder::BuildAndStart() ()
#12 0x00005555555ff0a8 in main ()

johnousterhout commented 2 weeks ago

Thanks for the stack trace (I recently noticed that you already sent this earlier... sorry for making you send it again). It appears that the channel arguments object is somehow getting corrupted. Can you try the following steps?

Hopefully this will help to narrow down the problem a bit.

homa_listener.txt

chwebb02 commented 5 days ago

Sorry for my late response. I am not able to compile with the new homa_listener. I get the following output:

homa_listener.cc: In member function ‘void HomaListener::Transport::start(grpc_core::Server*, const std::vector<grpc_pollset*>*)’:
homa_listener.cc:297:28: error: ‘const class grpc_core::ChannelArgs’ has no member named ‘printNames’
  297 |     server->channel_args().printNames();
      |                            ^~~~~~~~~~

johnousterhout commented 5 days ago

That's strange; I see the method's definition in the file src/core/lib/channel/channel_args.cc in the grpc directory. Can you (a) see if there is such a method defined in your version of the file and (b) double-check the version of gRPC that you are using (I'm using 1.57.0)? Could you respond back here with the git commit # from which you are compiling your gRPC sources?

Thanks.

-John-

chwebb02 commented 5 days ago

I was not able to find the method when searching through the file. I confirmed that I am using gRPC 1.57.0; running git log shows this commit:

commit a61640173d00b63e0b55ad61915a9b1708e12d27 (grafted, HEAD, tag: v1.57.0)
Author: AJ Heller <hork@google.com>
Date:   Tue Aug 8 10:56:15 2023 -0700

    [Release] Bump version to 1.57.0 (on v1.57.x branch) (#34008)

    Change was created by the release automation script. See go/grpc-release

johnousterhout commented 5 days ago

Oops, sorry, my goof. In looking through my sources I see that I added that method myself for debugging a while ago, but forgot. I'm attaching my version of src/core/lib/channel/channel_args.cc (I had to attach it as a .txt file instead of .cc so that GitHub would let it pass); can you drop that into your grpc tree and build with that to run the test?

channel_args.txt

chwebb02 commented 3 days ago

I also had to make a change to the appropriate header file to add a declaration of printNames(). After doing so and running test_server I am not receiving any output before the segmentation fault. I am not sure if this is a mistake on my end in configuring it or something else.
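
For readers following along, here is a minimal sketch of why the header change is needed. The class `Args` and the functions `listNames` and `countNames` are hypothetical stand-ins, not gRPC code. A member function defined in a .cc file must also be declared inside the class, and because the earlier compiler error shows the object is a `const class grpc_core::ChannelArgs`, the added method presumably has to be const-qualified to be callable on it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a class such as grpc_core::ChannelArgs.
class Args {
public:
    void add(const std::string& name) {
        assert(!name.empty());  // illustrative sanity check only
        names_.push_back(name);
    }

    // Declaration (header side). Without this line the out-of-class
    // definition below does not compile. The trailing const allows the
    // method to be called on a const object or const reference.
    std::vector<std::string> listNames() const;

private:
    std::vector<std::string> names_;
};

// Definition, as it would appear in the corresponding .cc file.
std::vector<std::string> Args::listNames() const {
    return names_;
}

// A caller that only holds a const reference, similar in spirit to
// calling a method on the result of server->channel_args().
std::size_t countNames(const Args& args) {
    return args.listNames().size();
}
```

Dropping the `const` from both the declaration and the definition would make `countNames` fail to compile, which is the same class of error as a missing declaration: the compiler finds no matching member for a const object.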

chwebb02 commented 3 days ago

After another attempt, the code does not segfault when using the provided channel_args. The output of running test_server is as follows:

chwebb02@vm0:~/grpc_homa$ ./test_server 
ChannelArgs name: grpc.compression_enabled_algorithms_bitset
ChannelArgs name: grpc.internal.event_engine
ChannelArgs name: grpc.primary_user_agent
ChannelArgs name: grpc.resource_quota
Printing channel arg grpc.compression_enabled_algorithms_bitset
grpc.compression_enabled_algorithms_bitset is an integer: 7
Printing channel arg grpc.internal.event_engine
grpc.internal.event_engine is a pointer
Printing channel arg grpc.primary_user_agent
grpc.primary_user_agent is a string: grpc-c++/1.57.0
Printing channel arg grpc.resource_quota
grpc.resource_quota is a pointer
Server listening on port 4000
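
The output above lists every argument name first and then prints each value with its type. As a rough illustration of that kind of debugging dump, here is a self-contained sketch. It does not reproduce the attached channel_args.cc or gRPC's internal types: `ArgValue`, `ArgMap`, `describeArg`, and `dumpArgs` are hypothetical names, and a `std::map` of `std::variant` values stands in for the real argument container.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <variant>

// Hypothetical stand-in for a channel-argument value: an integer, a string,
// or an opaque pointer. This is NOT gRPC's ChannelArgs::Value.
using ArgValue = std::variant<int, std::string, const void*>;
using ArgMap = std::map<std::string, ArgValue>;

// Describe one argument in the style of the output shown in this thread.
std::string describeArg(const std::string& name, const ArgValue& value) {
    assert(!name.empty());  // illustrative sanity check only
    std::ostringstream out;
    out << name;
    if (const int* i = std::get_if<int>(&value)) {
        out << " is an integer: " << *i;
    } else if (const std::string* s = std::get_if<std::string>(&value)) {
        out << " is a string: " << *s;
    } else {
        out << " is a pointer";  // pointer targets are not printed
    }
    return out.str();
}

// First pass: print every name. Second pass: print every value.
// Walking the container twice helps separate "the keys are readable"
// from "the values are readable" when hunting for corruption.
std::string dumpArgs(const ArgMap& args) {
    std::ostringstream out;
    for (const auto& entry : args) {
        out << "ChannelArgs name: " << entry.first << "\n";
    }
    for (const auto& entry : args) {
        out << "Printing channel arg " << entry.first << "\n";
        out << describeArg(entry.first, entry.second) << "\n";
    }
    return out.str();
}
```

When storing a string, construct it explicitly as a `std::string`; a bare string literal would convert to the `const void*` alternative instead, because that pointer conversion is preferred over the user-defined conversion to `std::string`.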