Open chwebb02 opened 5 months ago
Sorry for my slow response. What version of gRPC are you working with?
No worries, thank you for helping! I am using gRPC 1.57.0 with the most recent version of the Homa Kernel Module running on top of Ubuntu 22.04 LTS with Linux kernel version 6.1.38.
I just tried running test_server on my sources (same versions of everything as you) and it seems to start up fine for me.
Could you compile with debugging (set "DEBUG := yes" in the Makefile), run test_server under gdb, then when it crashes type "where" and send me the gdb output?
After setting DEBUG to yes in the Makefile, there was no segmentation fault. However, the problem persists when compiling with DEBUG set to no.
Well that's a bummer! This may be hard to debug...
How about running it under gdb when compiled with DEBUG off and send me a stack trace of the crash? Perhaps that will yield some clues.
Sorry for the delay. Here is the stack trace from gdb:
#0 0x00007ffff7da4d74 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator std::basic_string_view<char, std::char_traits<char> >() const ()
from /lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x0000555555a4a193 in std::shared_ptr<grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Node> grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Get<std::basic_string_view<char, std::char_traits<char> > >(std::shared_ptr<grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Node> const&, std::basic_string_view<char, std::char_traits<char> > const&) ()
#2 0x0000555555a48ae5 in grpc_core::ChannelArgs::Value const* grpc_core::AVL<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, grpc_core::ChannelArgs::Value>::Lookup<std::basic_string_view<char, std::char_traits<char> > >(std::basic_string_view<char, std::char_traits<char> > const&) const ()
#3 0x0000555555a4385b in grpc_core::ChannelArgs::Get(std::basic_string_view<char, std::char_traits<char> >) const ()
#4 0x0000555555a44d19 in grpc_core::ChannelArgs::GetString(std::basic_string_view<char, std::char_traits<char> >) const ()
#5 0x0000555555b3ce19 in grpc_core::Channel::Create(char const*, grpc_core::ChannelArgs, grpc_channel_stack_type, grpc_transport*) ()
#6 0x0000555555b5020e in grpc_core::Server::SetupTransport(grpc_transport*, grpc_pollset*, grpc_core::ChannelArgs const&, grpc_core::RefCountedPtr<grpc_core::channelz::SocketNode> const&) ()
#7 0x0000555555624814 in HomaListener::Transport::start(grpc_core::Server*, std::vector<grpc_pollset*, std::allocator<grpc_pollset*> > const*) ()
#8 0x0000555555b500f4 in grpc_core::Server::Start() ()
#9 0x0000555555b55ae0 in grpc_server_start ()
#10 0x00005555558dec1c in grpc::Server::Start(grpc::ServerCompletionQueue**, unsigned long) ()
#11 0x00005555558cede0 in grpc::ServerBuilder::BuildAndStart() ()
#12 0x00005555555ff0a8 in main ()
Thanks for the stack trace (I recently noticed that you already sent this earlier... sorry for making you send it again). It appears that the channel arguments object is somehow getting corrupted. Can you try the following steps?
names.push_back statements, adjusting the number of statements to reflect the number of arguments (believe it or not, there is no way to actually query the names of the channel arguments at runtime).
Hopefully this will help to narrow down the problem a bit.
Sorry for my late response. I am not able to compile with the new homa_listener. I get the following output:
homa_listener.cc: In member function ‘void HomaListener::Transport::start(grpc_core::Server*, const std::vector<grpc_pollset*>*)’:
homa_listener.cc:297:28: error: ‘const class grpc_core::ChannelArgs’ has no member named ‘printNames’
297 | server->channel_args().printNames();
| ^~~~~~~~~~
That's strange; I see the method's definition in the file src/core/lib/channel/channel_args.cc in the grpc directory. Can you (a) see if there is such a method defined in your version of the file and (b) double-check the version of gRPC that you are using (I'm using 1.57.0)? Could you respond back here with the git commit # from which you are compiling your gRPC sources?
Thanks.
-John-
I was not able to find the method when searching through the file. I was able to determine that I am using gRPC 1.57.0 and issuing git log resulted in this commit number:
commit a61640173d00b63e0b55ad61915a9b1708e12d27 (grafted, HEAD, tag: v1.57.0)
Author: AJ Heller <hork@google.com>
Date: Tue Aug 8 10:56:15 2023 -0700
[Release] Bump version to 1.57.0 (on v1.57.x branch) (#34008)
Change was created by the release automation script. See go/grpc-release
Oops, sorry, my goof. In looking through my sources I see that I added that method myself for debugging a while ago, but forgot. I'm attaching my version of src/core/lib/channel/channel_args.cc (I had to attach it as a .txt file instead of .cc so that GitHub would let it pass); can you drop that into your grpc tree and build with that to run the test?
I also had to make a change to the appropriate header file to add a declaration of printNames(). After doing so and running test_server I am not receiving any output before the segmentation fault. I am not sure if this is a mistake on my end in configuring it or something else.
After another attempt, the code does not segfault when using the provided channel_args. The output of running test_server is as follows:
chwebb02@vm0:~/grpc_homa$ ./test_server
ChannelArgs name: grpc.compression_enabled_algorithms_bitset
ChannelArgs name: grpc.internal.event_engine
ChannelArgs name: grpc.primary_user_agent
ChannelArgs name: grpc.resource_quota
Printing channel arg grpc.compression_enabled_algorithms_bitset
grpc.compression_enabled_algorithms_bitset is an integer: 7
Printing channel arg grpc.internal.event_engine
grpc.internal.event_engine is a pointer
Printing channel arg grpc.primary_user_agent
grpc.primary_user_agent is a string: grpc-c++/1.57.0
Printing channel arg grpc.resource_quota
grpc.resource_quota is a pointer
Server listening on port 4000
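For readers following along: output of this shape could be produced by a helper along the lines below. This is only a minimal sketch under stated assumptions. `ArgValue`, `ArgMap`, `describeArg` and `dumpArgs` are hypothetical names invented for illustration, and the map-of-variants is a stand-in rather than gRPC's actual `ChannelArgs` representation. The real `printNames()` added to channel_args.cc is not shown in this thread and may work quite differently.

```cpp
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <variant>

// Hypothetical stand-in for a channel-argument value: an integer, a string,
// or an opaque pointer. This is NOT gRPC's ChannelArgs::Value type.
using ArgValue = std::variant<int, std::string, const void*>;
using ArgMap = std::map<std::string, ArgValue>;

// Returns a one-line description of a single named argument.
std::string describeArg(const std::string& name, const ArgValue& value) {
    std::ostringstream out;
    if (const int* i = std::get_if<int>(&value)) {
        out << name << " is an integer: " << *i;
    } else if (const std::string* s = std::get_if<std::string>(&value)) {
        out << name << " is a string: " << *s;
    } else {
        // Pointer values are opaque, so only their presence is reported.
        out << name << " is a pointer";
    }
    return out.str();
}

// Writes every name first, then a description of each value, mirroring the
// two-pass layout of the output above. std::map iterates in key order.
void dumpArgs(const ArgMap& args, std::ostream& out) {
    for (const auto& [name, value] : args) {
        out << "ChannelArgs name: " << name << "\n";
    }
    for (const auto& [name, value] : args) {
        out << "Printing channel arg " << name << "\n";
        out << describeArg(name, value) << "\n";
    }
}
```

Calling `dumpArgs(args, std::cout)` just before the server starts would print the names and values in sorted key order. Note that a string value should be wrapped explicitly as `std::string(...)`: a bare string literal would otherwise select the `const void*` alternative of the variant. The sketch needs C++17 or later.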
Just double-checking to make sure I understand: the run above (which did not segfault) occurred even when running without debugging? If not, can you run it without debugging?
Yes, the above output is the result from using the provided files and compiling and running without debugging.
Executing test_server results in a segmentation fault during the execution of the BuildAndStart() method. Below is the stack backtrace produced by GDB.