PlatformLab / grpc_homa

Allows Homa to be used as a transport with gRPC.

Ignoring message for unknown RPC, stream id 9 #9

Closed: mikehb closed this issue 11 months ago

mikehb commented 11 months ago

After following the grpc_homa guide to build my first trial, the client crashed.

The server was listening on 192.168.93.130:2527

E1007 11:02:28.856005133   61479 homa_client.cc:444]                   Ignoring message for unknown RPC, stream id 9

E1007 11:02:38.877597347   61482 homa_client.cc:444]                   Ignoring message for unknown RPC, stream id 14
E1007 11:02:58.898714862   61485 homa_incoming.cc:185]                 Error in recvmsg (homaId 11086): Bad address
E1007 11:02:58.898815860   61485 homa_incoming.cc:185]                 Error in recvmsg (homaId 11088): Bad address

E1007 11:03:28.929306268   61484 homa_incoming.cc:185]                 Error in recvmsg (homaId 11182): Bad address
E1007 11:03:48.952721662   61484 homa_client.cc:444]                   Ignoring message for unknown RPC, stream id 13
E1007 11:03:58.962604838   61488 homa_incoming.cc:185]                 Error in recvmsg (homaId 11262): Bad address
E1007 11:03:58.962703836   61488 homa_incoming.cc:185]                 Error in recvmsg (homaId 0): Bad file descriptor
------ STACK TRACE ------
Backtrace:
/lib64/libc.so.6(+0x3cb20)[0x7fa06ac5fb20]
./bazel-bin/rtk_engine(_ZN9grpc_core13LockfreeEvent8NotifyOnEP12grpc_closure+0x15)[0x2d4393b]
./bazel-bin/rtk_engine[0x2d40760]
./bazel-bin/rtk_engine(_Z22grpc_fd_notify_on_readP7grpc_fdP12grpc_closure+0x2b)[0x2d3891b]
./bazel-bin/rtk_engine(_ZN10HomaClient6onReadEPvN4absl12lts_202301256StatusE+0x3ca)[0x24c4d86]
./bazel-bin/rtk_engine[0x2e1227b]
./bazel-bin/rtk_engine(_ZN9grpc_core7ExecCtx5FlushEv+0x73)[0x2e123a9]
./bazel-bin/rtk_engine[0x2d41ecc]
./bazel-bin/rtk_engine[0x2d423da]
./bazel-bin/rtk_engine[0x2d38bea]
./bazel-bin/rtk_engine(_Z17grpc_pollset_workP12grpc_pollsetPP19grpc_pollset_workerN9grpc_core9TimestampE+0x3c)[0x2d378e6]
./bazel-bin/rtk_engine[0x2cd5789]
./bazel-bin/rtk_engine(grpc_completion_queue_next+0x46)[0x2cd5dea]
./bazel-bin/rtk_engine[0x24ff1e7]
./bazel-bin/rtk_engine[0x24ff2d1]
./bazel-bin/rtk_engine[0x2e37254]
./bazel-bin/rtk_engine[0x2e3729d]
/lib64/libc.so.6(+0x8b15d)[0x7fa06acae15d]
/lib64/libc.so.6(+0x10cc00)[0x7fa06ad2fc00]
Segmentation fault (core dumped)

And here is the core dump:

[mikehuang@bogon rtk_engine]$ gdb bazel-bin/rtk_engine -c core.61427
GNU gdb (GDB) Fedora Linux 13.2-2.fc37
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bazel-bin/rtk_engine...
(gdb) bt
#0  0x0000000002d4393b in grpc_core::LockfreeEvent::NotifyOn(grpc_closure*) ()
#1  0x0000000002d40760 in fd_notify_on_read(grpc_fd*, grpc_closure*) ()
#2  0x0000000002d3891b in grpc_fd_notify_on_read(grpc_fd*, grpc_closure*) ()
#3  0x00000000024c4d86 in HomaClient::onRead(void*, absl::lts_20230125::Status) ()
#4  0x0000000002e1227b in exec_ctx_run(grpc_closure*) ()
#5  0x0000000002e123a9 in grpc_core::ExecCtx::Flush() ()
#6  0x0000000002d41ecc in end_worker(grpc_pollset*, grpc_pollset_worker*, grpc_pollset_worker**) ()
#7  0x0000000002d423da in pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) ()
#8  0x0000000002d38bea in pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) ()
#9  0x0000000002d378e6 in grpc_pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) ()
#10 0x0000000002cd5789 in cq_next(grpc_completion_queue*, gpr_timespec, void*) ()
#11 0x0000000002cd5dea in grpc_completion_queue_next ()
#12 0x00000000024ff1e7 in grpc::(anonymous namespace)::CallbackAlternativeCQ::Ref()::{lambda(void*)#1}::operator()(void*) const ()
#13 0x00000000024ff2d1 in grpc::(anonymous namespace)::CallbackAlternativeCQ::Ref()::{lambda(void*)#1}::_FUN(void*) ()
#14 0x0000000002e37254 in grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::{lambda(void*)#1}::operator()(void*) const ()
#15 0x0000000002e3729d in grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::{lambda(void*)#1}::_FUN(void*) ()
#16 0x00007fa06acae15d in start_thread () from /lib64/libc.so.6
#17 0x00007fa06ad2fc00 in clone3 () from /lib64/libc.so.6
(gdb)
mikehb commented 11 months ago

I found "Unable to parse request" on server side

I1008 10:52:09.976872133   18857 homa_incoming.cc:232]                 Received Homa message from 192.168.93.130:32894, stream id 1, sequence 1 with homaId 12485083, 67 initMd bytes, 0 message bytes, 0 trailMd bytes, flags 0xf, bpage[0] 0
I1008 10:52:09.977013455   18857 call.cc:1531]                         ops[0]: RECV_INITIAL_METADATA ptr=0x7fafd801a070
I1008 10:52:09.977117264   18857 call.cc:1000]                         OP[server:0x7fafd8019f20]:  RECV_INITIAL_METADATA
I1008 10:52:09.977133315   18857 channel_stack.cc:256]                 OP[message_size:0x7fafd8019f38]:  RECV_INITIAL_METADATA
I1008 10:52:09.977149714   18857 promise_based_filter.cc:2035]         SVR[message_size:0x7fafd8019f38] StartBatch: have_promise=false recv_initial_state=INITIAL send_trailing_state=INITIAL captured={}
I1008 10:52:09.977160489   18857 promise_based_filter.cc:302]          FLUSHER:forward batch:  RECV_INITIAL_METADATA
I1008 10:52:09.977170244   18857 channel_stack.cc:256]                 OP[deadline:0x7fafd8019f50]:  RECV_INITIAL_METADATA
I1008 10:52:09.977176709   18857 channel_stack.cc:256]                 OP[server_call_tracer:0x7fafd8019f68]:  RECV_INITIAL_METADATA
I1008 10:52:09.977189073   18857 promise_based_filter.cc:2035]         SVR[server_call_tracer:0x7fafd8019f68] StartBatch: have_promise=false recv_initial_state=INITIAL send_trailing_state=INITIAL captured={} send_initial_metadata=INITIAL
I1008 10:52:09.977217303   18857 promise_based_filter.cc:302]          FLUSHER:forward batch:  RECV_INITIAL_METADATA
I1008 10:52:09.977240117   18857 channel_stack.cc:256]                 OP[connected:0x7fafd8019f80]:  RECV_INITIAL_METADATA
I1008 10:52:09.977260932   18857 util.cc:27]                           Incoming initial metadata: :path -> /rtk.Rtk/WatchEph
I1008 10:52:09.977282594   18857 util.cc:27]                           Incoming initial metadata: :authority -> 192.168.93.130:2527
I1008 10:52:09.977303159   18857 promise_based_filter.cc:2344]         SVR[server_call_tracer:0x7fafd8019f68]: RecvInitialMetadataReady OK
I1008 10:52:09.977320732   18857 promise_based_filter.cc:2409]         SVR[server_call_tracer:0x7fafd8019f68]: WakeInsideCombiner have_promise=true recv_initial_state=COMPLETE send_trailing_state=INITIAL captured={} send_initial_metadata=GOT_PIPE
I1008 10:52:09.977330826   18857 promise_based_filter.cc:2294]         SVR[server_call_tracer:0x7fafd8019f68] PollTrailingMetadata: INITIAL
I1008 10:52:09.977338701   18857 promise_based_filter.cc:2485]         SVR[server_call_tracer:0x7fafd8019f68]: WakeInsideCombiner poll=<<pending>>; send_initial_metadata=GOT_PIPE send_trailing_metadata=INITIAL
I1008 10:52:09.977352345   18857 promise_based_filter.cc:2344]         SVR[message_size:0x7fafd8019f38]: RecvInitialMetadataReady OK
I1008 10:52:09.977364453   18857 promise_based_filter.cc:368]          SVR[message_size:0x7fafd8019f38] SendMessage.GotPipe st=INITIAL
I1008 10:52:09.977371810   18857 promise_based_filter.cc:664]          SVR[message_size:0x7fafd8019f38] ReceiveMessage.GotPipe st=INITIAL
I1008 10:52:09.977383006   18857 promise_based_filter.cc:2409]         SVR[message_size:0x7fafd8019f38]: WakeInsideCombiner have_promise=true recv_initial_state=COMPLETE send_trailing_state=INITIAL captured={}
I1008 10:52:09.977391539   18857 promise_based_filter.cc:490]          SVR[message_size:0x7fafd8019f38] SendMessage.WakeInsideCombiner st=IDLE
D1008 10:52:09.977401287   18857 promise_based_filter.cc:2455]         SVR[message_size:0x7fafd8019f38]: After send_message WakeInsideCombiner have_promise=true recv_initial_state=COMPLETE send_trailing_state=INITIAL captured={} is_idle=true is_forwarded=false
I1008 10:52:09.977415720   18857 promise_based_filter.cc:798]          SVR[message_size:0x7fafd8019f38] ReceiveMessage.WakeInsideCombiner st=IDLE push?=no next?=no allow_push_to_pipe=yes
I1008 10:52:09.977424803   18857 promise_based_filter.cc:2294]         SVR[message_size:0x7fafd8019f38] PollTrailingMetadata: INITIAL
I1008 10:52:09.977452061   18857 promise_based_filter.cc:2485]         SVR[message_size:0x7fafd8019f38]: WakeInsideCombiner poll=<<pending>>; send_initial_metadata=null send_trailing_metadata=INITIAL
I1008 10:52:09.977498846   18857 call.cc:1531]                         ops[0]: RECV_MESSAGE ptr=0x7fafd801a020
I1008 10:52:09.977520465   18857 call.cc:1000]                         OP[server:0x7fafd8019f20]:  RECV_MESSAGE
I1008 10:52:09.977527517   18857 channel_stack.cc:256]                 OP[message_size:0x7fafd8019f38]:  RECV_MESSAGE
I1008 10:52:09.977536028   18857 promise_based_filter.cc:2035]         SVR[message_size:0x7fafd8019f38] StartBatch: have_promise=true recv_initial_state=COMPLETE send_trailing_state=INITIAL captured={}
I1008 10:52:09.977544105   18857 promise_based_filter.cc:623]          SVR[message_size:0x7fafd8019f38] ReceiveMessage.StartOp st=IDLE
I1008 10:52:09.977552484   18857 promise_based_filter.cc:2409]         SVR[message_size:0x7fafd8019f38]: WakeInsideCombiner have_promise=true recv_initial_state=COMPLETE send_trailing_state=INITIAL captured={}
I1008 10:52:09.977560543   18857 promise_based_filter.cc:490]          SVR[message_size:0x7fafd8019f38] SendMessage.WakeInsideCombiner st=IDLE
D1008 10:52:09.977568213   18857 promise_based_filter.cc:2455]         SVR[message_size:0x7fafd8019f38]: After send_message WakeInsideCombiner have_promise=true recv_initial_state=COMPLETE send_trailing_state=INITIAL captured={} is_idle=true is_forwarded=false
I1008 10:52:09.977577012   18857 promise_based_filter.cc:798]          SVR[message_size:0x7fafd8019f38] ReceiveMessage.WakeInsideCombiner st=FORWARDED_BATCH push?=no next?=no allow_push_to_pipe=yes
I1008 10:52:09.977588362   18857 promise_based_filter.cc:2294]         SVR[message_size:0x7fafd8019f38] PollTrailingMetadata: INITIAL
I1008 10:52:09.977596445   18857 promise_based_filter.cc:2485]         SVR[message_size:0x7fafd8019f38]: WakeInsideCombiner poll=<<pending>>; send_initial_metadata=null send_trailing_metadata=INITIAL
I1008 10:52:09.977608913   18857 promise_based_filter.cc:302]          FLUSHER:forward batch:  RECV_MESSAGE
I1008 10:52:09.977615204   18857 channel_stack.cc:256]                 OP[deadline:0x7fafd8019f50]:  RECV_MESSAGE
I1008 10:52:09.977621015   18857 channel_stack.cc:256]                 OP[server_call_tracer:0x7fafd8019f68]:  RECV_MESSAGE
I1008 10:52:09.977628772   18857 promise_based_filter.cc:2035]         SVR[server_call_tracer:0x7fafd8019f68] StartBatch: have_promise=true recv_initial_state=COMPLETE send_trailing_state=INITIAL captured={} send_initial_metadata=GOT_PIPE
I1008 10:52:09.977636503   18857 promise_based_filter.cc:302]          FLUSHER:forward batch:  RECV_MESSAGE
I1008 10:52:09.977645298   18857 channel_stack.cc:256]                 OP[connected:0x7fafd8019f80]:  RECV_MESSAGE
I1008 10:52:09.977654984   18857 homa_stream.cc:523]                   Invoked message closure for stream id 1
I1008 10:52:09.977685700   18857 promise_based_filter.cc:700]          SVR[message_size:0x7fafd8019f38] ReceiveMessage.OnComplete st=FORWARDED_BATCH status=OK
I1008 10:52:09.977711556   18857 promise_based_filter.cc:2409]         SVR[message_size:0x7fafd8019f38]: WakeInsideCombiner have_promise=true recv_initial_state=COMPLETE send_trailing_state=INITIAL captured={}
I1008 10:52:09.977720113   18857 promise_based_filter.cc:490]          SVR[message_size:0x7fafd8019f38] SendMessage.WakeInsideCombiner st=IDLE
D1008 10:52:09.977728059   18857 promise_based_filter.cc:2455]         SVR[message_size:0x7fafd8019f38]: After send_message WakeInsideCombiner have_promise=true recv_initial_state=COMPLETE send_trailing_state=INITIAL captured={} is_idle=true is_forwarded=false
I1008 10:52:09.977737512   18857 promise_based_filter.cc:798]          SVR[message_size:0x7fafd8019f38] ReceiveMessage.WakeInsideCombiner st=BATCH_COMPLETED push?=no next?=no allow_push_to_pipe=yes
I1008 10:52:09.977749887   18857 promise_based_filter.cc:2294]         SVR[message_size:0x7fafd8019f38] PollTrailingMetadata: INITIAL
I1008 10:52:09.977759843   18857 promise_based_filter.cc:2485]         SVR[message_size:0x7fafd8019f38]: WakeInsideCombiner poll=<<pending>>; send_initial_metadata=null send_trailing_metadata=INITIAL
I1008 10:52:09.977781637   18857 homa_listener.cc:399]                 HomaListener::Transport::set_pollset invoked
I1008 10:52:09.977815948   18857 completion_queue.cc:704]              cq_end_op_for_next(cq=0x7fafd8000f60, tag=0x7fafd80174c0, error=OK, done=0x2b7ed60, done_arg=0x7fafd8017660, storage=0x7fafd8017688)
I1008 10:52:09.977865207   18857 completion_queue.cc:1066]             RETURN_EVENT[0x7fafd8000f60]: OP_COMPLETE: tag:0x7fafd80174c0 OK
I1008 10:52:09.977887653   18857 call.cc:3548]                         grpc_call_cancel_with_status(c=0x7fafd8018fd0, status=13, description=Unable to parse request, reserved=(nil))
I1008 10:52:09.978092612   18857 call.cc:1000]                         OP[server:0x7fafd8019f20]:  CANCEL:INTERNAL:Unable to parse request {grpc_status:13, grpc_message:"Unable to parse request"}
I1008 10:52:09.978225567   18857 channel_stack.cc:256]                 OP[message_size:0x7fafd8019f38]:  CANCEL:INTERNAL:Unable to parse request {grpc_status:13, grpc_message:"Unable to parse request"}
I1008 10:52:09.978263765   18857 promise_based_filter.cc:2035]         SVR[message_size:0x7fafd8019f38] StartBatch: have_promise=true recv_initial_state=COMPLETE send_trailing_state=INITIAL captured={}
D1008 10:52:09.978296683   18857 promise_based_filter.cc:2169]         SVR[message_size:0x7fafd8019f38]ServerCallData::Completed: send_trailing_state=INITIAL send_initial_state=null error=INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
I1008 10:52:09.978410640   18857 promise_based_filter.cc:447]          SVR[message_size:0x7fafd8019f38] SendMessage.Done st=IDLE md=grpc-message: Unable to parse request, grpc-status: 13
I1008 10:52:09.978484878   18857 promise_based_filter.cc:741]          SVR[message_size:0x7fafd8019f38] ReceiveMessage.Done st=CANCELLED md=grpc-message: Unable to parse request, grpc-status: 13
I1008 10:52:09.978589861   18857 promise_based_filter.cc:290]          FLUSHER:queue batch to forward in closure:  CANCEL:INTERNAL:Unable to parse request {grpc_status:13, grpc_message:"Unable to parse request"}
I1008 10:52:09.978637033   18857 promise_based_filter.cc:302]          FLUSHER:forward batch:  CANCEL:INTERNAL:Unable to parse request {grpc_status:13, grpc_message:"Unable to parse request"}
I1008 10:52:09.978668270   18857 channel_stack.cc:256]                 OP[deadline:0x7fafd8019f50]:  CANCEL:INTERNAL:Unable to parse request {grpc_status:13, grpc_message:"Unable to parse request"}
I1008 10:52:09.978699670   18857 channel_stack.cc:256]                 OP[server_call_tracer:0x7fafd8019f68]:  CANCEL:INTERNAL:Unable to parse request {grpc_status:13, grpc_message:"Unable to parse request"}
I1008 10:52:09.978725729   18857 promise_based_filter.cc:2035]         SVR[server_call_tracer:0x7fafd8019f68] StartBatch: have_promise=true recv_initial_state=COMPLETE send_trailing_state=INITIAL captured={} send_initial_metadata=GOT_PIPE
D1008 10:52:09.978756310   18857 promise_based_filter.cc:2169]         SVR[server_call_tracer:0x7fafd8019f68]ServerCallData::Completed: send_trailing_state=INITIAL send_initial_state=GOT_PIPE error=INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
I1008 10:52:09.978798194   18857 promise_based_filter.cc:290]          FLUSHER:queue batch to forward in closure:  CANCEL:INTERNAL:Unable to parse request {grpc_status:13, grpc_message:"Unable to parse request"}
I1008 10:52:09.978830533   18857 promise_based_filter.cc:302]          FLUSHER:forward batch:  CANCEL:INTERNAL:Unable to parse request {grpc_status:13, grpc_message:"Unable to parse request"}
I1008 10:52:09.978859226   18857 channel_stack.cc:256]                 OP[connected:0x7fafd8019f80]:  CANCEL:INTERNAL:Unable to parse request {grpc_status:13, grpc_message:"Unable to parse request"}
I1008 10:52:09.978961665   18857 homa_listener.cc:427]                 Cancelling stream: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
I1008 10:52:09.978989291   18857 homa_stream.cc:664]                   Sending peer cancellation for RPC id 1
I1008 10:52:09.979100895   18857 homa_stream.cc:184]                   Sending Homa response to 192.168.93.130:32894, stream id 1, sequence 1 with homaId 12485083, 0 initial metadata bytes, 0 payload bytes, 0 trailing metadata bytes
I1008 10:52:09.979223542   18857 homa_stream.cc:636]                   Recording error for stream id 1: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
mikehb commented 11 months ago

I found that payload_ is empty in grpc/include/grpcpp/server_interface.h:

    bool FinalizeResult(void** tag, bool* status) override {
      // If we are done intercepting, then there is nothing more for us to do
      if (done_intercepting_) {
        return RegisteredAsyncRequest::FinalizeResult(tag, status);
      }
      if (*status) {
        if (!payload_.Valid() || !SerializationTraits<Message>::Deserialize(
                                      payload_.bbuf_ptr(), request_)
                                      .ok()) {
          // If deserialization fails, we cancel the call and instantiate
          // a new instance of ourselves to request another call.  We then
          // return false, which prevents the call from being returned to
          // the application.
          grpc_call_cancel_with_status(call_, GRPC_STATUS_INTERNAL,
                                       "Unable to parse request", nullptr);
          grpc_call_unref(call_);
          new PayloadAsyncRequest(registered_method_, server_, context_,
                                  stream_, call_cq_, notification_cq_, tag_,
                                  request_);
          delete this;
          return false;
        }
      }
      // Set interception point for recv message
      interceptor_methods_.AddInterceptionHookPoint(
          experimental::InterceptionHookPoints::POST_RECV_MESSAGE);
      interceptor_methods_.SetRecvMessage(request_, nullptr);
      return RegisteredAsyncRequest::FinalizeResult(tag, status);
    }

FYI, I use gRPC's asynchronous API; I'm not sure whether grpc_homa supports it.
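
For reference, the call pattern I use is roughly gRPC's standard completion-queue-based async flow. Here is a minimal sketch, using a hypothetical Echo service and a plain TCP listening port purely for illustration (my real services and the Homa setup differ):

    // Minimal sketch of the CQ-based async server pattern (hypothetical
    // Echo service, not the real rtk.Rtk service).
    #include <grpcpp/grpcpp.h>

    #include "echo.grpc.pb.h"  // hypothetical generated header

    int main() {
      echo::Echo::AsyncService service;
      grpc::ServerBuilder builder;
      builder.AddListeningPort("0.0.0.0:2527", grpc::InsecureServerCredentials());
      builder.RegisterService(&service);
      auto cq = builder.AddCompletionQueue();
      auto server = builder.BuildAndStart();

      grpc::ServerContext ctx;
      echo::EchoRequest request;  // filled in by FinalizeResult() via payload_
      grpc::ServerAsyncResponseWriter<echo::EchoReply> responder(&ctx);

      // Request one incoming call; this is the path that reaches
      // PayloadAsyncRequest::FinalizeResult() quoted above.
      service.RequestEcho(&ctx, &request, &responder, cq.get(), cq.get(),
                          reinterpret_cast<void*>(1));

      void* tag;
      bool ok;
      cq->Next(&tag, &ok);  // block until the call arrives

      echo::EchoReply reply;
      responder.Finish(reply, grpc::Status::OK, reinterpret_cast<void*>(2));
      cq->Next(&tag, &ok);  // block until the response is sent

      server->Shutdown();
      cq->Shutdown();
      while (cq->Next(&tag, &ok)) {
      }  // drain before destroying the queue
      return 0;
    }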

johnousterhout commented 11 months ago

Thanks for filing the issue. Asynchronous methods should theoretically work (I have tested simple examples). It looks like there are two problems: a crash and an empty request message. Do you by any chance have a simple program you could send me that makes both of these problems happen? If so, I should be able to track down the problems pretty quickly. If not, could you gather a bit more information, such as:

For the empty request message, could you enable INFO-level logging on the client (export GRPC_VERBOSITY=INFO) and send me the client-side log?

Thanks.

-John-

mikehb commented 11 months ago

"For the empty request message, could you enable INFO-level logging on the client (export GRPC_VERBOSITY=INFO) and send me the client-side log?"

D1010 13:29:32.081892000    2870 config.cc:161]                        gRPC EXPERIMENT tcp_frame_size_tuning               OFF (default:OFF)
D1010 13:29:32.082003984    2870 config.cc:161]                        gRPC EXPERIMENT tcp_rcv_lowat                       OFF (default:OFF)
D1010 13:29:32.082024217    2870 config.cc:161]                        gRPC EXPERIMENT peer_state_based_framing            OFF (default:OFF)
D1010 13:29:32.082039329    2870 config.cc:161]                        gRPC EXPERIMENT memory_pressure_controller          OFF (default:OFF)
D1010 13:29:32.082052886    2870 config.cc:161]                        gRPC EXPERIMENT unconstrained_max_quota_buffer_size OFF (default:OFF)
D1010 13:29:32.082066715    2870 config.cc:161]                        gRPC EXPERIMENT event_engine_client                 OFF (default:OFF)
D1010 13:29:32.082080546    2870 config.cc:161]                        gRPC EXPERIMENT monitoring_experiment               ON  (default:ON)
D1010 13:29:32.082129927    2870 config.cc:161]                        gRPC EXPERIMENT promise_based_client_call           OFF (default:OFF)
D1010 13:29:32.082181522    2870 config.cc:161]                        gRPC EXPERIMENT free_large_allocator                OFF (default:OFF)
D1010 13:29:32.082228178    2870 config.cc:161]                        gRPC EXPERIMENT promise_based_server_call           OFF (default:OFF)
D1010 13:29:32.082245315    2870 config.cc:161]                        gRPC EXPERIMENT transport_supplies_client_latency   OFF (default:OFF)
D1010 13:29:32.082258895    2870 config.cc:161]                        gRPC EXPERIMENT event_engine_listener               OFF (default:OFF)
D1010 13:29:32.082272391    2870 config.cc:161]                        gRPC EXPERIMENT schedule_cancellation_over_write    OFF (default:OFF)
D1010 13:29:32.082285375    2870 config.cc:161]                        gRPC EXPERIMENT trace_record_callops                OFF (default:OFF)
D1010 13:29:32.082298756    2870 config.cc:161]                        gRPC EXPERIMENT event_engine_dns                    OFF (default:OFF)
D1010 13:29:32.082311919    2870 config.cc:161]                        gRPC EXPERIMENT work_stealing                       OFF (default:OFF)
D1010 13:29:32.082324759    2870 config.cc:161]                        gRPC EXPERIMENT client_privacy                      OFF (default:OFF)
D1010 13:29:32.082375053    2870 config.cc:161]                        gRPC EXPERIMENT canary_client_privacy               OFF (default:OFF)
D1010 13:29:32.082398005    2870 config.cc:161]                        gRPC EXPERIMENT server_privacy                      OFF (default:OFF)
D1010 13:29:32.082439960    2870 config.cc:161]                        gRPC EXPERIMENT unique_metadata_strings             OFF (default:OFF)
D1010 13:29:32.082448320    2870 config.cc:161]                        gRPC EXPERIMENT keepalive_fix                       OFF (default:OFF)
I1010 13:29:32.083198293    2870 ev_epoll1_linux.cc:123]               grpc epoll fd: 7
D1010 13:29:32.083285750    2870 ev_posix.cc:113]                      Using polling engine: epoll1
I1010 13:29:32.132181005    2886 server_builder.cc:352]                Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
D1010 13:29:32.132434555    2886 lb_policy_registry.cc:47]             registering LB policy factory for "priority_experimental"
D1010 13:29:32.132474321    2886 lb_policy_registry.cc:47]             registering LB policy factory for "outlier_detection_experimental"
D1010 13:29:32.132495355    2886 lb_policy_registry.cc:47]             registering LB policy factory for "weighted_target_experimental"
D1010 13:29:32.132524129    2886 lb_policy_registry.cc:47]             registering LB policy factory for "pick_first"
D1010 13:29:32.132549906    2886 lb_policy_registry.cc:47]             registering LB policy factory for "round_robin"
D1010 13:29:32.132567904    2886 lb_policy_registry.cc:47]             registering LB policy factory for "weighted_round_robin"
D1010 13:29:32.132627628    2886 lb_policy_registry.cc:47]             registering LB policy factory for "grpclb"
D1010 13:29:32.132709415    2886 dns_resolver_plugin.cc:44]            Using ares dns resolver
D1010 13:29:32.132774257    2886 lb_policy_registry.cc:47]             registering LB policy factory for "rls_experimental"
D1010 13:29:32.132889358    2886 lb_policy_registry.cc:47]             registering LB policy factory for "xds_cluster_manager_experimental"
D1010 13:29:32.132918777    2886 lb_policy_registry.cc:47]             registering LB policy factory for "xds_cluster_impl_experimental"
D1010 13:29:32.132938168    2886 lb_policy_registry.cc:47]             registering LB policy factory for "cds_experimental"
D1010 13:29:32.132957003    2886 lb_policy_registry.cc:47]             registering LB policy factory for "xds_cluster_resolver_experimental"
D1010 13:29:32.132976186    2886 lb_policy_registry.cc:47]             registering LB policy factory for "xds_override_host_experimental"
D1010 13:29:32.132994668    2886 lb_policy_registry.cc:47]             registering LB policy factory for "xds_wrr_locality_experimental"
D1010 13:29:32.133016465    2886 lb_policy_registry.cc:47]             registering LB policy factory for "ring_hash_experimental"
D1010 13:29:32.133037143    2886 certificate_provider_registry.cc:33]  registering certificate provider factory for "file_watcher"
I1010 13:29:32.144951551    2886 ev_epoll1_linux.cc:360]               grpc epoll fd: 46
I1010 13:29:32.212336500    2870 socket_utils_common_posix.cc:366]     TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
I1010 13:29:32.213652471    2882 subchannel.cc:707]                    subchannel 0x4a61570 {address=ipv4:192.168.93.130:19527, args={grpc.client_channel_factory=0x4a70920, grpc.default_authority=192.168.93.130:19527, grpc.http2.max_pings_without_data=0, grpc.internal.channel_credentials=0x4a70900, grpc.internal.event_engine=0x4ad1090, grpc.internal.security_connector=0x4a8a770, grpc.internal.subchannel_pool=0x4a6d350, grpc.keepalive_permit_without_calls=1, grpc.keepalive_time_ms=7000, grpc.keepalive_timeout_ms=2000, grpc.primary_user_agent=grpc-c++/1.57.0, grpc.resource_quota=0x7f82e8003c90, grpc.server_uri=dns:///192.168.93.130:19527}}: connect failed (UNKNOWN:Failed to connect to remote host: Connection refused {file:"external/com_github_grpc_grpc/src/core/lib/iomgr/tcp_client_posix.cc", file_line:248, created_time:"2023-10-10T13:29:32.212874336+08:00", errno:111, os_error:"Connection refused", syscall:"connect", target_address:"ipv4:192.168.93.130:19527"}), backing off for 998 ms
[2023/10/10-13:29:33.535 +08:00] [---E---] [2970] [/home/zhdh/vrs0930/codes/preprocess/code/CnrtkRgnNet.cpp:928] Push GNSS = 0, In_Star = 36, Greater!
I1010 13:30:03.073911266    2925 homa_incoming.cc:232]                 Received Homa message from 192.168.93.130:32772, stream id 1, sequence 1 with homaId 3, 67 initMd bytes, 0 message bytes, 0 trailMd bytes, flags 0xf, bpage[0] 0
D1010 13:30:03.074122833    2925 homa_stream.cc:617]                   transferData
I1010 13:30:03.074240120    2925 util.cc:27]                           Incoming initial metadata: :path -> /rtk.Rtk/WatchEph
I1010 13:30:03.074263162    2925 util.cc:27]                           Incoming initial metadata: :authority -> 192.168.93.130:1027
I1010 13:30:03.074386297    2925 homa_stream.cc:523]                   Invoked message closure for stream id 1
I1010 13:30:03.074448025    2925 homa_listener.cc:399]                 HomaListener::Transport::set_pollset invoked
I1010 13:30:03.074785037    2921 homa_incoming.cc:232]                 Received Homa message from 192.168.93.130:32772, stream id 2, sequence 1 with homaId 5, 67 initMd bytes, 0 message bytes, 0 trailMd bytes, flags 0xf, bpage[0] 0
D1010 13:30:03.074907741    2921 homa_stream.cc:617]                   transferData
I1010 13:30:03.075059133    2921 util.cc:27]                           Incoming initial metadata: :path -> /rtk.Rtk/WatchObs
I1010 13:30:03.075154119    2921 util.cc:27]                           Incoming initial metadata: :authority -> 192.168.93.130:1027
I1010 13:30:03.075324361    2921 homa_stream.cc:523]                   Invoked message closure for stream id 2
I1010 13:30:03.075387587    2921 homa_listener.cc:399]                 HomaListener::Transport::set_pollset invoked
I1010 13:30:03.075048796    2886 homa_listener.cc:427]                 Cancelling stream: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
I1010 13:30:03.075710678    2886 homa_stream.cc:665]                   Sending peer cancellation for RPC id 1
I1010 13:30:03.075758334    2886 homa_stream.cc:184]                   Sending Homa response to 192.168.93.130:32772, stream id 1, sequence 1 with homaId 3, 0 initial metadata bytes, 0 payload bytes, 0 trailing metadata bytes
I1010 13:30:03.075811415    2918 homa_incoming.cc:232]                 Received Homa message from 192.168.93.130:32772, stream id 3, sequence 1 with homaId 7, 80 initMd bytes, 0 message bytes, 0 trailMd bytes, flags 0xf, bpage[0] 0
D1010 13:30:03.075939150    2918 homa_stream.cc:617]                   transferData
I1010 13:30:03.076044027    2918 util.cc:27]                           Incoming initial metadata: :path -> /rtk.Rtk/WatchPreprocessStaids
I1010 13:30:03.076062084    2886 homa_stream.cc:637]                   Recording error for stream id 1: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
I1010 13:30:03.076070064    2918 util.cc:27]                           Incoming initial metadata: :authority -> 192.168.93.130:1027
I1010 13:30:03.076173701    2886 homa_listener.cc:427]                 Cancelling stream: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
I1010 13:30:03.076204670    2886 homa_stream.cc:637]                   Recording error for stream id 1: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
I1010 13:30:03.076232857    2886 homa_listener.cc:427]                 Cancelling stream: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
I1010 13:30:03.076254788    2918 homa_stream.cc:523]                   Invoked message closure for stream id 3
I1010 13:30:03.076255213    2886 homa_stream.cc:637]                   Recording error for stream id 1: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
I1010 13:30:03.076317057    2918 homa_listener.cc:399]                 HomaListener::Transport::set_pollset invoked
I1010 13:30:03.076534396    2886 homa_listener.cc:522]                 HomaStream destroyed

"Do you by any chance have a simple program you could send me that makes both of these problems happen?" Let me have a try and get back to you.

johnousterhout commented 11 months ago

This looks like the server's trace, not a client-side trace (the first Homa record is for an incoming message; I'd expect to see the first records relating to an outgoing message on the client side). Can you send the client-side trace?

Thanks.

-John-

mikehb commented 11 months ago

homa.tar.gz: this simple test program can be used to reproduce the issue. I use Bazel to build both this program and grpc_homa.

The Homa module has been installed:

[mikehuang@localhost homa]$ lsmod | grep homa
homa                  323584  0
[mikehuang@localhost homa]$
  1. TCP works. Server output (some gRPC logs trimmed to keep this post short):

    [mikehuang@localhost homa]$ ./bazel-bin/server
    0
    1
    2
    3
    4
    register text message client 127.0.0.1:45942
    5
    6
    7
    8

    Client:

    [mikehuang@localhost homa]$ ./bazel-bin/client
    5
    6
    7
    8
  2. Homa doesn't work. Server output:

    [mikehuang@localhost homa]$ ./bazel-bin/server -homa
    0
    D1011 11:47:42.772919462   33469 config.cc:161]                        gRPC EXPERIMENT tcp_frame_size_tuning               OFF (default:OFF)
    D1011 11:47:42.773185801   33469 config.cc:161]                        gRPC EXPERIMENT tcp_rcv_lowat                       OFF (default:OFF)
    D1011 11:47:42.773245158   33469 config.cc:161]                        gRPC EXPERIMENT peer_state_based_framing            OFF (default:OFF)
    D1011 11:47:42.773267381   33469 config.cc:161]                        gRPC EXPERIMENT memory_pressure_controller          OFF (default:OFF)
    D1011 11:47:42.773280694   33469 config.cc:161]                        gRPC EXPERIMENT unconstrained_max_quota_buffer_size OFF (default:OFF)
    D1011 11:47:42.773330717   33469 config.cc:161]                        gRPC EXPERIMENT event_engine_client                 OFF (default:OFF)
    D1011 11:47:42.773352047   33469 config.cc:161]                        gRPC EXPERIMENT monitoring_experiment               ON  (default:ON)
    D1011 11:47:42.773364865   33469 config.cc:161]                        gRPC EXPERIMENT promise_based_client_call           OFF (default:OFF)
    D1011 11:47:42.773406600   33469 config.cc:161]                        gRPC EXPERIMENT free_large_allocator                OFF (default:OFF)
    D1011 11:47:42.773427485   33469 config.cc:161]                        gRPC EXPERIMENT promise_based_server_call           OFF (default:OFF)
    D1011 11:47:42.773439727   33469 config.cc:161]                        gRPC EXPERIMENT transport_supplies_client_latency   OFF (default:OFF)
    D1011 11:47:42.773513389   33469 config.cc:161]                        gRPC EXPERIMENT event_engine_listener               OFF (default:OFF)
    D1011 11:47:42.773538963   33469 config.cc:161]                        gRPC EXPERIMENT schedule_cancellation_over_write    OFF (default:OFF)
    D1011 11:47:42.773585663   33469 config.cc:161]                        gRPC EXPERIMENT trace_record_callops                OFF (default:OFF)
    D1011 11:47:42.773601667   33469 config.cc:161]                        gRPC EXPERIMENT event_engine_dns                    OFF (default:OFF)
    D1011 11:47:42.773640797   33469 config.cc:161]                        gRPC EXPERIMENT work_stealing                       OFF (default:OFF)
    D1011 11:47:42.773655509   33469 config.cc:161]                        gRPC EXPERIMENT client_privacy                      OFF (default:OFF)
    D1011 11:47:42.773673532   33469 config.cc:161]                        gRPC EXPERIMENT canary_client_privacy               OFF (default:OFF)
    D1011 11:47:42.773686593   33469 config.cc:161]                        gRPC EXPERIMENT server_privacy                      OFF (default:OFF)
    D1011 11:47:42.773703487   33469 config.cc:161]                        gRPC EXPERIMENT unique_metadata_strings             OFF (default:OFF)
    D1011 11:47:42.773715462   33469 config.cc:161]                        gRPC EXPERIMENT keepalive_fix                       OFF (default:OFF)
    I1011 11:47:42.774306511   33469 ev_epoll1_linux.cc:123]               grpc epoll fd: 3
    D1011 11:47:42.774488760   33469 ev_posix.cc:113]                      Using polling engine: epoll1
    I1011 11:47:42.774875801   33469 server_builder.cc:352]                Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
    D1011 11:47:42.775084866   33469 lb_policy_registry.cc:47]             registering LB policy factory for "priority_experimental"
    D1011 11:47:42.775173335   33469 lb_policy_registry.cc:47]             registering LB policy factory for "outlier_detection_experimental"
    D1011 11:47:42.775211556   33469 lb_policy_registry.cc:47]             registering LB policy factory for "weighted_target_experimental"
    D1011 11:47:42.775276039   33469 lb_policy_registry.cc:47]             registering LB policy factory for "pick_first"
    D1011 11:47:42.775326172   33469 lb_policy_registry.cc:47]             registering LB policy factory for "round_robin"
    D1011 11:47:42.775353422   33469 lb_policy_registry.cc:47]             registering LB policy factory for "weighted_round_robin"
    D1011 11:47:42.775437778   33469 lb_policy_registry.cc:47]             registering LB policy factory for "grpclb"
    D1011 11:47:42.775550675   33469 dns_resolver_plugin.cc:44]            Using ares dns resolver
    D1011 11:47:42.775623447   33469 lb_policy_registry.cc:47]             registering LB policy factory for "rls_experimental"
    D1011 11:47:42.775717755   33469 lb_policy_registry.cc:47]             registering LB policy factory for "xds_cluster_manager_experimental"
    D1011 11:47:42.775749598   33469 lb_policy_registry.cc:47]             registering LB policy factory for "xds_cluster_impl_experimental"
    D1011 11:47:42.775836163   33469 lb_policy_registry.cc:47]             registering LB policy factory for "cds_experimental"
    D1011 11:47:42.775863137   33469 lb_policy_registry.cc:47]             registering LB policy factory for "xds_cluster_resolver_experimental"
    D1011 11:47:42.776058991   33469 lb_policy_registry.cc:47]             registering LB policy factory for "xds_override_host_experimental"
    D1011 11:47:42.776152528   33469 lb_policy_registry.cc:47]             registering LB policy factory for "xds_wrr_locality_experimental"
    D1011 11:47:42.776192309   33469 lb_policy_registry.cc:47]             registering LB policy factory for "ring_hash_experimental"
    D1011 11:47:42.776217809   33469 certificate_provider_registry.cc:33]  registering certificate provider factory for "file_watcher"
    I1011 11:47:42.777811700   33469 ev_epoll1_linux.cc:360]               grpc epoll fd: 5
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    I1011 11:47:58.018486070   33469 homa_incoming.cc:232]                 Received Homa message from 127.0.0.1:32791, stream id 1, sequence 1 with homaId 73, 70 initMd bytes, 0 message bytes, 0 trailMd bytes, flags 0xf, bpage[0] 0
    I1011 11:47:58.018689626   33469 util.cc:27]                           Incoming initial metadata: :path -> /rtk.Rtk/WatchTextMessage
    I1011 11:47:58.018716740   33469 util.cc:27]                           Incoming initial metadata: :authority -> 127.0.0.1:4000
    I1011 11:47:58.018986465   33469 homa_stream.cc:523]                   Invoked message closure for stream id 1
    I1011 11:47:58.019146051   33469 homa_listener.cc:399]                 HomaListener::Transport::set_pollset invoked
    I1011 11:47:58.019375904   33469 homa_listener.cc:427]                 Cancelling stream: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
    I1011 11:47:58.019455535   33469 homa_stream.cc:664]                   Sending peer cancellation for RPC id 1
    I1011 11:47:58.019478087   33469 homa_stream.cc:184]                   Sending Homa response to 127.0.0.1:32791, stream id 1, sequence 1 with homaId 73, 0 initial metadata bytes, 0 payload bytes, 0 trailing metadata bytes
    I1011 11:47:58.019637731   33469 homa_stream.cc:636]                   Recording error for stream id 1: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
    I1011 11:47:58.019727458   33469 homa_listener.cc:427]                 Cancelling stream: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
    I1011 11:47:58.019755526   33469 homa_stream.cc:636]                   Recording error for stream id 1: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
    I1011 11:47:58.019783475   33469 homa_listener.cc:427]                 Cancelling stream: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
    I1011 11:47:58.019795219   33469 homa_stream.cc:636]                   Recording error for stream id 1: INTERNAL: Unable to parse request [type.googleapis.com/grpc.status.int.grpc_status='13'] [type.googleapis.com/grpc.status.str.grpc_message='Unable to parse request']
    I1011 11:47:58.019842043   33469 homa_listener.cc:522]                 HomaStream destroyed
    16
    17
    18
    19
    20
    21

    Client:

    [mikehuang@localhost homa]$ ./bazel-bin/client -homa
    D1011 11:47:57.984721036   33492 config.cc:161]                        gRPC EXPERIMENT tcp_frame_size_tuning               OFF (default:OFF)
    D1011 11:47:57.984923751   33492 config.cc:161]                        gRPC EXPERIMENT tcp_rcv_lowat                       OFF (default:OFF)
    D1011 11:47:57.984945697   33492 config.cc:161]                        gRPC EXPERIMENT peer_state_based_framing            OFF (default:OFF)
    D1011 11:47:57.984959872   33492 config.cc:161]                        gRPC EXPERIMENT memory_pressure_controller          OFF (default:OFF)
    D1011 11:47:57.984972989   33492 config.cc:161]                        gRPC EXPERIMENT unconstrained_max_quota_buffer_size OFF (default:OFF)
    D1011 11:47:57.984986144   33492 config.cc:161]                        gRPC EXPERIMENT event_engine_client                 OFF (default:OFF)
    D1011 11:47:57.984998779   33492 config.cc:161]                        gRPC EXPERIMENT monitoring_experiment               ON  (default:ON)
    D1011 11:47:57.985011307   33492 config.cc:161]                        gRPC EXPERIMENT promise_based_client_call           OFF (default:OFF)
    D1011 11:47:57.985024080   33492 config.cc:161]                        gRPC EXPERIMENT free_large_allocator                OFF (default:OFF)
    D1011 11:47:57.985036188   33492 config.cc:161]                        gRPC EXPERIMENT promise_based_server_call           OFF (default:OFF)
    D1011 11:47:57.985048639   33492 config.cc:161]                        gRPC EXPERIMENT transport_supplies_client_latency   OFF (default:OFF)
    D1011 11:47:57.985061652   33492 config.cc:161]                        gRPC EXPERIMENT event_engine_listener               OFF (default:OFF)
    D1011 11:47:57.985073873   33492 config.cc:161]                        gRPC EXPERIMENT schedule_cancellation_over_write    OFF (default:OFF)
    D1011 11:47:57.985086531   33492 config.cc:161]                        gRPC EXPERIMENT trace_record_callops                OFF (default:OFF)
    D1011 11:47:57.985100452   33492 config.cc:161]                        gRPC EXPERIMENT event_engine_dns                    OFF (default:OFF)
    D1011 11:47:57.985113684   33492 config.cc:161]                        gRPC EXPERIMENT work_stealing                       OFF (default:OFF)
    D1011 11:47:57.985127835   33492 config.cc:161]                        gRPC EXPERIMENT client_privacy                      OFF (default:OFF)
    D1011 11:47:57.985142018   33492 config.cc:161]                        gRPC EXPERIMENT canary_client_privacy               OFF (default:OFF)
    D1011 11:47:57.985209551   33492 config.cc:161]                        gRPC EXPERIMENT server_privacy                      OFF (default:OFF)
    D1011 11:47:57.985259070   33492 config.cc:161]                        gRPC EXPERIMENT unique_metadata_strings             OFF (default:OFF)
    D1011 11:47:57.985276092   33492 config.cc:161]                        gRPC EXPERIMENT keepalive_fix                       OFF (default:OFF)
    I1011 11:47:57.986043184   33492 ev_epoll1_linux.cc:123]               grpc epoll fd: 3
    D1011 11:47:57.986142173   33492 ev_posix.cc:113]                      Using polling engine: epoll1
    D1011 11:47:57.986573627   33492 lb_policy_registry.cc:47]             registering LB policy factory for "priority_experimental"
    D1011 11:47:57.986690749   33492 lb_policy_registry.cc:47]             registering LB policy factory for "outlier_detection_experimental"
    D1011 11:47:57.986768079   33492 lb_policy_registry.cc:47]             registering LB policy factory for "weighted_target_experimental"
    D1011 11:47:57.986861472   33492 lb_policy_registry.cc:47]             registering LB policy factory for "pick_first"
    D1011 11:47:57.986919100   33492 lb_policy_registry.cc:47]             registering LB policy factory for "round_robin"
    D1011 11:47:57.986950209   33492 lb_policy_registry.cc:47]             registering LB policy factory for "weighted_round_robin"
    D1011 11:47:57.987037710   33492 lb_policy_registry.cc:47]             registering LB policy factory for "grpclb"
    D1011 11:47:57.987126926   33492 dns_resolver_plugin.cc:44]            Using ares dns resolver
    D1011 11:47:57.987232894   33492 lb_policy_registry.cc:47]             registering LB policy factory for "rls_experimental"
    D1011 11:47:57.987349630   33492 lb_policy_registry.cc:47]             registering LB policy factory for "xds_cluster_manager_experimental"
    D1011 11:47:57.987415680   33492 lb_policy_registry.cc:47]             registering LB policy factory for "xds_cluster_impl_experimental"
    D1011 11:47:57.987427682   33492 lb_policy_registry.cc:47]             registering LB policy factory for "cds_experimental"
    D1011 11:47:57.987436892   33492 lb_policy_registry.cc:47]             registering LB policy factory for "xds_cluster_resolver_experimental"
    D1011 11:47:57.987447805   33492 lb_policy_registry.cc:47]             registering LB policy factory for "xds_override_host_experimental"
    D1011 11:47:57.987460400   33492 lb_policy_registry.cc:47]             registering LB policy factory for "xds_wrr_locality_experimental"
    D1011 11:47:57.987470804   33492 lb_policy_registry.cc:47]             registering LB policy factory for "ring_hash_experimental"
    D1011 11:47:57.987551071   33492 certificate_provider_registry.cc:33]  registering certificate provider factory for "file_watcher"
    I1011 11:47:57.989648767   33492 ev_epoll1_linux.cc:360]               grpc epoll fd: 5
    I1011 11:47:58.017386734   33492 homa_client.cc:299]                   HomaClient::perform_op invoked with start_connectivity_watch
    I1011 11:47:58.017900351   33492 homa_stream.cc:301]                   Outgoing metadata: key :path, value /rtk.Rtk/WatchTextMessage
    I1011 11:47:58.018137501   33492 homa_stream.cc:301]                   Outgoing metadata: key :authority, value 127.0.0.1:4000
    I1011 11:47:58.018323653   33492 homa_stream.cc:174]                   Sent Homa request to 127.0.0.1:4000, stream id 1, sequence 1 with homaId 72, 70 initial metadata bytes, 0 payload bytes, 0 trailing metadata bytes
    I1011 11:47:58.019903824   33513 homa_incoming.cc:232]                 Received Homa message from 127.0.0.1:4000, stream id 1, sequence 1 with homaId 72, 0 initMd bytes, 0 message bytes, 0 trailMd bytes, flags 0x20, bpage[0] 0
    I1011 11:47:58.019969166   33513 homa_stream.cc:592]                   RPC id 1 cancelled by peer
    I1011 11:47:58.019986375   33513 homa_stream.cc:636]                   Recording error for stream id 1: CANCELLED:
    I1011 11:47:58.020078768   33513 homa_stream.cc:636]                   Recording error for stream id 1: CANCELLED:
    I1011 11:47:58.020121701   33513 homa_stream.cc:636]                   Recording error for stream id 1: CANCELLED:
    I1011 11:47:58.020223579   33513 homa_stream.cc:636]                   Recording error for stream id 1: CANCELLED:
    ^C
johnousterhout commented 11 months ago

Thanks for the test case; that was super helpful. I have found and fixed a bug (grpc_homa wasn't properly handling zero-length messages on the receiver), and your test case now works for me. Can you pull the latest sources and see if it also works for you?
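
For context, the zero-length case is easy to hit: a protobuf request whose fields are all at their defaults serializes to an empty byte string, which is why your traces show "0 message bytes". A tiny illustrative sketch (hypothetical EchoRequest type, not grpc_homa code):

    // An all-defaults protobuf message serializes to zero bytes, a
    // perfectly valid, empty payload that the receiver must still deliver.
    #include <string>

    #include "echo.pb.h"  // hypothetical generated header

    int main() {
      echo::EchoRequest request;  // no fields set
      std::string bytes;
      request.SerializeToString(&bytes);
      return static_cast<int>(bytes.size());  // 0
    }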

-John-

mikehb commented 11 months ago

It works for me. Thank you, John.