envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.79k stars 4.76k forks source link

wasm: response body buffering with http2 #36227

Open dandlake opened 1 week ago

dandlake commented 1 week ago

Hi, I am facing some issue in response body buffering when the upstream is in http2.

In my test case, the whole response body is 25246B. In the log, looks like it's processed in 4 chunks: CHK1(8192B), CHK2(8192B), CHK3(8192B) and CHK4(670B). In my wasm filter callback, I return Action::Pause for all of them unless end_of_stream is true, so that I can accumulate the whole response body to process. The problem I am seeing is that, the 4 callbacks(with size, end_of_stream) are (8192, 0), (8192, 0), (16384, 0), (24576, 1). With these callbacks, at the last one, I only get 24576B body and missing the last 670B.

I traced into the code a bit to see how buffer accumulation happens(I am looking at the code in 1.26.8. The same behavior happens with latest 1.32 dev code too). In filter_manager.cc:FilterManager::encodeData(), looks like it calls the filter first, then if status is StopIterationAndBuffer it calls commonHandleAfterDataCallback() and commonHandleBufferData() to accumulate the buffer in FilterManager::buffered_requestdata.

FilterDataStatus status = (*entry)->handle_->encodeData(data, (*entry)->end_stream_);
...
(*entry)->commonHandleAfterDataCallback(status, data, state_.encoder_filters_streaming_)
    calls (provcommonHandleBufferDataided_data);

This explains the above response body callback behavior, because buffer accumulation happens AFTER the response body callback.

My question is: what's the design here? Should the 4th callback come with end_of_stream false and have a 5th callback with end_of_stream true(after the last 670B chunk is accumulated)?

I tried to make the upstream http1 and the response body callbacks are good and I can receive the whole response body. In this http1 case, I do see an extra callback with end_of_stream true.

Here are some logs that I captured:

# grep "codec_impl\|codec_client\|proxy_on" http2.log
...
[2024-09-19 16:53:46.023][93903][trace][http2] [source/common/http/http2/codec_impl.cc:923] [C1] dispatching 25295 bytes
[2024-09-19 16:53:46.023][93903][trace][http2] [source/common/http/http2/codec_impl.cc:1030] [C1] about to recv frame type=0, flags=0, stream_id=1
[2024-09-19 16:53:46.023][93903][trace][http2] [source/common/http/http2/codec_impl.cc:1056] [C1] recv frame type=0
[2024-09-19 16:53:46.023][93903][trace][http2] [source/common/http/http2/codec_impl.cc:2013] [C1] track inbound frame type=0 flags=0 length=8192 padding_length=0
[2024-09-19 16:53:46.024][93903][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host->vm] proxy_on_response_body(2, 8192, 0)
[2024-09-19 16:53:46.026][93903][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host<-vm] proxy_on_response_body return: 1
[2024-09-19 16:53:46.026][93903][trace][http2] [source/common/http/http2/codec_impl.cc:1030] [C1] about to recv frame type=0, flags=0, stream_id=1
[2024-09-19 16:53:46.026][93903][trace][http2] [source/common/http/http2/codec_impl.cc:1056] [C1] recv frame type=0
[2024-09-19 16:53:46.026][93903][trace][http2] [source/common/http/http2/codec_impl.cc:2013] [C1] track inbound frame type=0 flags=0 length=8192 padding_length=0
[2024-09-19 16:53:46.026][93903][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host->vm] proxy_on_response_body(2, 8192, 0)
[2024-09-19 16:53:46.026][93903][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host<-vm] proxy_on_response_body return: 1
[2024-09-19 16:53:46.026][93903][trace][http2] [source/common/http/http2/codec_impl.cc:1030] [C1] about to recv frame type=0, flags=0, stream_id=1
[2024-09-19 16:53:46.026][93903][trace][http2] [source/common/http/http2/codec_impl.cc:1056] [C1] recv frame type=0
[2024-09-19 16:53:46.026][93903][trace][http2] [source/common/http/http2/codec_impl.cc:2013] [C1] track inbound frame type=0 flags=0 length=8192 padding_length=0
[2024-09-19 16:53:46.026][93903][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host->vm] proxy_on_response_body(2, 16384, 0)
[2024-09-19 16:53:46.027][93903][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host<-vm] proxy_on_response_body return: 1
[2024-09-19 16:53:46.027][93903][trace][http2] [source/common/http/http2/codec_impl.cc:1030] [C1] about to recv frame type=0, flags=1, stream_id=1
[2024-09-19 16:53:46.027][93903][trace][http2] [source/common/http/http2/codec_impl.cc:1056] [C1] recv frame type=0
[2024-09-19 16:53:46.027][93903][trace][http2] [source/common/http/http2/codec_impl.cc:2013] [C1] track inbound frame type=0 flags=1 length=670 padding_length=0
[2024-09-19 16:53:46.027][93903][debug][client] [source/common/http/codec_client.cc:126] [C1] response complete
[2024-09-19 16:53:46.028][93903][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host->vm] proxy_on_response_body(2, 24576, 1)
[2024-09-19 16:53:46.034][93903][trace][http2] [source/common/http/http2/codec_impl.cc:1786] Codec does not have Metadata frame support.
[2024-09-19 16:53:46.034][93903][debug][http2] [source/common/http/http2/codec_impl.cc:1586] [C3] updating connection-level initial window size to 268435456
[2024-09-19 16:53:46.034][93903][debug][client] [source/common/http/codec_client.cc:57] [C3] connecting
[2024-09-19 16:53:46.034][93903][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host<-vm] proxy_on_response_body return: 1
[2024-09-19 16:53:46.034][93903][debug][http2] [source/common/http/http2/codec_impl.cc:1331] [C1] stream 1 closed: 0
[2024-09-19 16:53:46.034][93903][debug][http2] [source/common/http/http2/codec_impl.cc:1395] [C1] Recouping 0 bytes of flow control window for stream 1.
[2024-09-19 16:53:46.034][93903][trace][http2] [source/common/http/http2/codec_impl.cc:1030] [C1] about to recv frame type=3, flags=0, stream_id=1
[2024-09-19 16:53:46.034][93903][trace][http2] [source/common/http/http2/codec_impl.cc:2013] [C1] track inbound frame type=3 flags=0 length=4 padding_length=0
[2024-09-19 16:53:46.034][93903][trace][http2] [source/common/http/http2/codec_impl.cc:1056] [C1] recv frame type=3
[2024-09-19 16:53:46.034][93903][trace][http2] [source/common/http/http2/codec_impl.cc:958] [C1] dispatched 25295 bytes

As a comparison, the log in the good http1 case:

# grep "codec_impl\|codec_client\|proxy_on" http1.log
[2024-09-19 16:32:18.847][93849][debug][client] [source/common/http/codec_client.cc:88] [C1] connected
[2024-09-19 16:32:18.847][93849][debug][client] [source/common/http/codec_client.cc:139] [C1] encode complete
[2024-09-19 16:32:18.850][93849][trace][http] [source/common/http/http1/codec_impl.cc:639] [C1] parsing 15928 bytes
[2024-09-19 16:32:18.850][93849][trace][http] [source/common/http/http1/codec_impl.cc:581] [C1] message begin
[2024-09-19 16:32:18.850][93849][trace][http] [source/common/http/http1/codec_impl.cc:538] [C1] completed header: key=Server value=nginx/1.18.0 (Ubuntu)
...
[2024-09-19 16:32:18.851][93849][trace][http] [source/common/http/http1/codec_impl.cc:832] [C1] onHeadersCompleteBase
...
[2024-09-19 16:32:18.851][93849][trace][http] [source/common/http/http1/codec_impl.cc:1437] [C1] Client: onHeadersComplete size=8
...
[2024-09-19 16:32:18.853][93849][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host->vm] proxy_on_response_headers(2, 10, 0)
[2024-09-19 16:32:18.856][93849][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host<-vm] proxy_on_response_headers return: 0
...
[2024-09-19 16:32:18.857][93849][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host->vm] proxy_on_response_body(2, 15663, 0)
[2024-09-19 16:32:18.860][93849][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host<-vm] proxy_on_response_body return: 1
[2024-09-19 16:32:18.861][93849][trace][http] [source/common/http/http1/codec_impl.cc:689] [C1] parsed 15928 bytes
[2024-09-19 16:32:18.861][93849][trace][http] [source/common/http/http1/codec_impl.cc:639] [C1] parsing 9583 bytes
[2024-09-19 16:32:18.861][93849][trace][http] [source/common/http/http1/codec_impl.cc:936] [C1] message complete
[2024-09-19 16:32:18.861][93849][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host->vm] proxy_on_response_body(2, 15663, 0)
[2024-09-19 16:32:18.862][93849][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host<-vm] proxy_on_response_body return: 1
[2024-09-19 16:32:18.862][93849][trace][http] [source/common/http/http1/codec_impl.cc:1509] [C1] message complete
[2024-09-19 16:32:18.862][93849][debug][client] [source/common/http/codec_client.cc:126] [C1] response complete
[2024-09-19 16:32:18.863][93849][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host->vm] proxy_on_response_body(2, 25246, 1)
[2024-09-19 16:32:18.872][93849][trace][http2] [source/common/http/http2/codec_impl.cc:1786] Codec does not have Metadata frame support.
[2024-09-19 16:32:18.872][93849][debug][http2] [source/common/http/http2/codec_impl.cc:1586] [C3] updating connection-level initial window size to 268435456
[2024-09-19 16:32:18.873][93849][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39] [host<-vm] proxy_on_response_body return: 1
zuercher commented 1 week ago

You certainly shouldn't see the end-of-stream flag until all the data has been delivered. I'm skeptical that this is a bug in the filter manager, since this would break many Envoy filters.

cc @mpwarres for a look from the wasm side

dandlake commented 1 week ago

Hi, @zuercher , thanks for the opinion. I am not sure if the problem is on the wasm side. Below is the backtrace at wasm OnResponseBody() for the last 670B chunk. In frame 2, before calling into wasm, the "data" is the last 670B chunk and end_of_stream is true. However, at this point, the FilterManager::buffered_requestdata(where response body is accumulated) only has the first 3 chunks(3*8192=24576B), because the accumulation happens in commonHandleAfterDataCallback () AFTER the wasm filter. To deliver all the data to wasm filter, either the accumulation happens BEFORE wasm filter, OR there is another call afterwards(like the http1 case). Is this understanding right?

#0  proxy_wasm::ContextBase::onResponseBody (this=0x7917cf48010, body_length=24576, end_of_stream=true) at external/proxy_wasm_cpp_host/src/context.cc:375
#1  0x000055555a01560a in Envoy::Extensions::Common::Wasm::Context::encodeData (this=0x7917cf48010, data=..., end_stream=true) at source/extensions/common/wasm/context.cc:1750
#2  0x000055555ecfd30d in Envoy::Http::FilterManager::encodeData (this=0x7917f3998a8, filter=0x0, data=..., end_stream=true, 
    filter_iteration_start_state=Envoy::Http::FilterManager::FilterIterationStartState::CanStartFromCurrent) at source/common/http/filter_manager.cc:1329
#3  0x000055555ecfcb4a in Envoy::Http::ActiveStreamDecoderFilter::encodeData (this=0x7917f994370, data=..., end_stream=true) at source/common/http/filter_manager.cc:463
#4  0x000055555eb7d242 in Envoy::Router::Filter::onUpstreamData (this=0x7917e6d3690, data=..., upstream_request=..., end_stream=true) at source/common/router/router.cc:1605
#5  0x000055555eba1080 in Envoy::Router::UpstreamRequest::decodeData (this=0x7917dc00a00, data=..., end_stream=true) at source/common/router/upstream_request.cc:336
#6  0x000055555ebb6042 in Envoy::Router::UpstreamRequestFilterManagerCallbacks::encodeData (this=0x7917cef5500, data=..., end_stream=true)
    at ./source/common/router/upstream_request.h:304
#7  0x000055555ecfe47e in Envoy::Http::FilterManager::encodeData (this=0x7917cfecc80, filter=0x0, data=..., end_stream=true, 
    filter_iteration_start_state=Envoy::Http::FilterManager::FilterIterationStartState::CanStartFromCurrent) at source/common/http/filter_manager.cc:1356
#8  0x000055555ecfcb4a in Envoy::Http::ActiveStreamDecoderFilter::encodeData (this=0x7917f995760, data=..., end_stream=true) at source/common/http/filter_manager.cc:463
#9  0x000055555ebbfc2e in Envoy::Router::UpstreamCodecFilter::CodecBridge::decodeData (this=0x7917ce85050, data=..., end_stream=true) at source/common/router/upstream_codec_filter.cc:162
#10 0x000055555e39a6ac in Envoy::Http::ResponseDecoderWrapper::decodeData (this=0x7917ce85fa0, data=..., end_stream=true) at ./source/common/http/codec_wrappers.h:35
#11 0x000055555e927a61 in Envoy::Http::Http2::ConnectionImpl::StreamImpl::decodeData (this=0x7917f2e5680) at source/common/http/http2/codec_impl.cc:506
#12 0x000055555e93e717 in Envoy::Http::Http2::ConnectionImpl::onFrameReceived (this=0x7917e2eea88, frame=0x7917f1e8140) at source/common/http/http2/codec_impl.cc:1156

Another observation is that, in my http2 test, there is no trailer in the http2 response. There is indeed a proper content-length header. Not sure if this has any implication on the behavior.

dandlake commented 5 days ago

Thanks for assigning the issue, @jmarantz. Just some more findings from my side. I looked more into the working http1 case, and I found there is indeed a trigger with zero length buffer in http1 code in source/common/http/http1/codec_impl.cc:

CallbackResult ClientConnectionImpl::onMessageCompleteBase() {
...
    } else {
      Buffer::OwnedImpl buffer;
      response.decoder_->decodeData(buffer, true);    <<<<< zero length buffer with end_of_stream true
    }

The trace at this call is:

Thread 24 "wrk:worker_0" hit Breakpoint 1, proxy_wasm::ContextBase::onResponseBody (this=0x7917cb1b210, body_length=25246, end_of_stream=true) at external/proxy_wasm_cpp_host/src/context.cc:375
375 external/proxy_wasm_cpp_host/src/context.cc: No such file or directory.
(gdb) bt
#0  proxy_wasm::ContextBase::onResponseBody (this=0x7917cb1b210, body_length=25246, end_of_stream=true) at external/proxy_wasm_cpp_host/src/context.cc:375
#1  0x000055555a01560a in Envoy::Extensions::Common::Wasm::Context::encodeData (this=0x7917cb1b210, data=..., end_stream=true) at source/extensions/common/wasm/context.cc:1750
#2  0x000055555ecfd30d in Envoy::Http::FilterManager::encodeData (this=0x7917f38b8a8, filter=0x0, data=..., end_stream=true, 
    filter_iteration_start_state=Envoy::Http::FilterManager::FilterIterationStartState::CanStartFromCurrent) at source/common/http/filter_manager.cc:1329
#3  0x000055555ecfcb4a in Envoy::Http::ActiveStreamDecoderFilter::encodeData (this=0x7917f99a790, data=..., end_stream=true) at source/common/http/filter_manager.cc:463
#4  0x000055555eb7d242 in Envoy::Router::Filter::onUpstreamData (this=0x7917f3ff690, data=..., upstream_request=..., end_stream=true) at source/common/router/router.cc:1605
#5  0x000055555eba1080 in Envoy::Router::UpstreamRequest::decodeData (this=0x7917e606a00, data=..., end_stream=true) at source/common/router/upstream_request.cc:336
#6  0x000055555ebb6042 in Envoy::Router::UpstreamRequestFilterManagerCallbacks::encodeData (this=0x7917d179200, data=..., end_stream=true)
    at ./source/common/router/upstream_request.h:304
#7  0x000055555ecfe47e in Envoy::Http::FilterManager::encodeData (this=0x7917f313900, filter=0x0, data=..., end_stream=true, 
    filter_iteration_start_state=Envoy::Http::FilterManager::FilterIterationStartState::CanStartFromCurrent) at source/common/http/filter_manager.cc:1356
#8  0x000055555ecfcb4a in Envoy::Http::ActiveStreamDecoderFilter::encodeData (this=0x7917f99a840, data=..., end_stream=true) at source/common/http/filter_manager.cc:463
#9  0x000055555ebbfc2e in Envoy::Router::UpstreamCodecFilter::CodecBridge::decodeData (this=0x7917c6733b0, data=..., end_stream=true) at source/common/router/upstream_codec_filter.cc:162
#10 0x000055555e39a6ac in Envoy::Http::ResponseDecoderWrapper::decodeData (this=0x7917d178dd0, data=..., end_stream=true) at ./source/common/http/codec_wrappers.h:35
#11 0x000055555e39a6ac in Envoy::Http::ResponseDecoderWrapper::decodeData (this=0x7917c672980, data=..., end_stream=true) at ./source/common/http/codec_wrappers.h:35
#12 0x000055555e8e79c8 in Envoy::Http::Http1::ClientConnectionImpl::onMessageCompleteBase (this=0x7917e6c6000) at source/common/http/http1/codec_impl.cc:1531
#13 0x000055555e8d9cd5 in Envoy::Http::Http1::ConnectionImpl::onMessageCompleteImpl (this=0x7917e6c6008) at source/common/http/http1/codec_impl.cc:954
#14 0x000055555e8d8a81 in Envoy::Http::Http1::ConnectionImpl::onMessageComplete (this=0x7917e6c6008) at source/common/http/http1/codec_impl.cc:762
#15 0x000055555e91ccc6 in Envoy::Http::Http1::LegacyHttpParserImpl::Impl::Impl(http_parser_type, void*)::{lambda(http_parser*)#3}::operator()(http_parser*) const (this=0x7917d137f80, 
    parser=0x7917d137f80) at source/common/http/http1/legacy_parser_impl.cc:68
#16 0x000055555e91cc95 in Envoy::Http::Http1::LegacyHttpParserImpl::Impl::Impl(http_parser_type, void*)::{lambda(http_parser*)#3}::__invoke(http_parser*) (parser=0x7917d137f80)
    at source/common/http/http1/legacy_parser_impl.cc:66
#17 0x000055555f9ef62b in http_parser_execute (parser=0x7917d137f80, settings=0x7917d137fa0, 
    data=0x7917cb80000 "707,\n          -0.018566178157925606,\n          -0.035530343651771545,\n          0.0047495346516370773,\n          0.045272186398506165,\n          -0.042669080197811127,\n          0.022309053689241409,"..., len=8135) at external/envoy/bazel/external/http_parser/http_parser.c:1991
#18 0x000055555e91c13b in Envoy::Http::Http1::LegacyHttpParserImpl::Impl::execute (this=0x7917d137f80, 
    slice=0x7917cb80000 "707,\n          -0.018566178157925606,\n          -0.035530343651771545,\n          0.0047495346516370773,\n          0.045272186398506165,\n          -0.042669080197811127,\n          0.022309053689241409,"..., len=8135) at source/common/http/http1/legacy_parser_impl.cc:84
#19 0x000055555e91bbbf in Envoy::Http::Http1::LegacyHttpParserImpl::execute (this=0x7917e74bc20, 
    slice=0x7917cb80000 "707,\n          -0.018566178157925606,\n          -0.035530343651771545,\n          0.0047495346516370773,\n          0.045272186398506165,\n          -0.042669080197811127,\n          0.022309053689241409,"..., len=8135) at source/common/http/http1/legacy_parser_impl.cc:142
#20 0x000055555e8d2d63 in Envoy::Http::Http1::ConnectionImpl::dispatchSlice (this=0x7917e6c6008, 
    slice=0x7917cb80000 "707,\n          -0.018566178157925606,\n          -0.035530343651771545,\n          0.0047495346516370773,\n          0.045272186398506165,\n          -0.042669080197811127,\n          0.022309053689241409,"..., len=8135) at source/common/http/http1/codec_impl.cc:699
#21 0x000055555e8d0d1c in Envoy::Http::Http1::ConnectionImpl::dispatch (this=0x7917e6c6008, data=...) at source/common/http/http1/codec_impl.cc:662
#22 0x000055555e8cf348 in Envoy::Http::Http1::ClientConnectionImpl::dispatch (this=0x7917e6c6000, data=...) at source/common/http/http1/codec_impl.cc:627
#23 0x000055555e536669 in Envoy::Http::CodecClient::onData (this=0x7917d0c3830, data=...) at source/common/http/codec_client.cc:165
#24 0x000055555e53d2cc in Envoy::Http::CodecClient::CodecReadFilter::onData (this=0x7917e74bef0, data=..., end_stream=false) at ./source/common/http/codec_client.h:209
#25 0x000055555f105eba in Envoy::Network::FilterManagerImpl::onContinueReading (this=0x7917e606f98, filter=0x0, buffer_source=...) at source/common/network/filter_manager_impl.cc:95
#26 0x000055555f106423 in Envoy::Network::FilterManagerImpl::onRead (this=0x7917e606f98) at source/common/network/filter_manager_impl.cc:105
#27 0x000055555f00badb in Envoy::Network::ConnectionImpl::onRead (this=0x7917e606f00, read_buffer_size=8135) at source/common/network/connection_impl.cc:345
#28 0x000055555f018ea0 in Envoy::Network::ConnectionImpl::onReadReady (this=0x7917e606f00) at source/common/network/connection_impl.cc:651

In frame 27, it's processing the last chunk(8135B in this case). My wasm filter get 2 calls for this chunk: 1) onResponseBody(body_length=17111, end_of_stream=0). This 17111B is actually the previous chunk accumulated in FilterManager::buffered_requestdata. 2) onResponseBody(body_length=25246, end_of_stream=1). This body length is complete, because the previous call accumulated the last 8135B into the FilterManager::buffered_requestdata, and the end_of_stream is true because the above code in onMessageCompleteBase().

For my non-working http2 case, I don't know the design intention. Do we need similar logic in http2 parser?

juanmolle commented 1 day ago

This seams to be this same issue https://github.com/envoyproxy/envoy/issues/35884 I got a proposal for a fix and waiting if that is the correct way to fix it to complete with all tests

https://github.com/envoyproxy/envoy/pull/35885

dandlake commented 20 hours ago

Thanks, @juanmolle ! I tried your fix and it works for me for my test case.