drogonframework / drogon

Drogon: A C++14/17/20 based HTTP web application framework running on Linux/macOS/Unix/Windows
MIT License

Support request stream #2055

Open hwc0919 opened 3 weeks ago

hwc0919 commented 3 weeks ago

Support a stream request API.

Open for discussion.

When enabled, a request is processed as soon as all of its headers have been received. In the HTTP handler, the user should set a stream handler on the HttpRequest object and handle the request body there.

hwc0919 commented 3 weeks ago

Example code:

https://github.com/drogonframework/drogon/blob/ad2760823dd287595ec68c818f3c33ef6c8a18d1/examples/async_stream/main.cc#L34-L81

https://github.com/drogonframework/drogon/blob/ad2760823dd287595ec68c818f3c33ef6c8a18d1/examples/async_stream/RequestStreamExampleCtrl.cc#L1-L167

I think the current design is pretty good. But still, there are endless details to worry about...

Here are some major problems.

### 1. Which request should enter stream mode

How do we decide which requests enter stream mode? I'm sure that in most cases the stream API is needed in only some of the handlers. But we cannot know in advance which requests need it: the handler is chosen by the Router after all pre-AOPs and filters have passed, and to get that far without receiving the full body, the request has to enter stream mode first.

I have three solutions:

  1. All requests enter stream mode first, then a new API reads all stream data back into the request body field, so HttpRequest::body() works again. This looks like an ugly patch.
  2. Create a new AOP, called when the headers are ready, and let the user turn on stream mode for each request manually. This will break the code flow apart.
  3. Create a new HTTP handler function type. If a handler of the old type is chosen, let the framework apply solution 1 automatically.

All three solutions introduce hidden rules, which makes the API less elegant and more error-prone.

### 2. Body access

In stream mode, the request body is no longer accessible through the HttpRequest::body() API, so AOPs and filters can no longer access the body. This should not be a big issue once we figure out how to choose which requests enter stream mode.

### 3. Too many callbacks

For normal POST requests, we need dataCb, finishCb and errorCb. For multipart requests, we need yet another cb for the multipart header.

Should we reduce them by reusing some? finishCb can be replaced by calling dataCb with nullptr. But what about multipart data? We need to signal both the end of each multipart block and the end of the whole stream.
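To make the "dataCb with nullptr" idea concrete, here is a minimal sketch. The callback signature and the names DataCallback, BodyCollector and collectDemo are made up for illustration; they are not drogon APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical signature: one callback carries both data and end-of-stream.
// data == nullptr means "the stream has finished".
using DataCallback = std::function<void(const char *data, std::size_t len)>;

// Collects a body through the single callback and records completion.
struct BodyCollector
{
    std::string body;
    bool finished{false};

    DataCallback callback()
    {
        return [this](const char *data, std::size_t len) {
            if (data == nullptr)
            {
                finished = true;  // takes over the role of finishCb
                return;
            }
            body.append(data, len);
        };
    }
};

// Simulates the framework delivering two chunks and then the end marker.
inline BodyCollector collectDemo()
{
    BodyCollector collector;
    DataCallback cb = collector.callback();
    cb("hello ", 6);
    cb("world", 5);
    cb(nullptr, 0);
    return collector;
}
```

A single null marker can express only one kind of "end", so in this form it cannot tell the end of a multipart block from the end of the whole stream, which is exactly the open question above.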

What should the framework do on a stream error? On current drogon master, if there is an HTTP format error (such as an invalid chunk format), the framework sends a 400 response and closes the connection. But a multipart format error is not handled automatically; it is left to the user, when MultipartParser::Parse() returns -1. The behaviors are already inconsistent, which makes handling errors in stream mode even more troubling.

hwc0919 commented 2 weeks ago

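The more I write, the uglier it gets.

Stream mode has to be turned on early in order to reach the router. What if the matched handler is not a stream handler? The latest commit (9bffbb4) automatically waits until the whole body has been received before entering the handler. But what should happen if an error occurs while the body is being received?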

hwc0919 commented 2 weeks ago

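Update

Resolved: 1. which request should enter stream mode

Use solution 3.

After routing completes, check whether a stream-handler was matched. If not, automatically wait until the whole body has been read before continuing with the rest of the flow. So the current situation is: once stream mode is enabled, a non-stream-handler can access the request body only in postRoutingAdvices and the stages after it.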
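The routing decision described above can be modelled roughly as follows. This is a standalone sketch, not drogon code: Route, PendingRequest and runDispatchDemo are hypothetical names, and error handling is left out.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a route is either a stream handler (sees chunks as they
// arrive) or a plain handler (needs the full body).
struct Route
{
    bool isStream;
    std::function<void(const std::string &chunk)> onChunk;  // stream handler
    std::function<void(const std::string &body)> onBody;    // plain handler
};

class PendingRequest
{
  public:
    explicit PendingRequest(Route route) : route_(std::move(route)) {}

    // Called for every body chunk read from the connection.
    void feed(const std::string &chunk)
    {
        if (route_.isStream)
            route_.onChunk(chunk);  // forward immediately
        else
            buffered_ += chunk;     // keep waiting for the full body
    }

    // Called once the whole body has arrived.
    void finish()
    {
        if (!route_.isStream)
            route_.onBody(buffered_);  // the plain handler runs only now
    }

  private:
    Route route_;
    std::string buffered_;
};

// Feeds "ab" and "cd" through a route and records what the handler observed.
inline std::vector<std::string> runDispatchDemo(bool streamRoute)
{
    std::vector<std::string> seen;
    Route route{streamRoute,
                [&seen](const std::string &c) { seen.push_back(c); },
                [&seen](const std::string &b) { seen.push_back(b); }};
    PendingRequest req(std::move(route));
    req.feed("ab");
    req.feed("cd");
    req.finish();
    return seen;
}
```

With a stream route the handler observes the two chunks separately; with a plain route it observes one complete body, and only after the last chunk has arrived.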

hwc0919 commented 2 weeks ago

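The lifetime problem still needs to be solved.

After the user calls setStreamReader(), HttpRequestPtr needs to hold a reference to the StreamReaderPtr, otherwise the latter is released too early. If the user then captures req inside the StreamReaderPtr they created, the result is a reference cycle. So the held reference has to be released at a suitable moment. The moment currently chosen is streamFinish or streamError, at which point the reference held on HttpRequestPtr is cleared.

This leads to a problem: the lifetime of StreamReaderPtr is somewhat odd. Ideally StreamReaderPtr and HttpRequestPtr would share a lifetime, but currently (unless the user keeps a reference of their own) StreamReaderPtr is destructed as soon as the stream ends.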
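The cycle and the chosen way of breaking it can be reproduced with plain std::shared_ptr. The sketch below uses stand-in types (Request, Reader, streamDone, requestIsFreed are hypothetical names), not drogon's classes.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

struct Request;
using RequestPtr = std::shared_ptr<Request>;

// Stand-in for a stream reader; its callback may capture the request.
struct Reader
{
    std::function<void()> onFinish;
};
using ReaderPtr = std::shared_ptr<Reader>;

// Stand-in for HttpRequest: it keeps the reader alive.
struct Request
{
    ReaderPtr reader;

    void setStreamReader(ReaderPtr r) { reader = std::move(r); }

    // Called on stream finish or stream error: drop the reference so that a
    // reader capturing this request no longer forms a cycle.
    void streamDone()
    {
        ReaderPtr keepAlive = std::move(reader);  // reader is now empty
        if (keepAlive && keepAlive->onFinish)
            keepAlive->onFinish();
    }
};

// Returns true if the request is destroyed once the caller drops its pointer.
inline bool requestIsFreed(bool callStreamDone)
{
    std::weak_ptr<Request> watch;
    {
        RequestPtr req = std::make_shared<Request>();
        watch = req;
        auto reader = std::make_shared<Reader>();
        reader->onFinish = [req] { /* user code touching req */ };  // cycle
        req->setStreamReader(std::move(reader));
        if (callStreamDone)
            req->streamDone();
    }
    const bool freed = watch.expired();
    if (auto leaked = watch.lock())
        leaked->reader.reset();  // clean up the deliberate cycle of this demo
    return freed;
}
```

In this model the reader dies at the end of streamDone() unless someone else holds it, which mirrors the odd lifetime described above.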

hwc0919 commented 1 week ago

Summary

This PR is close to done. Any questions or suggestions are welcome here.

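Problems solved

  1. When a very large file is uploaded, drogon stores the raw multipart request in a temporary file and maps it into memory with mmap. Because what is stored is the raw request (including the multipart boundary and similar data), the user has to make one full copy when saving the file, and the framework cannot offer a move()-style interface for relocating it. For very large files this costs time and wastes twice the storage space.

  2. On 32-bit systems a file larger than 4 GB cannot be uploaded in one go. The streaming interface solves this.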

How to enable

Enable it with app().enableRequestStream(true); or with "enable_request_stream": true in the configuration file.

Then, using the new controller function type (called stream-handler below), set a stream reader in the controller:

app().registerHandler(
    "/stream_req",
    [](const HttpRequestPtr &req,
       RequestStreamPtr &&stream,
       std::function<void(const HttpResponsePtr &)> &&callback) {
        // Guard against a null stream before installing the reader.
        if (stream)
            stream->setStreamReader(...);
    });

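RequestStreamReader is an interface class. Users must provide their own implementation or use a built-in one. The framework provides a plain reader implementation and a reader implementation that includes multipart parsing.

RequestStreamReader has two member functions. onStreamData() is called whenever body data is received. onStreamFinish() is called when all data has been received or when an error occurs during reception. onStreamFinish() is called exactly once.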
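As an illustration of that contract, here is a standalone sketch with a stand-in interface. The parameter lists (a pointer plus length for data, a std::exception_ptr for finish) are assumptions, and StreamReaderSketch, CountingReader, deliver and deliverDemo are hypothetical names, so check the real header before relying on any of this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Stand-in for the interface described above; the signatures are assumptions.
class StreamReaderSketch
{
  public:
    virtual ~StreamReaderSketch() = default;
    virtual void onStreamData(const char *data, std::size_t len) = 0;
    // err is null on success and holds the failure otherwise.
    virtual void onStreamFinish(std::exception_ptr err) = 0;
};

// A user-provided reader that counts bytes instead of storing the body.
class CountingReader : public StreamReaderSketch
{
  public:
    void onStreamData(const char *, std::size_t len) override { bytes_ += len; }
    void onStreamFinish(std::exception_ptr err) override
    {
        ++finishCalls_;
        failed_ = (err != nullptr);
    }
    std::size_t bytes() const { return bytes_; }
    int finishCalls() const { return finishCalls_; }
    bool failed() const { return failed_; }

  private:
    std::size_t bytes_{0};
    int finishCalls_{0};
    bool failed_{false};
};

// Simulated framework side: delivers the body in chunks, then finishes once.
inline void deliver(StreamReaderSketch &reader,
                    const std::string &body,
                    std::size_t chunkSize,
                    bool fail)
{
    assert(chunkSize > 0);
    for (std::size_t off = 0; off < body.size(); off += chunkSize)
    {
        const std::size_t n = std::min(chunkSize, body.size() - off);
        reader.onStreamData(body.data() + off, n);
    }
    std::exception_ptr err;  // stays null on success
    if (fail)
        err = std::make_exception_ptr(std::runtime_error("receive failed"));
    reader.onStreamFinish(err);
}

// Runs a 10-byte body through a CountingReader in 4-byte chunks.
inline CountingReader deliverDemo(bool fail)
{
    CountingReader reader;
    deliver(reader, "0123456789", 4, fail);
    return reader;
}
```

Folding success and failure into one finish call is what lets the reader rely on "exactly once" for cleanup.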

Compatibility

Once enable_request_stream is turned on, the HttpRequest body family of interfaces is affected (bodyData(), bodyLength(), body(), getBody()).

Lifetime issues

stream->setStreamReader(reader);

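Other issues

The current interface design rests on one premise: it assumes the user processes requests fast enough.

With stream mode enabled, drogon does not pause body reception between the moment it has received the complete request headers and started the processing flow, and the moment the user sets a StreamReader inside the handler. Body data received during this period is placed in memory or in a temporary file, as before. If the user's pre-AOPs/filters contain long-running asynchronous operations, body data may pile up, or may even be received in full, which makes the streaming interface pointless.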
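A toy model of that window, to show where the backlog comes from. BodyPipe and backlogBeforeReader are hypothetical names, and this is not how drogon is implemented internally.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical model of the receive path.
class BodyPipe
{
  public:
    using Reader = std::function<void(const std::string &chunk)>;

    // Network side: a chunk arrived from the socket.
    void onNetworkData(const std::string &chunk)
    {
        if (reader_)
            reader_(chunk);     // streamed straight to the user
        else
            backlog_ += chunk;  // no reader yet: cached as before
    }

    // Handler side: installing the reader first drains whatever piled up.
    void setReader(Reader reader)
    {
        reader_ = std::move(reader);
        if (!backlog_.empty())
        {
            std::string pending;
            pending.swap(backlog_);
            reader_(pending);
        }
    }

    std::size_t backlogSize() const { return backlog_.size(); }

  private:
    Reader reader_;
    std::string backlog_;
};

// Four 4-byte chunks arrive in total; `chunksBeforeReader` (0..4) of them
// arrive before the handler installs its reader. Returns the bytes that had
// to be cached in the meantime.
inline std::size_t backlogBeforeReader(int chunksBeforeReader)
{
    BodyPipe pipe;
    std::size_t delivered = 0;
    for (int i = 0; i < chunksBeforeReader; ++i)
        pipe.onNetworkData("data");
    const std::size_t cached = pipe.backlogSize();
    pipe.setReader(
        [&delivered](const std::string &c) { delivered += c.size(); });
    for (int i = chunksBeforeReader; i < 4; ++i)
        pipe.onNetworkData("data");
    assert(delivered == 16);  // nothing is lost, only delayed
    return cached;
}
```

If all four chunks arrive before the reader is set, the whole body is cached first and nothing is really streamed.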

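We cannot decide on the user's behalf whether or when to pause body reception; the user may not want this extra waiting time.

Here is one possible optimization: when an expect: 100-continue request header is received, add an AOP point that lets the user decide whether to enter stream-mode, whether to pause body reception, and so on.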