drogonframework / drogon

Drogon: A C++14/17/20 based HTTP web application framework running on Linux/macOS/Unix/Windows
MIT License
11.44k stars 1.1k forks source link

net::ERR_CONTENT_LENGTH_MISMATCH when serving large static files via HTTPS since 1.8.5 #1736

Closed tripleslash closed 1 year ago

tripleslash commented 1 year ago

Describe the bug Since drogon version 1.8.5, when serving static files over HTTPS larger than Content-Length: 63865 bytes the server will only transmit 328 bytes of the request and then hang indefinitely. The request never finishes and the browser will eventually cancel with net::ERR_CONTENT_LENGTH_MISMATCH. I believe this error was introduced with the trantor TLS refactoring. drogon 1.8.4 doesn't seem to suffer from this problem.

To Reproduce Steps to reproduce the behavior:

  1. Configure a self signed certificate
  2. Use drogon 1.8.5
  3. Serve static file with 63865 bytes (content type javascript) to some third unrelated PC (localhost seems to work always for some reason)
  4. See error

Browser:

Webserver:

an-tao commented 1 year ago

This issue maybe introduced by the new TLS implement. I'll figure it out.

an-tao commented 1 year ago

@tripleslash I can't reproduce this issue, can you provide more details please? thanks so much.

tripleslash commented 1 year ago

Hm I'm unsure what other info I can provide here... The problem happens every time for me when I try it with drogon 1.8.5 but somehow it loads if I load it from https://localhost on the respective devices. But when I serve the file to another client it only ever gets 328 bytes and then nothing more will be sent and the browser eventually kills the connection and reports content length mismatch. On drogon 1.8.4 the file is fully served

marty1885 commented 1 year ago

Hi,

I can't replicate this issue on my AArch64 development machine. I can also verify that the hash of the transmitted file is the same as the original. However, I don't have a ARMv7 environment I can test on. Do you have a 64bit ARM machine? So far this looks like a bit-ness issue. Or ARMv7 specific.

❯ curl -k https://localhost:8800/libdrogon.txt -v --tlsv1.3 > /dev/null
....
 GET /libdrogon.txt HTTP/1.1
> Host: localhost:8800
> User-Agent: curl/8.2.1
> Accept: */*
> 
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [233 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [233 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
< HTTP/1.1 200 OK
< content-length: 49939026
< content-type: text/plain; charset=utf-8
< server: drogon/1.8.4
< accept-range: bytes
< expires: Thu, 01 Jan 1970 00:00:00 GMT
< last-modified: Sun, 20 Aug 2023 16:31:07 GMT
< date: Sun, 20 Aug 2023 16:32:32 GMT
< 
{ [5 bytes data]
100 47.6M  100 47.6M    0     0   208M      0 --:--:-- --:--:-- --:--:--  208M
* Connection #0 to host localhost left intact

The system I'm testing on:

❯ archey3

               +                OS: Arch Linux aarch64
               #                Hostname: nina
              ###               Kernel Release: 5.10.35-00001-g107b6c90afff
             #####              Uptime: 2 days, 20:03
             ######             WM: None
            ; #####;            DE: None
           +##.#####            Packages: 1802
          +##########           RAM: 4024 MB / 29965 MB
         #############;         $EDITOR: vim
        ###############+        Root: 129G / 466G (27%) (ext4)
       #######   #######        
     .######;     ;###;`".      
    .#######;     ;#####.       
    #########.   .########`     
   ######'           '######    
  ;####                 ####;   
  ##'                     '##   
 #'                         `#  

❯ lscpu | head -n 9
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          16
On-line CPU(s) list:             0-15
Vendor ID:                       ARM
Model name:                      Cortex-A72
Model:                           3
Thread(s) per core:              1
tripleslash commented 1 year ago

Here is a screenshot showing the issue in chrome. This is with drogon v1.8.5 image

image

You can see that the javascript file is cut off in the middle in a string literal "syntax"

curl for Windows reports everything to be working however...:

* processing: https://192.168.200.213/ui/js/common-vendor-angular.js.b335f9159dc537db6a6a.bundle.js
*   Trying [::1]:12345...
* Connected to localhost (::1) port 12345
* SOCKS5 connect to 192.168.200.213:443 (remotely resolved)
* SOCKS5 request granted.
* Connected to localhost (::1) port 12345
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server did not agree on a protocol. Uses default.
* Server certificate:
*  subject: <snip>
*  start date: Aug 18 14:30:00 2023 GMT
*  expire date: Jul 25 14:30:00 2123 GMT
*  issuer: <snip>
*  SSL certificate verify result: self-signed certificate in certificate chain (19), continuing anyway.
* using HTTP/1.x
> GET /ui/js/common-vendor-angular.js.b335f9159dc537db6a6a.bundle.js HTTP/1.1
> Host: 192.168.200.213
> User-Agent: curl/8.2.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/1.1 200 OK
< content-length: 185286
< content-type: text/javascript; charset=utf-8
< server: drogon/1.8.5
< access-control-allow-origin: *
< accept-range: bytes
< expires: Thu, 01 Jan 1970 00:00:00 GMT
< last-modified: Sat, 19 Aug 2023 12:54:53 GMT
< date: Sun, 20 Aug 2023 20:39:00 GMT
<
/*! For license information please see common-vendor-angular.js.b335f9159dc537db6a6a.bundle.js.LICENSE.txt */

With drogon 1.8.4 it properly loads the complete file in chrome:

image

marty1885 commented 1 year ago

@tripleslash I see. I'll try with a browser when I got back home. Can you try Firefox (based on Gecko) on your end? I want to know if this is a browser engine depending issue.

Also, can you share the file causing trouble in question?


Edit: I can seem to replicate the issue with firefox. But it's weird (and maybe not a framework issue).

When I try to open a extremely large text file in firefox. It gets stuck due to firefox trying to rendering said file. But when I try to download it without rendering. It works perfectly


Edit: I can't replicate the issue even with a text file with the exact same size. @tripleslash Can you try latest master?

圖片

tripleslash commented 1 year ago

I tested it with Chrome 115.0.5790.173 and Firefox 116.0.3 and the issue happens in both browsers. It is not just a rendering issue because my website also doesn't load. It fails to interpret the javascript code because of the unterminated string literal (because webserver doesn't transmit the rest of the content).

Latest master doesn't contain any new commits from 1.8.5 tag so I am not sure what you want me to test? The bug report was created using drogon commit 83e08f4b2732e9f3d58f71379d3f7e117cb9180e

marty1885 commented 1 year ago

Oops. I should have read more clearly. Can you share the file in question? I am having a very hard time replicating the issue on my end.

an-tao commented 1 year ago

@tripleslash What occurs when you utilize HTTP instead of HTTPS to load the JavaScript file? thanks.

tripleslash commented 1 year ago

With HTTP everything loads correctly @an-tao Sorry I can not share the javascript file (contains some proprietary code that I have no permission to) @marty1885 but it is exactly 63865 bytes long

rbugajewski commented 1 year ago

@tripleslash This will make it slightly harder for us to investigate. Did you try to serve another file over HTTPS that is slightly bigger? Do you also have this issue? If yes, can you then share this, less sensitive, file?

an-tao commented 1 year ago

Did you set the server_header_field in config.json? @tripleslash

tripleslash commented 1 year ago

@tripleslash This will make it slightly harder for us to investigate. Did you try to serve another file over HTTPS that is slightly bigger? Do you also have this issue? If yes, can you then share this, less sensitive, file?

It also happens if I change the content in the file arbitrarily (but I kept the size the same 63865 bytes)

tripleslash commented 1 year ago

Did you set the server_header_field in config.json? @tripleslash

Yes I did change the Server: header field via drogon::app().setServerHeaderField(...)

an-tao commented 1 year ago

Did you set the server_header_field in config.json? @tripleslash

Yes I did change the Server: header field via drogon::app().setServerHeaderField(...)

Note that the header you set cannot contain the "\r\n" suffix.

tripleslash commented 1 year ago

Did you set the server_header_field in config.json? @tripleslash

Yes I did change the Server: header field via drogon::app().setServerHeaderField(...)

Note that the header you set cannot contain the "\r\n" suffix.

It only contains [a-zA-Z0-9] and [-/]

marty1885 commented 1 year ago

I borrowed a Raspberry Pi 4 from my friend. Running a 32bit userspace. And created a dummy ASCII file of size 63865 bytes. Still can't replicate the issue.

For reference. This is the how I did the experiment.

First, a simple C program that creates 63865 byte sized file called dummy.txt

#include <stdio.h>
#include <stdlib.h>

int main()
{
        FILE* f = fopen("dummy.txt", "w");
        const char* alphabet = "abcdefghijklmnopqrstuvwxyz";
        int sz = 26;
        for(int i=0;i<63865;i++) {
                fputc(alphabet[i%sz], f);
        }
        fflush(f);
        fclose(f);
}

I compiled and ran this script to generate dummy.txt. Confirming it has the exact size.

❯ cc mkfile.c -o mkfile
❯ ./mkfile
❯ ls dummy.txt  -l
-rw-r--r-- 1 marty1885 marty1885 63865 Aug 21 15:16 dummy.txt

Then I copied trantor's test certs to cwd and modify the helloworld server to act as a HTTPS server.

❯ pwd
/home/marty1885/drogon/build
❯ cp ../trantor/trantor/tests/server.* .
❯ vim ../examples/helloworld/main.cc
❯ git diff
diff --git a/examples/helloworld/main.cc b/examples/helloworld/main.cc
index 7b92827b..ba40245f 100644
--- a/examples/helloworld/main.cc
+++ b/examples/helloworld/main.cc
@@ -66,5 +66,8 @@ int main()
     // you want the server also listen on 127.0.0.1 port 5555. Just add another
     // line of addListener("127.0.0.1", 5555)
     LOG_INFO << "Server running on 127.0.0.1:8848";
-    app().addListener("127.0.0.1", 8848).run();
+    app().addListener("127.0.0.1", 8848)
+           .setDocumentRoot(".")
+           .addListener("127.0.0.1", 8800, true, "server.crt", "server.key")
+           .run();
 }

Finally compile and run. Then access /dummy.txt. Firefox can display it perfectly.

image

The system info

❯  uname -a
Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
❯  file ./examples/helloworld 
./examples/helloworld: ELF 32-bit LSB executable, ARM, EABI5 version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=662ef6573659a8833852abc48cca02f8548aa90b, for GNU/Linux 3.2.0, not stripped

I also tried enabling/disabling all compression. Same.

===

Next step. I'll try brute-forcing file sizes from 1 bytes to 10M. See if I can find an edge case.

tripleslash commented 1 year ago

I borrowed a Raspberry Pi 4 from my friend. Running a 32bit userspace. And created a dummy ASCII file of size 63865 bytes. Still can't replicate the issue.

For reference. This is the how I did the experiment.

First, a simple C program that creates 63865 byte sized file called dummy.txt

#include <stdio.h>
#include <stdlib.h>

int main()
{
        FILE* f = fopen("dummy.txt", "w");
        const char* alphabet = "abcdefghijklmnopqrstuvwxyz";
        int sz = 26;
        for(int i=0;i<63865;i++) {
                fputc(alphabet[i%sz], f);
        }
        fflush(f);
        fclose(f);
}

I compiled and ran this script to generate dummy.txt. Confirming it has the exact size.

❯ cc mkfile.c -o mkfile
❯ ./mkfile
❯ ls dummy.txt  -l
-rw-r--r-- 1 marty1885 marty1885 63865 Aug 21 15:16 dummy.txt

Then I copied trantor's test certs to cwd and modify the helloworld server to act as a HTTPS server.

❯ pwd
/home/marty1885/drogon/build
❯ cp ../trantor/trantor/tests/server.* .
❯ vim ../examples/helloworld/main.cc
❯ git diff
diff --git a/examples/helloworld/main.cc b/examples/helloworld/main.cc
index 7b92827b..ba40245f 100644
--- a/examples/helloworld/main.cc
+++ b/examples/helloworld/main.cc
@@ -66,5 +66,8 @@ int main()
     // you want the server also listen on 127.0.0.1 port 5555. Just add another
     // line of addListener("127.0.0.1", 5555)
     LOG_INFO << "Server running on 127.0.0.1:8848";
-    app().addListener("127.0.0.1", 8848).run();
+    app().addListener("127.0.0.1", 8848)
+           .setDocumentRoot(".")
+           .addListener("127.0.0.1", 8800, true, "server.crt", "server.key")
+           .run();
 }

Finally compile and run. Then access /dummy.txt. Firefox can display it perfectly.

image

The system info

❯  uname -a
Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
❯  file ./examples/helloworld 
./examples/helloworld: ELF 32-bit LSB executable, ARM, EABI5 version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=662ef6573659a8833852abc48cca02f8548aa90b, for GNU/Linux 3.2.0, not stripped

I also tried enabling/disabling all compression. Same.

===

Next step. I'll try brute-forcing file sizes from 1 bytes to 10M. See if I can find an edge case.

Your Browser says it transferred 486 bytes, are you sure you didnt encounter the exact same issue as me in that screenshot of yours there?

marty1885 commented 1 year ago

Your Browser says it transferred 486 bytes, are you sure you didnt encounter the exact same issue as me in that screenshot of yours there?

Yes. As I mentioned I can still load the file if I turn gzip off and actually transmit 63865 bytes.

image

Wait a second. What's the transmission size on your end? I assumed it was 63865. But I'm confused now

tripleslash commented 1 year ago

I figured out the cause of the issue. Drogon in version 1.8.5 is actually crashing on my machine and then is restarting because I run it as service. Analyzing the logs now. The crash just seems to happen in 1.8.5 and 1.8.4 runs without problems.

Program terminated with signal SIGABRT, Aborted.
#0  0x7671fdd6 in ?? () from /lib/libc.so.6
[Current thread is 1 (Thread 0x710f9d20 (LWP 386))]
(gdb) bt
#0  0x7671fdd6 in ?? () from /lib/libc.so.6
#1  0x7675fcc8 in ?? () from /lib/libc.so.6
#2  0x7672eb36 in raise () from /lib/libc.so.6
#3  0x7671f86c in abort () from /lib/libc.so.6
#4  0x768ff516 in std::__glibcxx_assert_fail(char const*, int, char const*, char const*) () from /lib/libstdc++.so.6
#5  0x76affd62 in trantor::TcpConnectionImpl::sendFileInLoop(std::shared_ptr<trantor::TcpConnectionImpl::BufferNode> const&) () from /lib/libtrantor.so.1
#6  0x76b01bbe in trantor::TcpConnectionImpl::sendFile(int, unsigned int, unsigned int) () from /lib/libtrantor.so.1
#7  0x76bddb3e in drogon::HttpServer::sendResponse(std::shared_ptr<trantor::TcpConnection> const&, std::shared_ptr<drogon::HttpResponse> const&, bool) () from /lib/libdrogon.so.1
#8  0x76be02a0 in drogon::HttpServer::onRequests(std::shared_ptr<trantor::TcpConnection> const&, std::vector<std::shared_ptr<drogon::HttpRequestImpl>, std::allocator<std::shared_ptr<drogon::HttpRequestImpl> > > const&, std::shared_ptr<drogon::HttpRequestParser> const&) () from /lib/libdrogon.so.1
#9  0x76be0f7e in drogon::HttpServer::onMessage(std::shared_ptr<trantor::TcpConnection> const&, trantor::MsgBuffer*)
    () from /lib/libdrogon.so.1
#10 0x76afd77a in trantor::TcpConnectionImpl::onSslMessage(trantor::TcpConnection*, trantor::MsgBuffer*) ()
   from /lib/libtrantor.so.1
#11 0x76b0bb54 in OpenSSLProvider::processApplicationData() () from /lib/libtrantor.so.1
#12 0x76b0db2a in OpenSSLProvider::recvData(trantor::MsgBuffer*) () from /lib/libtrantor.so.1
#13 0x76b012b8 in trantor::TcpConnectionImpl::readCallback() () from /lib/libtrantor.so.1
#14 0x76afac8e in trantor::Channel::handleEventSafely() () from /lib/libtrantor.so.1
#15 0x76afad1c in trantor::Channel::handleEvent() () from /lib/libtrantor.so.1
#16 0x76aefce4 in trantor::EventLoop::loop() () from /lib/libtrantor.so.1
#17 0x76af094e in trantor::EventLoopThread::loopFuncs() () from /lib/libtrantor.so.1
#18 0x76901ce2 in ?? () from /lib/libstdc++.so.6
#19 0x7675e596 in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

This is the stack trace

an-tao commented 1 year ago

Did you update the trantor submodule?

tripleslash commented 1 year ago

Yes, I updated both. I also don't believe OpenSSLProvider::processApplicationData existed before 1.5.12? which is clearly shown in the stack trace

an-tao commented 1 year ago

Can you build it in debug mode (both drogon and your application) and use gdb to run it?

There is another possibility that outdated versions of the Drogon or Trantor libraries might still exist in your system, and your application is linking against the wrong library. Please confirm to rule out this situation.

tripleslash commented 1 year ago

I just completely rebuilt the whole BSP + application and drogon+trantor in Debug mode. This is the stacktrace with debug symbols.

Thread 9 "DrogonIoLoop" received signal SIGABRT, Aborted.
[Switching to Thread 0x71af9d20 (LWP 10775)]
0x7614add6 in ?? () from /lib/libc.so.6
(gdb) bt
#0  0x7614add6 in ?? () from /lib/libc.so.6
#1  0x7618acc8 in ?? () from /lib/libc.so.6
#2  0x76159b36 in raise () from /lib/libc.so.6
#3  0x7614a86c in abort () from /lib/libc.so.6
#4  0x7632a516 in std::__glibcxx_assert_fail(char const*, int, char const*, char const*) () from /lib/libstdc++.so.6
#5  0x76af3a2a in std::vector<char, std::allocator<char> >::operator[](unsigned int) () from /lib/libdrogon.so.1
#6  0x76651052 in trantor::TcpConnectionImpl::sendFileInLoop(std::shared_ptr<trantor::TcpConnectionImpl::BufferNode> const&) () from /lib/libtrantor.so.1
#7  0x7664f78a in trantor::TcpConnectionImpl::sendFile(int, unsigned int, unsigned int) () from /lib/libtrantor.so.1
#8  0x7664f1b8 in trantor::TcpConnectionImpl::sendFile(char const*, unsigned int, unsigned int) () from /lib/libtrantor.so.1
#9  0x76b11460 in drogon::HttpServer::sendResponse(std::shared_ptr<trantor::TcpConnection> const&, std::shared_ptr<drogon::HttpResponse> const&, bool) () from /lib/libdrogon.so.1
#10 0x76b11670 in drogon::HttpServer::sendResponses(std::shared_ptr<trantor::TcpConnection> const&, std::vector<std::pair<std::shared_ptr<drogon::HttpResponse>, bool>, std::allocator<std::pair<std::shared_ptr<drogon::HttpResponse>, bool> > > const&, trantor::MsgBuffer&) () from /lib/libdrogon.so.1
#11 0x76b1025a in drogon::HttpServer::onRequests(std::shared_ptr<trantor::TcpConnection> const&, std::vector<std::shared_ptr<drogon::HttpRequestImpl>, std::allocator<std::shared_ptr<drogon::HttpRequestImpl> > > const&, std::shared_ptr<drogon::HttpRequestParser> const&) () from /lib/libdrogon.so.1
#12 0x76b0fd66 in drogon::HttpServer::onMessage(std::shared_ptr<trantor::TcpConnection> const&, trantor::MsgBuffer*) () from /lib/libdrogon.so.1
#13 0x76b0f1f0 in ?? () from /lib/libdrogon.so.1
#14 0x76b13f2e in ?? () from /lib/libdrogon.so.1
#15 0x76b13508 in ?? () from /lib/libdrogon.so.1
#16 0x76b12dda in ?? () from /lib/libdrogon.so.1
#17 0x7665a528 in std::function<void (std::shared_ptr<trantor::TcpConnection> const&, trantor::MsgBuffer*)>::operator()(std::shared_ptr<trantor::TcpConnection> const&, trantor::MsgBuffer*) const () from /lib/libtrantor.so.1
#18 0x76651e52 in trantor::TcpConnectionImpl::onSslMessage(trantor::TcpConnection*, trantor::MsgBuffer*) () from /lib/libtrantor.so.1
#19 0x76673b54 in OpenSSLProvider::processApplicationData() () from /lib/libtrantor.so.1
#20 0x76672bda in OpenSSLProvider::recvData(trantor::MsgBuffer*) () from /lib/libtrantor.so.1
#21 0x7664ab8c in trantor::TcpConnectionImpl::readCallback() () from /lib/libtrantor.so.1
#22 0x7665f7c8 in void std::__invoke_impl<void, void (trantor::TcpConnectionImpl::*&)(), trantor::TcpConnectionImpl*&>(std::__invoke_memfun_deref, void (trantor::TcpConnectionImpl::*&)(), trantor::TcpConnectionImpl*&) ()
   from /lib/libtrantor.so.1
#23 0x7665f25a in std::__invoke_result<void (trantor::TcpConnectionImpl::*&)(), trantor::TcpConnectionImpl*&>::type std::__invoke<void (trantor::TcpConnectionImpl::*&)(), trantor::TcpConnectionImpl*&>(void (trantor::TcpConnectionImpl::*&)(), trantor::TcpConnectionImpl*&) () from /lib/libtrantor.so.1
#24 0x7665ec04 in void std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>::__call<void, , 0u>(std::tuple<>&&, std::_Index_tuple<0u>) () from /lib/libtrantor.so.1
#25 0x7665e33a in void std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>::operator()<, void>() () from /lib/libtrantor.so.1
#26 0x7665d870 in void std::__invoke_impl<void, std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>&>(std::__invoke_other, std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>&)
    () from /lib/libtrantor.so.1
#27 0x7665ccf6 in std::enable_if<std::__and_<std::is_void<void>, std::__is_invocable<std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>&> >::value, void>::type std::__invoke_r<void, std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>&>(std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>&) () from /lib/libtrantor.so.1
#28 0x7665bbbe in std::_Function_handler<void (), std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()> >::_M_invoke(std::_Any_data const&) () from /lib/libtrantor.so.1
#29 0x76a47a76 in std::function<void ()>::operator()() const () from /lib/libdrogon.so.1
#30 0x76643f64 in trantor::Channel::handleEventSafely() () from /lib/libtrantor.so.1
#31 0x76643e66 in trantor::Channel::handleEvent() () from /lib/libtrantor.so.1
#32 0x76624400 in trantor::EventLoop::loop() () from /lib/libtrantor.so.1
#33 0x76628050 in trantor::EventLoopThread::loopFuncs() () from /lib/libtrantor.so.1
#34 0x76627c34 in ?? () from /lib/libtrantor.so.1
#35 0x76628c44 in ?? () from /lib/libtrantor.so.1
#36 0x76628bf4 in ?? () from /lib/libtrantor.so.1
#37 0x76628b84 in ?? () from /lib/libtrantor.so.1
#38 0x76628b4e in ?? () from /lib/libtrantor.so.1
#39 0x76628b2a in ?? () from /lib/libtrantor.so.1
#40 0x7632cce2 in ?? () from /lib/libstdc++.so.6
#41 0x76189596 in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
an-tao commented 1 year ago

Please use the "bt" command to trace the call stack and examine the exact crash location, especially the specific positions of the 5th and 6th layers (note the line numbers in the source code).

tripleslash commented 1 year ago

I don't currently have source information on the target built in my BSP so I can't inspect frames with source/line information

Will try to get the remote cross debugger to work so I can maybe give some more insights

VladlenPopolitov commented 1 year ago

@an-tao @tripleslash Sorry for interrupt you. Stack log shows assertion in standard library in vector operator[], called from sendFileInLoop (usually assertion in that place is range check exception). sendFileInLoop is in TcpConnectionImpl.cc and has only one [] operator in line 1150 (for linux): auto n = fread(&(*fileBufferPtr)[0], ... It assert only if it cannot take address of the 1st element (0 index). Ff there is no 1st element, size() of fileBufferPtr equal 0. Size of the vector was not checked directly before, it was checked indirectly by while (filePtr->fileBytesToSend > 0). Likely filePtr->fileBytesToSend does not equal to fileBufferPtr_->size().

VladlenPopolitov commented 1 year ago

Off topic - two days ago views Cppcon 2020 video, a speaker said, that std::vector [] operator together with std::map [] operator one of the dangerous things in the current C++. Easy make error, and no exception and information (for better program performance)

VladlenPopolitov commented 1 year ago

@tripleslash sendFileInLoop function logs all variation of own error to log file. you could activate log file in configuration and look inside. It will not log anything in Debug mode , because it will be terminated by assertion before error message, but it will log all error in release mode.

VladlenPopolitov commented 1 year ago

it's wired, because on line 1132 we ensure that the vector pointed to by fileBufferPtr is not empty. In line 1130 checked, that pointer is not nullptr, but it can point to empty vector. Some lines above is code:

 fileBufferPtr_->resize(0);
                continue;

I do not know, is it reason, but it is only one place to cause assert - line 1150 and later line 1158

an-tao commented 1 year ago

it's wired, because on line 1132 we ensure that the vector pointed to by fileBufferPtr is not empty. In line 1130 checked, that pointer is not nullptr, but it can point to empty vector. Some lines above is code:

 fileBufferPtr_->resize(0);
                continue;

I do not know, is it reason, but it is only one place to cause assert - line 1150 and later line 1158

Most likely this is the problem, his app may be calling newStreamResponse(), triggering the code here, and then sending the file on the same connection causing the assertion to fail. I made a patch for this, https://github.com/an-tao/trantor/pull/286, @tripleslash , please check it out. @VladlenPopolitov thanks so much for your reminder.

tripleslash commented 1 year ago

it's wired, because on line 1132 we ensure that the vector pointed to by fileBufferPtr is not empty. In line 1130 checked, that pointer is not nullptr, but it can point to empty vector. Some lines above is code:

 fileBufferPtr_->resize(0);
                continue;

I do not know, is it reason, but it is only one place to cause assert - line 1150 and later line 1158

Most likely this is the problem, his app may be calling newStreamResponse(), triggering the code here, and then sending the file on the same connection causing the assertion to fail. I made a patch for this, an-tao/trantor#286, @tripleslash , please check it out. @VladlenPopolitov thanks so much for your reminder.

The patch works and the original error seems gone. However after putting some load on the server and serving about 200 requests I got this following crash:

Thread 10 "DrogonIoLoop" received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x713f8d20 (LWP 22373)]
0x761e6dd6 in ?? () from /lib/libc.so.6
(gdb) bt
#0  0x761e6dd6 in ?? () from /lib/libc.so.6
#1  0x7626d940 in write () from /lib/libc.so.6
#2  0x766ed92c in trantor::TcpConnectionImpl::writeRaw(void const*, unsigned int) () from /lib/libtrantor.so.1
#3  0x766edf16 in trantor::TcpConnectionImpl::onSslWrite(trantor::TcpConnection*, void const*, unsigned int) () from /lib/libtrantor.so.1
#4  0x7670fce4 in OpenSSLProvider::sendTLSData() () from /lib/libtrantor.so.1
#5  0x7670ed70 in OpenSSLProvider::sendData(char const*, unsigned int) () from /lib/libtrantor.so.1
#6  0x766ed996 in trantor::TcpConnectionImpl::writeInLoop(void const*, unsigned int) () from /lib/libtrantor.so.1
#7  0x766e715a in trantor::TcpConnectionImpl::writeCallback() () from /lib/libtrantor.so.1
#8  0x766fb820 in void std::__invoke_impl<void, void (trantor::TcpConnectionImpl::*&)(), trantor::TcpConnectionImpl*&>(std::__invoke_memfun_deref, void (trantor::TcpConnectionImpl::*&)(), trantor::TcpConnectionImpl*&) ()
   from /lib/libtrantor.so.1
#9  0x766fb2b2 in std::__invoke_result<void (trantor::TcpConnectionImpl::*&)(), trantor::TcpConnectionImpl*&>::type std::__invoke<void (trantor::TcpConnectionImpl::*&)(), trantor::TcpConnectionImpl*&>(void (trantor::TcpConnectionImpl::*&)(), trantor::TcpConnectionImpl*&) () from /lib/libtrantor.so.1
#10 0x766fac5c in void std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>::__call<void, , 0u>(std::tuple<>&&, std::_Index_tuple<0u>) () from /lib/libtrantor.so.1
#11 0x766fa392 in void std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>::operator()<, void>() () from /lib/libtrantor.so.1
#12 0x766f98c8 in void std::__invoke_impl<void, std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>&>(std::__invoke_other, std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>&)
    () from /lib/libtrantor.so.1
#13 0x766f8d4e in std::enable_if<std::__and_<std::is_void<void>, std::__is_invocable<std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>&> >::value, void>::type std::__invoke_r<void, std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>&>(std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()>&) () from /lib/libtrantor.so.1
#14 0x766f7c02 in std::_Function_handler<void (), std::_Bind<void (trantor::TcpConnectionImpl::*(trantor::TcpConnectionImpl*))()> >::_M_invoke(std::_Any_data const&) () from /lib/libtrantor.so.1
#15 0x76ae3a76 in std::function<void ()>::operator()() const () from /lib/libdrogon.so.1
#16 0x766dff8a in trantor::Channel::handleEventSafely() () from /lib/libtrantor.so.1
#17 0x766dfe66 in trantor::Channel::handleEvent() () from /lib/libtrantor.so.1
#18 0x766c0400 in trantor::EventLoop::loop() () from /lib/libtrantor.so.1
#19 0x766c4050 in trantor::EventLoopThread::loopFuncs() () from /lib/libtrantor.so.1
#20 0x766c3c34 in ?? () from /lib/libtrantor.so.1
#21 0x766c4c44 in ?? () from /lib/libtrantor.so.1
#22 0x766c4bf4 in ?? () from /lib/libtrantor.so.1
#23 0x766c4b84 in ?? () from /lib/libtrantor.so.1
#24 0x766c4b4e in ?? () from /lib/libtrantor.so.1
#25 0x766c4b2a in ?? () from /lib/libtrantor.so.1
#26 0x763c8ce2 in ?? () from /lib/libstdc++.so.6
#27 0x76225596 in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
marty1885 commented 1 year ago

@tripleslash SIGPIPE is normal and can be ignored safely. Drogon specifically designed to catch the signal and handles it. GDB however catches it before the application handles it.

tripleslash commented 1 year ago

Then I can confirm the fix https://github.com/an-tao/trantor/pull/286 does work

VladlenPopolitov commented 1 year ago

@tripleslash If you run drogon under gdb, you can try to disable the handling of SIGPIPE by gdb: (gdb) handle SIGPIPE nostop or handle SIGPIPE nostop noprint pass