equalsraf / neovim-qt

Neovim client library and GUI, in Qt5.
https://github.com/equalsraf/neovim-qt/wiki
ISC License
1.85k stars 171 forks source link

connectToNeovimTCP test fails on FreeBSD #289

Open jamessan opened 7 years ago

jamessan commented 7 years ago

I recently uploaded neovim-qt 0.2.7 to Debian and the build is failing on [Debian/kFreeBSD] because of the connectToNeovimTCP test. I've also reproduced the failure on normal FreeBSD.

                SPYWAIT(onError);

                QCOMPARE(c->errorCause(), NeovimConnector::SocketError);

QCOMPARE() is failing because no error has been asserted.

Looking at the ktraces for the original test, I see this:

 47253 tst_neovimconnector 0.015313 CALL  connect(0x7,0xbfbfe000,0x10)
 47253 tst_neovimconnector 0.015318 STRU  struct sockaddr { AF_INET, 127.0.0.1:64999 }
 47253 tst_neovimconnector 0.015370 RET   connect -1 errno 36 Operation now in progress
 47253 tst_neovimconnector 0.015428 CALL  clock_gettime(0x4,0xbfbfdf14)
 47253 tst_neovimconnector 0.015433 RET   clock_gettime 0
 47253 tst_neovimconnector 0.015473 CALL  clock_gettime(0x4,0xbfbfe134)
 47253 tst_neovimconnector 0.015478 RET   clock_gettime 0
 47253 tst_neovimconnector 0.015496 CALL  clock_gettime(0x4,0xbfbfe014)
 47253 tst_neovimconnector 0.015499 RET   clock_gettime 0
 47253 tst_neovimconnector 0.015507 CALL  poll(0x1090c70,0x2,0)
 47253 tst_neovimconnector 0.015514 RET   poll 1
...
 47253 tst_neovimconnector 5.264776 CALL  poll(0x1090c70,0x2,0x1)
 47253 tst_neovimconnector 5.264780 RET   poll 1
 47253 tst_neovimconnector 5.264784 CALL  clock_gettime(0x4,0xbfbfe064)
 47253 tst_neovimconnector 5.264787 RET   clock_gettime 0
 47253 tst_neovimconnector 5.264792 CALL  clock_gettime(0x4,0xbfbfe004)
 47253 tst_neovimconnector 5.264795 RET   clock_gettime 0
 47253 tst_neovimconnector 5.264833 CALL  write(0x4,0xbfbfdf00,0x1)
 47253 tst_neovimconnector 5.264869 GIO   fd 4 wrote 1 byte
       0x0000 01                                                                                                                            |.|

 47253 tst_neovimconnector 5.264873 RET   write 1
 47253 tst_neovimconnector 5.264937 CALL  write(0x1,0x1097580,0x105)
 47253 tst_neovimconnector 5.264963 GIO   fd 1 wrote 261 bytes
       "FAIL!  : NeovimQt::Test::connectToNeovimTCP() Compared values are not the same
           Actual   (c->errorCause())             : NoError
           Expected (NeovimConnector::SocketError): SocketError
           Loc: [/home/jamessan/neovim-qt-0.2.7/test/tst_neovimconnector.cpp(65)]
       "

So, Qt attempts to connect(), gets an EINPROGRESS error (which is valid for a non-blocking socket), sets the QTcpSocket state to Connecting, and then continues waiting until the spy times out.

If I change SPYWAIT(onError); to SPYWAIT(onError, 35000); then the test succeeds:

QWARN  : NeovimQt::Test::connectToNeovimTCP() Neovim fatal error "Connection refused"
PASS   : NeovimQt::Test::connectToNeovimTCP()

Here are the ktraces for the 35s spy:

 50388 tst_neovimconnector 0.015174 CALL  connect(0x7,0xbfbfe000,0x10)
 50388 tst_neovimconnector 0.015180 STRU  struct sockaddr { AF_INET, 127.0.0.1:64999 }
 50388 tst_neovimconnector 0.015235 RET   connect -1 errno 36 Operation now in progress
 50388 tst_neovimconnector 0.015312 CALL  clock_gettime(0x4,0xbfbfdf14)
 50388 tst_neovimconnector 0.015320 RET   clock_gettime 0
 50388 tst_neovimconnector 0.015363 CALL  clock_gettime(0x4,0xbfbfe134)
 50388 tst_neovimconnector 0.015369 RET   clock_gettime 0
 50388 tst_neovimconnector 0.015406 CALL  clock_gettime(0x4,0xbfbfe014)
 50388 tst_neovimconnector 0.015411 RET   clock_gettime 0
 50388 tst_neovimconnector 0.015420 CALL  poll(0x1090c70,0x2,0)
 50388 tst_neovimconnector 0.015429 RET   poll 1
...
 50388 tst_neovimconnector 30.421875 CALL  poll(0x1090c70,0x2,0x1)
 50388 tst_neovimconnector 30.421879 RET   poll 1
 50388 tst_neovimconnector 30.421883 CALL  clock_gettime(0x4,0xbfbfe064)
 50388 tst_neovimconnector 30.421886 RET   clock_gettime 0
 50388 tst_neovimconnector 30.421892 CALL  clock_gettime(0x4,0xbfbfe004)
 50388 tst_neovimconnector 30.421896 RET   clock_gettime 0
 50388 tst_neovimconnector 30.421950 CALL  close(0x7)
 50388 tst_neovimconnector 30.421972 RET   close 0
 50388 tst_neovimconnector 30.422025 CALL  socket(PF_INET,SOCK_STREAM,IPPROTO_IP)
 50388 tst_neovimconnector 30.422043 RET   socket 7
 50388 tst_neovimconnector 30.422047 CALL  fcntl(0x7,F_SETFD,FD_CLOEXEC)
 50388 tst_neovimconnector 30.422052 RET   fcntl 0
 50388 tst_neovimconnector 30.422056 CALL  fcntl(0x7,F_GETFL,0x1)
 50388 tst_neovimconnector 30.422060 RET   fcntl 2
 50388 tst_neovimconnector 30.422063 CALL  fcntl(0x7,F_SETFL,0x6<O_RDWR|O_NONBLOCK>)
 50388 tst_neovimconnector 30.422067 RET   fcntl 0
 50388 tst_neovimconnector 30.422073 CALL  setsockopt(0x7,SOL_SOCKET,SO_OOBINLINE,0xbfbfdd08,0x4)
 50388 tst_neovimconnector 30.422080 RET   setsockopt 0
 50388 tst_neovimconnector 30.422087 CALL  connect(0x7,0xbfbfdd10,0x10)
 50388 tst_neovimconnector 30.422094 STRU  struct sockaddr { AF_INET, 127.0.0.1:64999 }
 50388 tst_neovimconnector 30.422149 RET   connect -1 errno 36 Operation now in progress
 50388 tst_neovimconnector 30.422162 CALL  clock_gettime(0x4,0xbfbfdc24)
 50388 tst_neovimconnector 30.422165 RET   clock_gettime 0
 50388 tst_neovimconnector 30.422179 CALL  clock_gettime(0x4,0xbfbfe014)
 50388 tst_neovimconnector 30.422183 RET   clock_gettime 0
 50388 tst_neovimconnector 30.422189 CALL  poll(0x1090c70,0x2,0x1388)
 50388 tst_neovimconnector 30.422239 RET   poll 1
 50388 tst_neovimconnector 30.422250 CALL  connect(0x7,0xbfbfdf70,0x10)
 50388 tst_neovimconnector 30.422255 STRU  struct sockaddr { AF_INET, 127.0.0.1:64999 }
 50388 tst_neovimconnector 30.422259 RET   connect -1 errno 22 Invalid argument
...

It starts off the same, but then some internal timeout appears to happen causing Qt to try re-connecting. That also encounters EINPROGRESS, but Qt then tries to call connect() again and fails.

It's certainly not ideal for the test to take 30 seconds to error out, but I'm not sure if there's anything else that can be done from the neovim-qt side.

jamessan commented 7 years ago

Hmm, a small percentage of the time I get what I think is the expected behavior:

  3555 tst_neovimconnector 0.013627 CALL  connect(0x7,0x7fffffffd840,0x10)
  3555 tst_neovimconnector 0.013628 STRU  struct sockaddr { AF_INET, 127.0.0.1:64999 }
  3555 tst_neovimconnector 0.013728 RET   connect -1 errno 61 Connection refused
  3555 tst_neovimconnector 0.013782 CALL  clock_gettime(0x4,0x7fffffffd250)
  3555 tst_neovimconnector 0.013791 RET   clock_gettime 0
  3555 tst_neovimconnector 0.013829 CALL  write(0x1,0x804865000,0x56)
  3555 tst_neovimconnector 0.013844 GIO   fd 1 wrote 86 bytes
       "QWARN  : NeovimQt::Test::connectToNeovimTCP() Neovim fatal error "Connection refused"
       "

So the question is how to nudge Qt to properly get to that state when the initial connect return EINPROGRESS.

equalsraf commented 7 years ago

@jamessan not totaly unexpected its not impossible that the TCP machinery takes a while. The default timeout is 5s, which might not be that much. Does #290 fix it?

jamessan commented 7 years ago

No, it seems to be waiting for QT_CONNECT_TIMEOUT to expire, which is 30s.

I think something's wrong here on the Qt side. It's not handling the EINPROGRESS properly. On Linux, you see something like this:

14:53:41.472020 connect(4, {sa_family=AF_INET, sin_port=htons(64999), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
14:53:41.472195 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLOUT}], 2, 0) = 1 ([{fd=4, revents=POLLOUT|POLLERR|POLLHUP}])
14:53:41.472241 connect(4, {sa_family=AF_INET, sin_port=htons(64999), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)
14:53:41.472279 close(4)                = 0
14:53:41.472322 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 4
14:53:41.472348 setsockopt(4, SOL_SOCKET, SO_OOBINLINE, [1], 4) = 0
14:53:41.472372 connect(4, {sa_family=AF_INET, sin_port=htons(64999), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
14:53:41.472438 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLOUT}], 2, 5250) = 1 ([{fd=4, revents=POLLOUT|POLLERR|POLLHUP}])
14:53:41.472472 connect(4, {sa_family=AF_INET, sin_port=htons(64999), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)

After the EINPROGRESS the poll detects the socket is ready and the subsequent connect() gets the expected ECONNREFUSED, all in the span of a few microseconds.