exasol / exasol-driver-lua

Exasol SQL driver for Lua
MIT License

Fix repeated tests on GH Actions #56

Open kaklakariada opened 2 years ago

kaklakariada commented 2 years ago

Repeated tests take a long time running, see example log: https://github.com/exasol/exasol-driver-lua/runs/6224913449?check_suite_focus=true

Activate repeated tests and fix the long runtime.

redcatbear commented 1 year ago

I investigated the issue during Kehrwoche but did not have time to finish. The tests get stuck during the TLS handshake inside the luws library. Looking at the network traffic, you can see small packets being exchanged constantly.

So we know two things:

  1. The break condition in the luws connection logic is never met.
  2. The timeout does not work.

I will investigate further in the next Kehrwoche.
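The two findings above can be illustrated with a generic polling loop. This is only a minimal sketch in Bash, not the luws code: the function name `poll_until` and its arguments are hypothetical. It shows why a loop needs both a break condition and a deadline that is checked independently of it, so that a check that never succeeds still terminates.

```shell
# Retry a check until it succeeds or an independent deadline expires.
# Usage: poll_until <timeout_seconds> <command> [args...]
# Hypothetical helper for illustration; requires Bash (uses $SECONDS).
poll_until() {
    local limit="$1"
    shift
    local deadline=$(( SECONDS + limit ))
    while true; do
        # Break condition: the check succeeded.
        if "$@"; then
            return 0
        fi
        # Deadline: evaluated on every iteration, regardless of the check.
        if [ "$SECONDS" -ge "$deadline" ]; then
            return 1
        fi
        sleep 0.1
    done
}
```

If the deadline test were missing or never evaluated, a check that keeps failing would spin forever, which matches the symptom of constant small packets without progress.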

redcatbear commented 1 year ago

To see the problem in a local test, follow these steps:

  1. Start a socket listener with nc -lkp 3000
  2. Start a local docker-db instance (pick a version that has the Lua OpenSSL library!)
  3. Run the repeated tests
    EXASOL_HOST=localhost luarocks --verbose test -- --run=ci_repeated --o TAP

redcatbear commented 1 year ago

A faster way to reproduce the problem is:

EXASOL_HOST=localhost LOG_LEVEL=INFO busted --repeat 2 -o TAP -p 'data_types_spec' -p 'Websocket_spec'

or

EXASOL_HOST=localhost LOG_LEVEL=INFO busted --repeat 2 -o TAP -p 'luasql_compatibility_spec' -p 'Websocket_spec'

redcatbear commented 1 year ago

I can now even trigger the problem without a repeated test:

EXASOL_HOST=localhost LOG_LEVEL=INFO busted -o TAP -p 'udf_spec' -p 'Websocket_spec'

redcatbear commented 1 year ago

I can reduce data_types_spec to a single test case and still trigger the issue with:

EXASOL_HOST=localhost busted -o TAP -p 'data_types_spec' -p 'Websocket_spec'

Additionally, I found that you can disable some test cases in Websocket_spec and still see the problem, while disabling others hides it. Unfortunately, I don't see a pattern yet.
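Since the symptom is a hang rather than a failing assertion, it can help to run the reproduction under a wall-clock limit when trying different combinations of test cases. The following is a minimal sketch, assuming Bash and GNU coreutils `timeout`; the function name `run_with_deadline` and the limit used in the usage note are hypothetical.

```shell
# Run a command under a wall-clock limit so that a hang becomes a visible failure.
# Usage: run_with_deadline <seconds> <command> [args...]
# Hypothetical helper for illustration; assumes GNU coreutils `timeout`.
run_with_deadline() {
    local limit="$1"
    shift
    # `timeout` sends SIGTERM once the limit is reached and then
    # exits with status 124.
    timeout "$limit" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "HUNG: '$*' did not finish within ${limit}s" >&2
    fi
    return "$status"
}
```

For example, `run_with_deadline 300 env EXASOL_HOST=localhost busted -o TAP -p 'data_types_spec' -p 'Websocket_spec'` would report a hang after five minutes instead of blocking indefinitely. The 300 second limit is an arbitrary choice.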

redcatbear commented 1 year ago

Here's a tip from Zane:

If you see the connection stuck:

  1. Log into Exasol
  2. ps aux | grep exacs (ConnectionServer process)
  3. kill -s 6 <pid>

This will create a core dump and backtraces that Zane can inspect.
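The last step of the tip can be wrapped in a small helper. This is a sketch only, assuming Bash on Linux, where signal 6 is SIGABRT; the function name `send_abort` is hypothetical. Whether a core dump is actually written also depends on the system's core dump settings, which are not covered here.

```shell
# Send SIGABRT (signal 6 on Linux) to a process after checking that it exists.
# Usage: send_abort <pid>
# Hypothetical helper for illustration.
send_abort() {
    local pid="$1"
    # kill -0 delivers no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process: $pid" >&2
        return 1
    fi
    kill -s ABRT "$pid"
}
```

The PID still comes from step 2. If `pgrep` is available on the machine (an assumption), `pgrep -f exacs` lists matching PIDs without also matching the `grep` process itself.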

redcatbear commented 1 year ago

Note from a discussion with a colleague: Docker or the Docker network could be the culprit. Try the same test without Docker or via an SSH tunnel to the internal port.

redcatbear commented 3 months ago

This issue has the same root cause as #91 and requires a server fix.