Open kaklakariada opened 2 years ago
I investigated the issue during Kehrwoche, but did not have time to finish. The tests get stuck during the TLS handshake inside the luws library. When looking at the traffic, you can see that small packets are constantly being exchanged.
So we know two things:
luws connection attempt is not met.
I will investigate further in the next Kehrwoche.
To see the problem in a local test, follow these steps:
- nc -lkp 3000
- docker-db instance (pick a version that has the Lua OpenSSL library!)
- EXASOL_HOST=localhost luarocks --verbose test -- --run=ci_repeated --o TAP
A faster way to reproduce the problem is:
EXASOL_HOST=localhost LOG_LEVEL=INFO busted --repeat 2 -o TAP -p 'data_types_spec' -p 'Websocket_spec'
or
EXASOL_HOST=localhost LOG_LEVEL=INFO busted --repeat 2 -o TAP -p 'luasql_compatibility_spec' -p 'Websocket_spec'
I can now even trigger the problem without a repeated test:
EXASOL_HOST=localhost LOG_LEVEL=INFO busted -o TAP -p 'udf_spec' -p 'Websocket_spec'
I can reduce data_types_spec to a single test case and still trigger the issue with
EXASOL_HOST=localhost busted -o TAP -p 'data_types_spec' -p 'Websocket_spec'
Additionally, I found that you can disable some test cases in Websocket_spec and still see the problem, while disabling others hides it. Unfortunately, I don't see a pattern yet.
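Because the failure shows up as a hang rather than a failing assertion, it may help to wrap the reproduction command in a time limit. The following is only a sketch: it assumes a `timeout` utility that exits with status 124 when the limit is reached (as GNU coreutils does), and the function name `run_or_flag_hang` and the 120-second limit are arbitrary choices for illustration.

```shell
# Hypothetical helper: run a command with a time limit (in seconds) and
# report when the limit was hit. Assumes `timeout` exits with status 124
# on expiry, as GNU coreutils does.
run_or_flag_hang() {
  local limit="$1"
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "HANG: command did not finish within ${limit}s" >&2
  fi
  return "$status"
}

# Example (not executed here; the 120 s limit is an assumption):
# run_or_flag_hang 120 env EXASOL_HOST=localhost busted -o TAP -p 'data_types_spec' -p 'Websocket_spec'
```

An exit status of 124 then distinguishes a stuck handshake from an ordinary test failure, which keeps whatever status busted itself returned.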
Here's a tip from Zane:
If you see the connection stuck:
- log into Exasol
- ps aux | grep exacs (ConnectionServer process)
- kill -s 6 <pid>
This will create a core dump and backtraces that Zane can inspect.
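The signal step above could be wrapped in a small helper. This is a sketch, not part of Zane's instructions: it assumes that signal 6 is SIGABRT (as on Linux) and that core dumps are enabled on the host. The function name `abort_for_core_dump` is hypothetical.

```shell
# Hypothetical helper: send SIGABRT (signal 6 on Linux) to a process.
# The process aborts; a core dump is written only if the system allows it.
abort_for_core_dump() {
  local pid="$1"
  if [ -z "$pid" ]; then
    echo "usage: abort_for_core_dump <pid>" >&2
    return 2
  fi
  kill -s ABRT "$pid"
}

# Example (not executed here): find the ConnectionServer PID by hand first.
# ps aux | grep exacs
# abort_for_core_dump <pid>
```

Passing the PID explicitly, rather than matching by name, avoids aborting the wrong process if more than one entry matches `exacs`.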
Note from a discussion with a colleague: Docker or the Docker network could be the culprit. Try the same test without Docker or via an SSH tunnel to the internal port.
This issue has the same root cause as #91 and requires a server fix.
Repeated tests take a long time to run; see the example log: https://github.com/exasol/exasol-driver-lua/runs/6224913449?check_suite_focus=true
Activate repeated tests and fix the long runtime.