SKA-ScienceDataProcessor / integration-prototype

SDP Integration Prototype
BSD 3-Clause "New" or "Revised" License

Docker stack Tango database funny #60

Open brianmcilwrath opened 5 years ago

brianmcilwrath commented 5 years ago

Describe the bug: Using the itango3 client, the first call below fails with a lost-connection error; repeating the call then succeeds.

d.get_server_list()
DB_SQLError: Failed to query TANGO database (error=Lost connection to MySQL server during query) The query was: SELECT DISTINCT server FROM device WHERE server LIKE "%" ORDER BY server (For more detailed information type: tango_error)

In [9]: d.get_server_list()
Out[9]: DbDatum(name = 'server', value_string = ['DataBaseds/2', 'processing_block_ds/1', 'sdp_master_ds/1', 'subarray_ds/1', 'TangoAccessControl/1', 'TangoTest/test'])

To Reproduce: Leave the tango-db service idle for some ill-defined time!

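Since the idle time needed to trigger the error is not known, a small probe might help narrow it down. The following is a minimal sketch, not something taken from the SIP code base: `probe_after_idle` is a hypothetical helper that calls a query once, waits, and then calls it twice more, recording which calls fail. It catches exceptions broadly because the exact exception class raised by the client is an assumption left open here.

```python
import time


def probe_after_idle(query, idle_seconds, sleep=time.sleep):
    """Call query(), wait idle_seconds, then call it twice more.

    Returns a list of (label, ok, error_text) tuples, which makes it easy
    to see whether only the first call after the idle period fails.
    """
    results = []
    steps = (("warm-up", 0), ("after idle", idle_seconds), ("retry", 0))
    for label, wait in steps:
        if wait:
            sleep(wait)
        try:
            query()
            results.append((label, True, ""))
        except Exception as exc:  # broad on purpose: exception type not assumed
            results.append((label, False, str(exc)))
    return results
```

From the itango session in this report it could be run as `probe_after_idle(d.get_server_list, 3600)`, where the 3600 s idle period is an arbitrary starting guess to be varied between runs.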

Expected behavior: No error


bmort commented 5 years ago

If I'm right in assuming that d is an instance of the class tango.Database, this is because the mysql service container has become unavailable (at least temporarily).

I'm able to reproduce the error by removing the mysql service or killing the mysql service container (with docker kill on the container).

If using the SIP demo docker compose stack, removing the service can be done with:

docker service rm sip_tc_tango_mysql

and to kill the service's container (temporarily):

docker kill $(docker ps -q -f name=sip_tc_tango_mysql)

Both of these will produce the error string described in the issue.

Why the mysql container or service is failing is harder to answer, but I will look into timeout conditions.

bmort commented 5 years ago

As I've not yet been able to reproduce this without forcing the error (as described above), if you see this again could you report the output of:

docker service ps --no-trunc <name of my tango mysql service>

where <name of my tango mysql service> == sip_tc_tango_mysql if using the SIP demo stack.

(I'm interested to see if the container is restarting when the error occurs)
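To make the restart check repeatable, the same command can be wrapped in a short script. This is a rough sketch under a few assumptions: it uses the `--format` option of `docker service ps` with the `.Name`, `.CurrentState` and `.Error` fields, and it treats any task whose current state does not start with "Running" as evidence of an earlier restart or failure. The function names are made up for illustration.

```python
import subprocess

# Tab-separated Go template for `docker service ps --format`.
FORMAT = "{{.Name}}\t{{.CurrentState}}\t{{.Error}}"


def service_ps_command(service):
    """Build the docker command line for the given service name."""
    return ["docker", "service", "ps", "--no-trunc", "--format", FORMAT, service]


def parse_tasks(output):
    """Split the tab-separated output into one dict per task."""
    tasks = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Pad so that a missing error column becomes an empty string.
        name, state, error = (line.split("\t") + ["", ""])[:3]
        tasks.append({"name": name, "state": state, "error": error})
    return tasks


def previous_tasks(tasks):
    """Tasks that are no longer running, i.e. shut down or failed earlier."""
    return [t for t in tasks if not t["state"].startswith("Running")]


def check_service(service="sip_tc_tango_mysql"):
    """Run the command and return the tasks that are not currently running."""
    result = subprocess.run(service_ps_command(service),
                            capture_output=True, text=True, check=True)
    return previous_tasks(parse_tasks(result.stdout))
```

If `check_service()` returns a non-empty list right after the error appears, that would suggest the mysql container was replaced around that time; an empty list would point away from a restart.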

bmort commented 5 years ago

Quick update.

I've tried passing [wait_timeout](https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_wait_timeout) and interactive_timeout to the mysqld command. Unfortunately, setting these to 10s, i.e.

command: ["mysqld", "--sql_mode=", "--wait-timeout=10", "--interactive-timeout=10"]

does not seem to reproduce the issue, so this does not look like a mysql timeout error (unless there is an additional timeout setting I'm missing). Note that the default for both of these settings is 28800s == 8h.