exasol / pyexasol

Exasol Python driver with low overhead, fast HTTP transport and compression
MIT License
72 stars 39 forks source link

Setting limits on worker threads in uwsgi causes raising ExaQueryError in `export_to_pandas` call within subprocess to get stuck #79

Closed srikrbha closed 2 years ago

srikrbha commented 3 years ago

After making a custom build of py-exasol while adding changes to sys.executable (changes in repo: https://github.com/daschnerm/pyexasol/tree/master, based on discussion in issue: https://github.com/badoo/pyexasol/issues/73), we see that a subprocess spawned from the uwsgi server gets stuck while raising an ExaQueryError after calling export_to_pandas function. This problem is observed when we use the threads = x setting in uwsgi to limit the number of worker threads that get spawned by uwsgi.

Sample repro repro.tar.gz:

  1. Extract contents in relevant directory
  2. In subprocess_check.py add the necessary exasol credentials and make sure that the internal query tries to access an inexistent table in the database.
  3. Start the server using the command:
    uwsgi --ini {/path/to/uwsgi.ini} --set-placeholder PROCESSES=1 --set-placeholder THREADS=20 --set-placeholder APP_SER_SIZE=1024 --set-placeholder CWD={/path/to/app.py} --set-placeholder USER={your_user} --set-placeholder GROUP={group_relevant_to_USER} --set-placeholder LOG_PATH={/path/to/keep/uwsgi/logfile.log}
  4. Run curl call:
    curl http://0.0.0.0:9090/start
  5. In case you receive a proper response, re-run step 4 a second time (this should normally cause the request to get stuck).

In case you need exasol logs, you can build the exasol from the following branch of the repo instead for debugging: https://github.com/daschnerm/pyexasol/tree/debugging.

littleK0i commented 3 years ago

Could you try to add option --enable-threads and remove limitations to number of threads altogether? Just let Python create as many threads as it wants to.

I suspect, you might be hitting "20 threads" limit due to WebSocket client background activity.

PyEXASOL uses only 1 extra thread to monitor the execution status of SQL query. But WebSockets require a lot of threads to read / write asynchronously, check timeouts and send / receive PING / PONG frames.

srikrbha commented 3 years ago

@littleK0i The original uwsgi.ini file that I used had --enable-threads. But one question, any reason why a sub-process is facing the thread limit? Shouldn't it be out of the limits set by uwsgi? Or is it because it inherits more than just the env vars and the sockets?

Note: The subprocess was spawned using the python subprocess module with the uwsgi server process being the parent process.

littleK0i commented 3 years ago

Well, technically when you create a new process in Linux, it "forks" and child inherits everything from parent. It includes all sockets, threads, resources, memory, almost everything. So I would not be surprised if thread limits are inherited as well.

Did it help to remove the limit?

I was thinking about creating a second implementation for HTTP transport, using threads only. No subprocess, no forking.

srikrbha commented 3 years ago

Removing the thread limit did work, but, we want to keep it since in our case we have limited cpus allocated to the server spawned and we want to limit the threads spawned to prevent any contention. On that note, within the same server process, we do have other thread pools that run as well, but they have never had any such problems. Or could it be that they didn't have any problems as they aren't contending for any sockets? On that note, to rephrase the problem, what I have been observing from the logs is that the HTTP transport thread that communicates with exasol does get created, but, the thread gets stuck at exception handling and joining.

Note: Just noticed that I didn't give you any logs from one of my tries. Big mistake on my part! Here it is: PyEXASOL_20210909_035721_897429.log

littleK0i commented 3 years ago

I have a theory.

Maybe it gets stuck because it is unable to terminate subprocess due to this behaviour: https://uwsgi-docs.readthedocs.io/en/latest/ThingsToKnow.html

Till uWSGI 2.1, by default, sending the SIGTERM signal to uWSGI means “brutally reload the stack” while the convention is to shut an application down on SIGTERM. To shutdown uWSGI use SIGINT or SIGQUIT instead. If you absolutely can not live with uWSGI being so disrespectful towards SIGTERM, by all means enable the die-on-term option. Fortunately, this bad choice has been fixed in uWSGI 2.1

PyEXASOL uses .terminate to send SIGTERM and stop the subprocess listening to data stream from EXPORT. If uwsgi overloads signal handlers for SIGTERM, it might be inherited by subprocess.

littleK0i commented 3 years ago

You may test it by replacing .terminate() call with .kill() here: https://github.com/badoo/pyexasol/blob/master/pyexasol/http_transport.py#L269

srikrbha commented 3 years ago

@littleK0i Thanks for the info. Will check and get back.

daschnerm commented 3 years ago

@littleK0i You were right with your theory, .kill() indeed works. @srikrbha tested the uWSGI flag die-on-term as well, however this did not work. Is there anything that speaks against exchanging .terminate()with kill() ?

littleK0i commented 2 years ago

It's possible to swap it with .kill(), but it's definitely not the best practice when we can terminate normally.

Could you try to reset SIGTERM signal handler to default? In order to do so, please the following code add after this line: https://github.com/badoo/pyexasol/blob/master/pyexasol_utils/http_transport.py#L236

import signal

signal.signal(signal.SIGTERM,signal.SIG_DFL)

It should reset signal handler to default, unless uWSGI added even more weirdness.

littleK0i commented 2 years ago

@daschnerm , @srikrbha

Indeed, child process inherits signals of parent process by design.

I've added reset of signal handler in 0.21.1. It should fix this specific problem.

Code: https://github.com/badoo/pyexasol/commit/7d2921456c740e755f5e96dc9a14dec8babc9713

littleK0i commented 2 years ago

@srikrbha , @daschnerm , If it is still relevant, please try to use pyexasol latest version (0.23.1+) with uWSGI.

HTTP transport was fully reworked to use threads instead of child processes. Subprocesses are no longer started. sys.executable, signal handling and many other problems should be resolved automatically.

We may see some new unforeseen problems related specifically to threads, but it is unlikely.

Thank you.

redcatbear commented 2 years ago

@srikrbha, @daschnerm: the last message from @littleK0i suggests that this ticket can be closed, right?

redcatbear commented 2 years ago

Closing ticket. No further feedback. Should be solved.