exasol / pyexasol

Exasol Python driver with low overhead, fast HTTP transport and compression
MIT License

Using the export_to_pandas function of pyexasol in uwsgi results in a pyexasol.exceptions.ExaQueryError #73

Closed srikrbha closed 2 years ago

srikrbha commented 3 years ago

Hello, I have found that using export_to_pandas (or any export_to_* function, which in turn calls export_to_callback) raises an ExaQueryError inside a uwsgi-based web application. I suspect this is caused by the multi-threading that happens within the function. The stack trace and repro steps follow:

  1. Start a sample uwsgi server with pyex_dummy.py using the following command:

    uwsgi --http :9090 --wsgi-file pyex_dummy.py
  2. Hit the newly set up endpoint with a curl call in a separate terminal window:

    curl http://0.0.0.0:9090

    pyex_dummy_no_creds.py.zip

  3. The following stack trace is observed in the server window:

    
    Traceback (most recent call last):
    File "/opt/conda3/lib/python3.7/site-packages/pyexasol/connection.py", line 313, in export_to_callback
    result = callback(http_proc.read_pipe, dst, **callback_params)
    File "/opt/conda3/lib/python3.7/site-packages/pyexasol/callback.py", line 42, in export_to_pandas
    return pandas.read_csv(pipe, skip_blank_lines=False, **kwargs)
    File "/opt/conda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
    File "/opt/conda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
    File "/opt/conda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
    File "/opt/conda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
    File "/opt/conda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1917, in __init__
    self._reader = parsers.TextReader(src, **kwds)
    File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
    pandas.errors.EmptyDataError: No columns to parse from file

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File "test.py", line 14, in application
    df = connection.export_to_pandas(query)
    File "/opt/conda3/lib/python3.7/site-packages/pyexasol/connection.py", line 271, in export_to_pandas
    return self.export_to_callback(cb.export_to_pandas, None, query_or_table, query_params, callback_params, export_params)
    File "/opt/conda3/lib/python3.7/site-packages/pyexasol/connection.py", line 335, in export_to_callback
    raise sql_thread.exc
    File "/opt/conda3/lib/python3.7/site-packages/pyexasol/http_transport.py", line 34, in run
    self.run_sql()
    File "/opt/conda3/lib/python3.7/site-packages/pyexasol/http_transport.py", line 153, in run_sql
    self.connection.execute("\n".join(parts))
    File "/opt/conda3/lib/python3.7/site-packages/pyexasol/connection.py", line 186, in execute
    return self.cls_statement(self, query, query_params)
    File "/opt/conda3/lib/python3.7/site-packages/pyexasol/statement.py", line 55, in __init__
    self._execute()
    File "/opt/conda3/lib/python3.7/site-packages/pyexasol/statement.py", line 159, in _execute
    'sqlText': self.query,
    File "/opt/conda3/lib/python3.7/site-packages/pyexasol/connection.py", line 572, in req
    raise cls_err(self, req['sqlText'], ret['exception']['sqlCode'], ret['exception']['text'])
    pyexasol.exceptions.ExaQueryError:
    (
        message     =>  ETL-5106: Following error occured while writing data to external connection [http://000.gz/ failed after 0 bytes. [Could not resolve host: 000.gz],[6],[Couldn't resolve host name]] (Session: 1698365725588979714)
        dsn         =>  DEMODB.EXASOL.COM
        user        =>  PUB3511
        schema      =>
        session_id  =>  1698365725588979714
        code        =>  42636
        query       =>  EXPORT (
    SELECT * FROM EXA_SYSCAT
    ) INTO CSV
    AT 'http://' FILE '000.gz'
    WITH COLUMN NAMES
    )



We are running this in a container based on RHEL 7.9.

littleK0i commented 3 years ago

I did some checks and managed to reproduce the issue.

It looks like uwsgi takes the liberty of redefining sys.executable: it stores the path to the uwsgi binary there instead of the path to the Python interpreter. As a result, this part of pyexasol no longer works properly:

        args = [sys.executable,
                '-m', 'pyexasol_utils.http_transport',
                '--host', self.host,
                '--port', str(self.port),
                '--mode', self.mode,
                '--ppid', str(os.getpid())
                ]

Relevant links: https://github.com/unbit/uwsgi/issues/670 https://bugs.python.org/issue36196
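
A quick way to check whether a process is affected is to look at what sys.executable points to. The following is a minimal sketch, not part of pyexasol: the helper name looks_like_python is hypothetical, and the file-name check is only a heuristic that assumes interpreter binaries are named python* or pypy*.

```python
import os
import sys


def looks_like_python(executable):
    """Heuristic: does the file name of this path resemble a Python interpreter?

    Assumption for this sketch: interpreters are named python* or pypy*.
    """
    name = os.path.basename(executable or "").lower()
    return name.startswith(("python", "pypy"))


# Under uwsgi, sys.executable may point to the uwsgi binary, so this warning
# would be printed there and "sys.executable -m ..." would not start Python.
if not looks_like_python(sys.executable):
    print(f"sys.executable is {sys.executable!r}, which does not look like Python")
```

Calling this once at application start-up inside the uwsgi worker should be enough to confirm the diagnosis.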

In theory, subprocess can be replaced with multiprocessing here, but it will cause problems on Windows, which is unable to "fork" properly.

Currently I don't see an easy fix for this which can maintain backwards compatibility.


However, it looks like a parameter was added to set sys.executable manually for uwsgi. https://github.com/unbit/uwsgi/commit/b6308cae818dab78da5f51eae8c903b6e2122b7a

But I am not sure whether it was merged or how to set it.

Hope it helps!

tkilias commented 3 years ago

Hi @srikrbha and @wildraid,

I tried the patch from https://github.com/unbit/uwsgi/commit/b6308cae818dab78da5f51eae8c903b6e2122b7a and it seems to work.

I installed it via:

pip install https://github.com/unbit/uwsgi/archive/b6308cae818dab78da5f51eae8c903b6e2122b7a.zip

and then started uwsgi with

uwsgi --http :9090 --wsgi-file pyex_dummy.py --py-executable venv/bin/python3.6

where venv/bin/python3.6 is, in my case, the path to my Python binary.

However, it seems this patch is not yet in the stable releases on PyPI, so voting for it in the uwsgi repository might help.

venkatrajgopal17 commented 3 years ago

Hi,

In general, export_to_pandas() no longer works for me, although it worked fine before.

raise cls_err(self, req['sqlText'], ret['exception']['sqlCode'], ret['exception']['text'])
pyexasol.exceptions.ExaQueryError:
(
    message     =>  ETL-5106: Following error occured while writing data to external connection [https://000.gz/ failed after 0 bytes. [Could not resolve host: 000.gz],[6],[Couldn't resolve host name]] (Session: 1698910877895153538)
    dsn         =>  xxx
    user        =>  xxx
    schema      =>
    session_id  =>  1698910877895153538
    code        =>  42636
    query       =>  EXPORT (
SELECT * FROM EXA_SYSCAT
) INTO CSV
AT 'https://' FILE '000.gz'
WITH COLUMN NAMES
)

What is the current fix for this?

tkilias commented 3 years ago

Hi @venkatrajgopal17 ,

What do you mean by 'in general' export_to_pandas doesn't work anymore? How did you run it? Did you use the pyex_dummy.py and the patched version of uwsgi?

daschnerm commented 3 years ago

@wildraid Could we maybe use multiprocessing, together with a conditional import based on os.name or platform.system(), to maintain backwards compatibility with Windows?
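
The conditional suggested here could look roughly like the sketch below. It only illustrates the platform switch; choose_transport_launcher and its return values are hypothetical names and are not part of pyexasol.

```python
import platform


def choose_transport_launcher(system_name=None):
    """Pick how the HTTP transport helper would be started.

    Hypothetical helper for illustration: keep the existing subprocess
    approach on Windows and use multiprocessing everywhere else.
    """
    if system_name is None:
        system_name = platform.system()
    return "subprocess" if system_name == "Windows" else "multiprocessing"


print(choose_transport_launcher())
```

The two return values correspond to two separate code paths, each of which would need to be maintained and tested.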

venkatrajgopal17 commented 3 years ago

> Hi @venkatrajgopal17 ,
>
> What do you mean by 'in general' export_to_pandas doesn't work anymore? How did you run it? Did you use the pyex_dummy.py and the patched version of uwsgi?

I have tried, but the patch installation doesn't work with my venv.

*** error linking uWSGI ***
    ----------------------------------------
ERROR: Command errored out with exit status 1: '/mnt/d/Project/Pyexasol_connection/exavenv/bin/python3' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-t3vbe8gb/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-t3vbe8gb/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-bvnapcde/install-record.txt --single-version-externally-managed --compile --install-headers '/mnt/d/Project/Pyexasol_connection/exavenv/include/site/python3.8/uWSGI' Check the logs for full command output.

For now I simply use pd.DataFrame(stmt.fetchall(), columns=stmt.column_names()) to load the result into a pandas DataFrame.

littleK0i commented 3 years ago

@daschnerm it is possible in theory, but it would take a few days to implement and test properly. It would also create two major branches of logic, one of which would not be tested at all, since Travis does not support Windows.

Also, the current code is re-used both for normal HTTP transport and parallel HTTP transport, which may run on multiple servers, and multiprocessing will not help.

So it is definitely possible, but at a high cost with almost no reward. It may save about 100-300 ms by removing the cost of starting a new Python process, but that's it.

tkilias commented 3 years ago

Hi @wildraid,

I may have an idea for a small workaround. Would it be possible to provide an environment variable that specifies the Python interpreter? It would be easier to test and far less invasive.
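
A minimal sketch of that idea, assuming a hypothetical variable name PYEXASOL_PYTHON_EXECUTABLE (pyexasol does not necessarily read any such variable). The argument list mirrors the snippet quoted earlier in this thread; the helper names are made up for illustration.

```python
import os
import sys

# Hypothetical variable name, chosen only for this sketch.
ENV_VAR = "PYEXASOL_PYTHON_EXECUTABLE"


def resolve_python_executable(environ=None):
    """Return the interpreter from the environment, else sys.executable."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR) or sys.executable


def build_http_transport_args(host, port, mode, environ=None):
    """Build the helper command line with the overridable interpreter."""
    return [
        resolve_python_executable(environ),
        "-m", "pyexasol_utils.http_transport",
        "--host", host,
        "--port", str(port),
        "--mode", mode,
        "--ppid", str(os.getpid()),
    ]
```

With this approach a uwsgi deployment would only need to export the variable before starting the server, and the default behaviour stays unchanged when it is unset.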

littleK0i commented 2 years ago

Closing this issue; please continue in #79.