exasol / pyexasol

Exasol Python driver with low overhead, fast HTTP transport and compression
MIT License
72 stars 39 forks source link

Unsupported WebSocket opcode #49

Closed reitzig closed 4 years ago

reitzig commented 4 years ago

We're seeing low-level errors bubble up through pyexasol which we can't explain. For example:

pyexasol.exceptions.ExaQueryError: 
(
    message     =>  WebSocket: readMessage: Unsupported WebSocket opcode: 11 (Session: <snip>)
    dsn         =>  exasol:8888
    user        =>  sys
    schema      =>  
    code        =>  00000
    session_id  =>  <snip>
    query       =>  MERGE INTO <snip>
USING <snip> landing_inserts
ON ( lake_table.<snip> = landing_inserts.<snip> )
WHEN MATCHED THEN
    UPDATE SET lake_table.<snip> = landing_inserts.<snip> , ..., lake_table.<snip>=CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
    INSERT VALUES (...)
)

This is after several queries ran through successfully; it is the first MERGE, though.

Re-running the same code yielded the same error with opcode: 12. We've also seen opcode 14 once on another query, but have not been able to reproduce.

This is running queries with pyexasol 0.11.2 on Python 3.7.6 against exasol/docker-db:6.2.4-d1.

littleK0i commented 4 years ago

Ok, this is an interesting case.

1) Which version of "websocket-client" do you have? How to check:

import websocket
print(websocket.__version__)

2) Could you reproduce the same problem with debug=True and attach the log?

Thank you.

mymindwentblvnk commented 4 years ago

Hey, I work with @reitzig on this issue. Regarding your first question:

Python 3.7.6 (default, Jan  3 2020, 23:53:24) 
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import websocket
>>> print(websocket.__version__)
0.57.0

Second question: Should we set debug=True in pyexasol.connect()?

Regards Michael

littleK0i commented 4 years ago

One more question.

Do you use threading or multiprocessing of any kind in the main script? If so, do you share the same connection across multiple processes or threads?

There is a protection from threading problems, but multiprocessing problems after fork() might be a real issue.

littleK0i commented 4 years ago

@mymindwentblvnk , yes, debug is just an option for pyexasol.connect().

Thank you!

mymindwentblvnk commented 4 years ago

Hey @wildraid, we do not parallelize anything at the moment, but already thought about doing so. So thanks for the heads up. Will add the debug flag now. ~Where can the important logs be found afterwards?~ Nevermind. Will set debug_logdir too.

mymindwentblvnk commented 4 years ago

So here are the logs for the problematic query:

2020-02-27 09:13:32.339 [WebSocket request #396]
{
    "command": "execute",
    "sqlText": "MERGE INTO <snip>
USING <snip> landing_inserts
ON ( lake_table.<snip> = landing_inserts.<snip> )
WHEN MATCHED THEN
    UPDATE SET lake_table.<snip> = landing_inserts.<snip> , ..., lake_table.<snip>=CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
    INSERT VALUES (...)"
}
2020-02-27 09:13:35.350 [WebSocket response #396]
{
    "status": "error",
    "exception": {
        "text": "WebSocket: readMessage: Unsupported WebSocket opcode: 7 (Session: 1659680558719702237)",
        "sqlCode": "00000"
    }
}

Thanks already 👍 Michael

mymindwentblvnk commented 4 years ago

This is how we create the EXASOL connection:

conn = pyexasol.connect(dsn=exa_host + ':' + exa_port,
                            user=exa_user,
                            password=exa_password,
                            encryption=True,
                            compression=True,
                            debug=True,
                            debug_logdir='/usr/local/<snip>/log',
                            fetch_dict=True)
littleK0i commented 4 years ago

@mymindwentblvnk ,

1) Could you attach the whole log? SQL queries and response data can be removed or replaced if it's too long. There is some meta-information in requests which can be useful for investigation.

2) How long is the SQL query text which causes the problem? How long are "VALUES"? I suspect it might be reaching the maximum size of text / data frame, which causes client to send the second frame, and Exasol server might not recognise it properly.

reitzig commented 4 years ago

How long is the SQL query text which causes the problem? How long are "VALUES"?

They are quite long, merging several hundreds of there. I don't think that changed a lot recently, and it's certainly on different between container vs. real Exasol (which we use in other deployment stages). Those run an older version, though, namely 6.1.2. @mymindwentblvnk has the log at hand, he can tell you how long the strings actually are.

I suspect it might be reaching the maximum size of text / data frame, which causes client to send the second frame, and Exasol server might not recognise it properly.

If there's something we could reconfigure in that regard, we could try it out. Do you know something about that?

mymindwentblvnk commented 4 years ago

The full log. The last query causes the problem and it has a length of 12899 characters. The one before with 6815 characters works fine.

2020-02-27 09:12:02.192 Connection attempt [172.27.0.5:8888]
2020-02-27 09:12:02.215 [WebSocket request #1]
{
    "command": "login",
    "protocolVersion": 1
}
2020-02-27 09:12:02.250 [WebSocket response #1]
{
    "status": "ok",
    "responseData": {
        "publicKeyPem": "<snip>",
        "publicKeyExponent": "010001",
        "publicKeyModulus": "<snip>"
    }
}
2020-02-27 09:12:02.286 [WebSocket request #2]
{
    "username": "sys",
    "password": "<snip>",
    "driverName": "PyEXASOL 0.11.2",
    "clientName": "PyEXASOL",
    "clientVersion": "0.11.2",
    "clientOs": "Linux-4.19.76-linuxkit-x86_64-with-debian-9.12",
    "clientOsUsername": "airflow",
    "clientRuntime": "Python 3.7.6",
    "useCompression": true,
    "attributes": {
        "currentSchema": "",
        "autocommit": true,
        "queryTimeout": 0,
        "snapshotTransactionsEnabled": false
    }
}
2020-02-27 09:12:02.296 [WebSocket response #2]
{
    "status": "ok",
    "responseData": {
        "protocolVersion": 1,
        "timeZone": "EUROPE/BERLIN",
        "timeZoneBehavior": "INVALID SHIFT AMBIGUOUS ST",
        "sessionId": 1659680558719702237,
        "maxDataMessageSize": 67108864,
        "releaseVersion": "6.2.4",
        "databaseName": "DB1",
        "productName": "EXASolution",
        "maxIdentifierLength": 128,
        "maxVarcharLength": 2000000,
        "identifierQuoteString": "\""
    }
}
2020-02-27 09:12:02.296 [WebSocket request #3]
{
    "command": "getAttributes"
}
2020-02-27 09:12:02.321 [WebSocket response #3]
{
    "status": "ok",
    "attributes": {
        "datetimeFormat": "YYYY-MM-DD HH24:MI:SS.FF6",
        "dateFormat": "YYYY-MM-DD",
        "numericCharacters": ".,",
        "dateLanguage": "ENG",
        "queryTimeout": 0,
        "timezone": "EUROPE/BERLIN",
        "timezoneBehavior": "INVALID SHIFT AMBIGUOUS ST",
        "defaultLikeEscapeCharacter": "\\",
        "autocommit": true,
        "compressionEnabled": true,
        "currentSchema": "",
        "snapshotTransactionsEnabled": false,
        "openTransaction": 0
    }
}
2020-02-27 09:12:03.242 [WebSocket request #4]
{
    "command": "execute",
    "sqlText": "<snip>"
}
2020-02-27 09:12:03.289 [WebSocket response #4]
{
    "status": "ok",
    "responseData": {
        "results": [
            {
                "resultType": "rowCount",
                "rowCount": 0
            }
        ],
        "numResults": 1
    }
}
2020-02-27 09:12:03.290 [WebSocket request #5]
{
    "command": "execute",
    "sqlText": "COMMIT"
}
2020-02-27 09:12:03.301 [WebSocket response #5]
{
    "status": "ok",
    "responseData": {
        "results": [
            {
                "resultType": "rowCount",
                "rowCount": 0
            }
        ],
        "numResults": 1
    }
}
2020-02-27 09:12:03.302 [WebSocket request #6]
{
    "command": "execute",
    "sqlText": "<snip>"
}
2020-02-27 09:12:03.346 [WebSocket response #6]
{
    "status": "ok",
    "responseData": {
        "results": [
            {
                "resultType": "rowCount",
                "rowCount": 0
            }
        ],
        "numResults": 1
    }
}
2020-02-27 09:12:03.347 [WebSocket request #7]
{
    "command": "execute",
    "sqlText": "COMMIT"
}
2020-02-27 09:12:03.357 [WebSocket response #7]
{
    "status": "ok",
    "responseData": {
        "results": [
            {
                "resultType": "rowCount",
                "rowCount": 0
            }
        ],
        "numResults": 1
    }
}

// ...
// There are over hundred queries like the one above
// ...

2020-02-27 09:13:31.495 [WebSocket request #394]
{
    "command": "execute",
    "sqlText": "<ANOTHER LONG QUERY>"
}
2020-02-27 09:13:32.325 [WebSocket response #394]
{
    "status": "ok",
    "responseData": {
        "results": [
            {
                "resultType": "rowCount",
                "rowCount": 0
            }
        ],
        "numResults": 1
    }
}
2020-02-27 09:13:32.325 [WebSocket request #395]
{
    "command": "execute",
    "sqlText": "COMMIT"
}
2020-02-27 09:13:32.337 [WebSocket response #395]
{
    "status": "ok",
    "responseData": {
        "results": [
            {
                "resultType": "rowCount",
                "rowCount": 0
            }
        ],
        "numResults": 1
    }
}
2020-02-27 09:13:32.339 [WebSocket request #396]
{
    "command": "execute",
    "sqlText": "<THE LONG QUERY THAT CAUSES THE PROBLEM>"
}
2020-02-27 09:13:35.350 [WebSocket response #396]
{
    "status": "error",
    "exception": {
        "text": "WebSocket: readMessage: Unsupported WebSocket opcode: 7 (Session: 1659680558719702237)",
        "sqlCode": "00000"
    }
}
littleK0i commented 4 years ago

@reitzig ,

If there's something we could reconfigure in that regard, we could try it out. Do you know something about that?

If this is the case, I'll isolate the reproducible test case and raise a ticket for Exasol support.

@mymindwentblvnk , @reitzig , could you try to disable compression and repeat the test? Compression might implicitly change the frame type to "binary" instead of "text", which never caused any problems before. But maybe Exasol changed something specifically in 6.2+.

Also, if you have access to SQL process logs, the last few lines before error might be helpful. It is possible to access such logs via ExaOperation or checking logs inside docker container.

Thank you!

littleK0i commented 4 years ago

I did some tests locally and could not reproduce the problem. I guess, it's not as simple as "too long query".

Could you try to open a new session by calling pyexasol.connect() one more time before executing the problematic query? Let's see if it changes anything.

Thank you.

littleK0i commented 4 years ago

I think I've managed to reproduce it. I'll raise a ticket for Exasol support on Monday and see if we can come up with workaround.

It seems, only version 6.2.4 is affected.

mymindwentblvnk commented 4 years ago

Thank you, @wildraid.

reitzig commented 4 years ago

Reverted to 6.2.3; can confirm that this issue goes away.

Thanks for handling and pushing this, @wildraid!

littleK0i commented 4 years ago

@reitzig , @mymindwentblvnk , this issue was related to encryption=True. It will be fixed in 6.2.5: https://www.exasol.com/support/browse/EXASOL-2645

Thank you for discovering this. 👍

mymindwentblvnk commented 4 years ago

Thank you for taking care of it. We are looking forward to the new version :)

reitzig commented 4 years ago

We can indeed not reproduce the issue with Exasol 6.2.5 (and pyexasol 0.12.0) -- thanks!