denisenkom / pytds

Python DBAPI driver for MSSQL using pure Python TDS (Tabular Data Stream) protocol implementation
MIT License
192 stars 53 forks source link

Fix random crashes #115

Closed quinox closed 3 years ago

quinox commented 4 years ago

Hello,

Last week we upgraded a project and we were suddenly faced with random crashes while retrieving certain queries: some queries would always work, some queries would fail 30% of the time. We did not change any of our queries, so we suspected the cause to be pytds.

I managed to make it fail 0% of the time with this fix, but to be honest I'm not entirely sure what's going on. I assume that doing recv_into(buffer_of_8_bytes) will crash when it receives 10 bytes or something, whereas recv_info(buffer, 8) will just ignore the extra bytes?

I hope you will know better whether this fix makes sense :smiley:

wiebeytec commented 4 years ago

I think _read_packet() needs to use select or a similar call to wait for more data. If the connection is slow enough, it will attempt to read from the socket twice, resulting in 0 bytes the second attempt, and a forced closed connection.

quinox commented 4 years ago

Just came to report we've been using this fix in production for the last 2 weeks now and we haven't detected any negative consequences.

Danielgmelo commented 3 years ago

Hey @quinox, thanks for your suggestion. I applied this change and it worked.

denisenkom commented 3 years ago

I don't see how this change can fix an issue. Can you add stack trace for the crash?

quinox commented 3 years ago

I'm happy to help!

The full stacktrace:

Traceback (most recent call last):
  File "/mnt/drive/myproject/website/util/functions.py", line 275, in mssql_query_results
    cursor.fetchall()
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/django/db/utils.py", line 96, in inner
    return func(*args, **kwargs)
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/django/db/utils.py", line 96, in inner
    return func(*args, **kwargs)
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/__init__.py", line 876, in fetchall
    return list(row for row in self)
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/__init__.py", line 876, in <genexpr>
    return list(row for row in self)
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/__init__.py", line 879, in __next__
    row = self.fetchone()
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/__init__.py", line 852, in fetchone
    row = self._session.fetchone()
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/tds.py", line 1584, in fetchone
    if not self.next_row():
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/tds.py", line 1595, in next_row
    self.process_token(marker)
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/tds.py", line 1545, in process_token
    return handler(self)
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/tds.py", line 1662, in <lambda>
    tds_base.TDS_ROW_TOKEN: lambda self: self.process_row(),
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/tds.py", line 642, in process_row
    curcol.value = self.row[i] = curcol.serializer.read(r)
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/tds_types.py", line 692, in read
    return tds_base.readall(r, size)
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/tds_base.py", line 564, in readall
    return join_bytearrays(read_chunks(stm, size))
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/tds_base.py", line 376, in join_bytearrays
    return b''.join(ba)
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/tds_base.py", line 543, in read_chunks
    buf = stm.recv(left)
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/tds.py", line 153, in recv
    self._read_packet()
  File "/mnt/drive/myproject/virtual/lib/python3.6/site-packages/pytds/tds.py", line 235, in _read_packet
    raise tds_base.ClosedConnectionError()
django.db.utils.InterfaceError: Server closed connection

The code that triggers it, in its most simplified way

with connections["my_mssql_db_name"].cursor() as cursor:
    cursor.execute(query)
    if cursor.description:
        raw_rows = (
            cursor.fetchall()
        )  # Fetching first to allow for raises, before inspecting .description later

I've done the testing this morning using python-tds-1.10.0: It feels like smaller result sets have less chance of running into this bug. I increased the result set until I managed to hit the bug, this query did it for me:

    WITH fakedata AS (
        SELECT
            1 AS id
            ,HashBytes('MD5', CAST(1 AS VARCHAR)) AS hash

        UNION ALL

        SELECT
            id + 1 AS id
            ,HashBytes('MD5', CAST(id AS VARCHAR)) AS hash

        FROM fakedata
        WHERE id < 10000
    ), biggerfakedata AS (
        SELECT id,
        CONCAT(hash, hash, hash, hash, hash, hash, hash, hash, hash, hash, hash, hash, hash, hash) AS hash
        FROM fakedata
    )
    SELECT
        id
        ,hash AS random_data_0
        ,hash AS random_data_1
        ,hash AS random_data_2
        ,hash AS random_data_3
        ,hash AS random_data_4
        ,hash AS random_data_5
        ,hash AS random_data_6
        ,hash AS random_data_7
        ,hash AS random_data_8
        ,hash AS random_data_9

    FROM biggerfakedata
    OPTION (maxrecursion 0)

Running it 25 times in a row without the patch results in: FFFFF.......F...F...F.... (8 fails, 17 passes) All failures report: "Server closed connection"

With the patch: ......................... (0 fails, 25 passes)

I debugged the area around the patch a bit. The tds_base.ClosedConnectionError is raised when received is 0, thus I decided to print received. Without the patch I see many screens with Received 8, with an occasional:

...
Received 8
Received 8
Received 3
Received 2
Received 0
"Server closed connection"
...
Received 8
Received 8
Received 7
Received 0
"Server closed connection"

With the patch I also see screenfuls of Received 8, with an occasional:

...
Received 8
Received 8
Received 7
Received 1
Received 1
Received 7
Received 8
Received 8
...
(no crash)
...
Received 8
Received 8
Received 6
Received 2
Received 5
Received 3
Received 8
Received 8
...

Please let me know if there's more I can do to help investigate this.