ESSolutions / django-mssql-backend

Django backend for Microsoft SQL Server
https://pypi.org/project/django-mssql-backend/
BSD 3-Clause "New" or "Revised" License
147 stars 51 forks source link

Uncaught transient login error on Azure SQL Database serverless: Unable to complete login process due to delay in login response (258) #87

Open jtv8 opened 3 years ago

jtv8 commented 3 years ago

When trying to connect to an auto-paused Azure SQL Database serverless instance, the following error occurs:

   
Django Version: 3.0.11
Exception Type: OperationalError
Exception Value: ('08001', '[08001] [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Timeout error [258]. (258) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0); [08001] [Microsoft][ODBC Driver 17 for SQL Server]Invalid connection string attribute (0); [08001] [Microsoft][ODBC Driver 17 for SQL Server]Unable to complete login process due to delay in login response (258)')

On retrying the request (presumably once the Azure SQL Database has resumed) the issue resolves itself.

Any idea what the issue could be here? Is "Invalid connection string attribute (0);" relevant?

Here are the value of the locals at sql_server\pyodbc\base.py:314 when the error occurs:

var value
backoff_time 5
conn None
conn_params {'ATOMIC_REQUESTS': False, 'AUTOCOMMIT': True, 'CONN_MAX_AGE': 0, 'ENGINE': 'sql_server.pyodbc', 'HOST': '[REDACTED].database.windows.net', 'NAME': '[REDACTED]', 'OPTIONS': {'driver': 'ODBC Driver 17 for SQL Server'}, 'PASSWORD': '[REDACTED]', 'PORT': '', 'TEST': {'CHARSET': None, 'COLLATION': None, 'MIRROR': None, 'NAME': None}, 'TIME_ZONE': None, 'USER': '[REDACTED]@[REDACTED]'}
connstr ('DRIVER=ODBC Driver 17 for SQL ' 'Server;SERVER=[REDACTED].database.windows.net;UID=[REDACTED]@[REDACTED];PWD=[REDACTED];DATABASE=[REDACTED];MARS_Connection=yes')
cstr_parts {'DATABASE': '[REDACTED]', 'DRIVER': 'ODBC Driver 17 for SQL Server', 'MARS_Connection': 'yes', 'PWD': '[REDACTED]', 'SERVER': '[REDACTED].database.windows.net', 'UID': '[REDACTED]@[REDACTED]'}
database '[REDACTED]'
driver 'ODBC Driver 17 for SQL Server'
dsn None
error_number '49920'
host '[REDACTED].database.windows.net'
ms_drivers re.compile('^ODBC Driver .* for SQL Server$|^SQL Server Native Client')
need_to_retry False
options {'driver': 'ODBC Driver 17 for SQL Server'}
password '[REDACTED]'
port ''
query_timeout 0
retries 5
retry_count 0
self <sql_server.pyodbc.base.DatabaseWrapper object at 0x0000018572CC6208>
timeout 0
unicode_results False
user '[REDACTED]@[REDACTED]'

One possible solution: Should '258' be added to sql_server.pyodbc.base.DatabaseWrapper._transient_error_numbers?

interpeix commented 3 years ago

We are facing the same problem with Serverless database in a self-paused state. @jtv8 do you have any workaround?

jtv8 commented 3 years ago

@interpeix We didn't manage to resolve this with Serverless and so switched to using a Basic tier provisioned database instead.

interpeix commented 3 years ago

@jtv8 thank's for the answer. We really want to be able to work with serverless databases. As I can guess the trick to be able to manage that transient error should be using that retries functionality. Do you finally found what errorCode is returning? In our app is the 49920 the same as yours. For sure there will be some kind of trick to enter on the retries loop:

if options.get('extra_params', None):
            connstr += ';' + options['extra_params']

        unicode_results = options.get('unicode_results', False)
        timeout = options.get('connection_timeout', 0)
        retries = options.get('connection_retries', 5)
        backoff_time = options.get('connection_retry_backoff_time', 5)
        query_timeout = options.get('query_timeout', 0)

        conn = None
        retry_count = 0
        need_to_retry = False
        while conn is None:
            try:
                conn = Database.connect(connstr,
                                        unicode_results=unicode_results,
                                        timeout=timeout)
            except Exception as e:
                for error_number in self._transient_error_numbers:
                    if error_number in e.args[1]:
                        if error_number in e.args[1] and retry_count < retries:
                            time.sleep(backoff_time)
                            need_to_retry = True
                            retry_count = retry_count + 1
                        else:
                            need_to_retry = False
                        break
                if not need_to_retry:
                    raise

Any help would be super appreciated.

jtv8 commented 3 years ago

@interpeix My working theory is that this is not being caught correctly by the transient error handling logic because Serverless introduces new error numbers that aren't listed in sql_server.pyodbc.base.DatabaseWrapper._transient_error_numbers - see here:

https://github.com/ESSolutions/django-mssql-backend/blob/79e421ad377204580b7fb89b7e126be0c04dca75/sql_server/pyodbc/base.py#L164

I'd suggest creating a fork of django-mssql-backend, adding the error code 258 to _transient_error_numbers and seeing if that fixes it - if it does, you could create a PR (though someone with more knowledge of MSSQL error codes should check that that's an appropriate solution).

interpeix commented 3 years ago

hey @jtv8 what I can see on my debug sessions is that the problem is at e.args[1] arrray that comes from the exception doesn't have the error code, instead have this ugly piece of string '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0) (SQLDriverConnect)', Now we are trying adding this string to the transient_error_numbers entering in the retry loop, solving the issue. It's not an elegant solucion but for a quick workaround makes the job done.