Closed Choc13 closed 1 year ago
Yes, as far as I know, this should be erasing type provider and thus the schema shouldn't be needed on runtime, only on compile time.
OK interesting. So does that mean this is a bug and that this bit of code shouldn't be being triggered at runtime? It is in a module called SqlRuntime
though which made me think it was perhaps required at runtime somehow? I'd be happy to open a PR if this should be disabled though.
The cache seems to be there already in pezipink code from 2014, so definitely it must be used for something...
Do you have any idea why it would timeout:
Yes having a look through the code more closely it does seem like the cache is needed as it uses the metadata for things like constructing the query parameters.
The timeout I see is a strange one though. I'll give you some more context. When our application starts it registers a bunch of BackgroundService
jobs. Most of which will do some fairly trivial queries to get some job metadata before starting their main body of work. All of these jobs are designed so that if they fail they will attempt to restart again between 1 and 2 minutes from now (the jitter is added to avoid any subsequent dog piling).
I only ever observe the timeout on the very first invocation of these jobs. Once they recover and restart a minute or so later they always boot up fine without incurring the timeout exception.
My first thought was that it must be because this addCache
method is therefore being called by lots of threads concurrently at startup and therefore it's exhausting connections or something. Looking at the source code for addCache
more closely I see that it uses the "lazy factory" approach inside the GetOrAdd
call for the ConcurrentDictionary
which means it should be guaranteed that all concurrent invocations are actually awaiting the result of a single invocation of the factory function. This would also explain why all of the jobs initially fail at exactly the same time when I start the application because the result of that first invocation of addCache
fails and they all share this result.
That lead me to my current hypothesis which is that because addCache
just calls con.Open()
and not OpenAsync()
that it is possibly to do with the fact that this is a blocking call? But I can't quite construct a logical argument for why that would lead to a timeout.
The addCache should definitely called only once within process and yes the con.Open() is synchronous within that locked context.
But are "All of these jobs" running within same process (executable) or are they separate instances? Because the ConcurrentDictionary is shared only within that one process.
Yes they're all running the in same process. Just a collection of BackgroundService
for which the runtime calls ExecuteAsync
on to start them. It's just strange to me that it's the opening of the connection that actually times out. It's not even got on to the part of querying the schema yet.
I think I've possibly gotten to the bottom of this. It's related to a gotcha with how BackgroundService instances are started. Also BackgroundService
instances do not wait for the application to be fully started before they can start processing their work. We are also using Quartz.NET
which also uses the BackgroundService
framework but does in fact wait for the application to be full initialised. I think a combination of these things together meant that our jobs were trying to open connections to the DB only for the runtime to then suspend execution of their threads whilst it waited for all other jobs to begin and I think this then meant that MySQL thought our application was dead and sent some timeout response.
I've fixed our jobs now so that they also wait for the IHostApplicationLifetime.ApplicationStarted
event to fire before trying to do their processing and I believe things are fine now.
Describe the bug I'm not really sure this is a bug, I'm more looking to better understand the startup / caching behaviour of the provider and looking for some advice / knowledge to help resolve a timeout issue at startup.
Recently, I've noticed that some of our background jobs that query the database immediately at startup are failing due to query timeouts. The actual SQL queries they are trying to execute are very simple and have small results sets. I've tested the SQL that is generated and it's perfectly fine (taking less than 1s to complete when I run it manually). However, the stack trace from the SQL timeout seems to show that it's due to calling the
SqlRuntime.DataContext.addCache
function.The top of the stack trace is:
What I'd like to know is what the purpose of this
addCache
function is and whether there is anything I can do to speed it up / disable it if possible. Judging by the name and looking at the source code it seems to me like it's trying to build an in memory version of the current DB schema. This is something that I thought the provider only did at design time, but I'm guessing it's also doing it at runtime to figure out the column names etc that it can fetch from each table?Another thing to point out is that I have several jobs all making similar queries concurrently at startup, so wondered if this was perhaps being made worse due to a race condition and whether or not the
providerCache
is thread-safe?Any insights into this function would be much appreciated so that I can try and figure out a strategy to avoid this timeout issue.
One thought was could we perhaps run it at compile time (maybe with an opt-in switch) so that the schema is already "built" to avoid the startup overhead of having to figure it out dynamically again?
Finally, does the
IndividualsCount
affect the performance of this at all? We don't use individuals and I wondered if maybe it was scanning the DB to build all that metadata at startup, which could be quite time consuming.Version Information SQLProvider = 1.3.8