Extremely slow response times in each request even if it does not use remote database

tommyalvarez commented 6 years ago

Hi,

Amazing gem, i had like two database to connect to and this did save me. However, i'm having a terrible performance issue. Upon each request to the server, on random but very often ocations, i have very slow responses from the production server and i reached the conclusion by looking at the log, that most time consumed is during:

D, [2018-01-31T13:16:39.177488 #3616] DEBUG -- : [c17a4f4d-eee7-46b1-8943-458ec030b074]   SQL (183.5ms)  USE [SBDAIGAL]
D, [2018-01-31T13:16:43.057640 #3616] DEBUG -- : [c17a4f4d-eee7-46b1-8943-458ec030b074]   SQL (185.0ms)  USE [ProcustoLacroze]
D, [2018-01-31T13:16:46.947601 #3616] DEBUG -- : [c17a4f4d-eee7-46b1-8943-458ec030b074]   SQL (189.8ms)  USE [REPORTES]

(SBDAIGAL, ProcustoLacroze and REPORTES) are three extra mssql server remote databases besides the main PostgreSql database which is local. Although time shows ~180ms for each remote database connection, in practice, before request normal log starts, it takes like 5/6 seconds around these three log lines before starting to process the request. It happens even if no query or model backed by a remote database is used during the request. My suspcion was that it was something related to the query cache around filter in application controller:

around_action :query_cache_second_base

But i sincerely don't know, and do not know how to debug this, as when working in my local computer i'm in the same network as the remote databases so the performance is drastically improved. However in the production server, on random ocations as i said, a request can take 8 seconds to complete due to this reconnections.

Any idea of what's going on?

metaskills commented 6 years ago

I think I can totally help with this because I actually author the SQL Server adapter too. I wont have time to think too hard till later today on this. But quickly off the top of my head:

The USE statement bothers me because that should only happen when the connection is first made and it should only happen n times equal to your connection pool. Unless you kill those connections manually, or they are lost somehow, you should not see the USE statement which is almost certainly going to be followed by a bunch of silent schema reflection which is slow.

Can you let me know what Rails version you are using?
Are you reaping connections somehow?
Do you think you are suffering from reconnects?

tommyalvarez commented 6 years ago

@metaskills thanks for quick response:

rails (5.0.2)
How can i check that?
It may be possible, again, how should i check if there are reconnects happening?

More info: I'm also using thirdbase (a fork of this gem to connect to a third remote database, and fourthbase -> yet another fork). That's why i have three USE 'database_name' each time they appear. All of them in application_controller have an around filter:

# Caching queries 
around_action :query_cache_second_base
around_action :query_cache_third_base
around_action :query_cache_fourth_base

def query_cache_second_base
    SecondBase::Base.connection.cache { yield }
  end

  def query_cache_third_base
    ThirdBase::Base.connection.cache { yield }
  end

  def query_cache_fourth_base
    FourthBase::Base.connection.cache { yield }
  end

You think they could be affecting each other?

tommyalvarez commented 6 years ago

@metaskills don't forget this :)

metaskills commented 6 years ago

Sorry for the delay. I'm not sure what could be causing the reconnects. I feel it would be a stretch to think it was the around filters. Easy to test by removing them and watching, but I would look elsewhere. Where, it would not be easy to say without seeing your app and possible gems, but I would look there. Any call to establish_connection, any code that calls disconnect, maybe even some code that uses SecondBases' on_base code outside the rake tasks. I'm taking wild guesses but hope that helps.

metaskills commented 6 years ago

Do investigate and let me know what you find out.

tommyalvarez commented 6 years ago

@metaskills i found some SecondBase::Base.clear_active_connections! mostly in active job background processes. I found a comment above those lines that call the method saying that they are there to avoid timeout errors. I recall something about rails process fork and threads that needed to do that for secondary database connections. Do you think it could be related? My production environment runs with puma in cluster mode with more than 2 threads.

metaskills commented 6 years ago

I do think background jobs should clear the connections and be just fine. But did you say you have some connections cleared in the web process? If so, those would be the ones to address.

tommyalvarez commented 6 years ago

@metaskills nope, they were all in background jobs. Today i removed the around_filters i told you some comments above around_action :query_cache_second_base . The performance of the app sky rocket. Maybe it will be slow when querying external databases, will check that, but the problem was DEFINITILY there.

metaskills commented 6 years ago

WHOA!!! Just to be super clear, you are sayin this code caused the reconnections and perf to go down?

around_action :query_cache_second_base
around_action :query_cache_third_base
around_action :query_cache_fourth_base

def query_cache_second_base
  SecondBase::Base.connection.cache { yield }
end

def query_cache_third_base
  ThirdBase::Base.connection.cache { yield }
end

def query_cache_fourth_base
  FourthBase::Base.connection.cache { yield }
end

If so, could you try something like this?

around_action :query_cache_other_dbs

def query_cache_other_dbs
  SecondBase::Base.connection.cache do
    ThirdBase::Base.connection.cache do 
      FourthBase::Base.connection.cache { yield }
    end
  end
end

metaskills commented 6 years ago

tommyalvarez commented 6 years ago

@metaskills Sorry for the delay. The application was heavily used and i could not test it in production. I just changed the code, the performance was slow during the first requests (I assume that it was building the cache and some schema related operations) and then the response times were fine. However, i will let it idle half an hour and report back since last time it was common that the app was slow at first, then improved and after some time idle, again become very slow

tommyalvarez commented 6 years ago

@metaskills bad luck :(, after some idle time, on multiple ocassions, the requests takes 6/8 secs to complete, so no... that workaround doesn't work sadly.

metaskills commented 6 years ago

Do you have New Relic installed and have data around # and types of schema queries?
Have you tried the ActiveRecord schema in your deploy process? http://kirshatrov.com/2016/12/13/schema-cache/

tommyalvarez commented 6 years ago

@metaskills I do have new relic installed (free version). Where can i see that info ? I can't find it in overview o transactions details. No i didn't know about schema cache, i will give it a try next week!

metaskills commented 6 years ago

Do tell me how the schema cache works. For New Relic, try Googling around for "new relic slow transaction traces" and see where that leads. I use that feature a lot which has a nice details section that show different SQL counts, etc.

customink / secondbase

Extremely slow response times in each request even if it does not use remote database #55