Closed him3010 closed 5 years ago
@him3010 thanks for opening this issue. Do you have a simple script or code sample that reproduces the issue?
@him3010 I have been trying to reproduce the error and haven't been able to. So can you please tell me the following:
@estolfo we are using gem 'elasticsearch-persistence' 6.0.0', 'elasticsearch-transport' '6.3.1'. JRUBY Version - 9.2.5.0 with ruby version - 2.5.0 rails version - 5.1.4
In production we are using puma as a application server with the Puma Config :- Workers - 1 MinThreads - 10 MaxThreads - 60
Resolved hosts is the array of the elasticsearch nodes - [es.abc.com:9200, es1.abc.com:9200, es.abc.com:9200, ....] of size 60. Since 60 threads atmax can wait for I/O operation happening on elasticsearch
Connecting to elasticsearch client via
$elasticsearch = Elasticsearch::Client.new(hosts: resolved_hosts)
Elasticsearch::Transport::Transport::Error: Cannot get new connection from pool.
Stack trace (hide Rails)
…rt-6.3.1/lib/elasticsearch/transport/transport/
base.rb: 258:in `perform_request'
…/lib/elasticsearch/transport/transport/http/
faraday.rb: 20:in `perform_request'
…-transport-6.3.1/lib/elasticsearch/transport/
client.rb: 143:in `perform_request'
…csearch-api-6.3.1/lib/elasticsearch/api/actions/
get.rb: 49:in `get'
…6.0.0/lib/elasticsearch/persistence/repository/
find.rb: 71:in `__find_one'
…6.0.0/lib/elasticsearch/persistence/repository/
find.rb: 30:in `find'
…/1mg/jsearch/search/app/models/active_document/
base.rb: 81:in `find'
…jsearch/search/app/services/sku_module/
skus_manager.rb: 256:in `by_id'
…rch/app/controllers/api/internal/v1/
skus_controller.rb: 23:in `by_id'
…4/lib/action_controller/metal/
basic_implicit_render.rb: 4:in `send_action'
…/gems/actionpack-5.1.4/lib/abstract_controller/
base.rb: 186:in `process_action'
…ionpack-5.1.4/lib/action_controller/metal/
rendering.rb: 30:in `process_action'
…/actionpack-5.1.4/lib/abstract_controller/
callbacks.rb: 20:in `block in process_action'
…ms/activesupport-5.1.4/lib/active_support/
callbacks.rb: 108:in `block in run_callbacks'
…lib/raven/integrations/rails/
controller_transaction.rb: 7:in `block in included'
…ruby/RubyBasicObject.java:2666:in `instance_exec' (
…ms/activesupport-5.1.4/lib/active_support/
callbacks.rb: 117:in `block in run_callbacks'
…ms/activesupport-5.1.4/lib/active_support/
callbacks.rb: 135:in `run_callbacks'
…/actionpack-5.1.4/lib/abstract_controller/
callbacks.rb: 19:in `process_action'
…actionpack-5.1.4/lib/action_controller/metal/
rescue.rb: 20:in `process_action'
…k-5.1.4/lib/action_controller/metal/
instrumentation.rb: 36:in `block in process_action'
…ctivesupport-5.1.4/lib/active_support/
notifications.rb: 166:in `block in instrument'
…5.1.4/lib/active_support/notifications/
instrumenter.rb: 21:in `instrument'
…ctivesupport-5.1.4/lib/active_support/
notifications.rb: 166:in `instrument'
…k-5.1.4/lib/action_controller/metal/
instrumentation.rb: 30:in `process_action'
<truncated 28 additional frames>
…tivesupport-5.1.4/lib/active_support/
tagged_logging.rb: 26:in `tagged'
…tivesupport-5.1.4/lib/active_support/
tagged_logging.rb: 69:in `tagged'
…y-9.2.5.0/gems/railties-5.1.4/lib/rails/rack/
logger.rb: 24:in `call'
…pack-5.1.4/lib/action_dispatch/middleware/
remote_ip.rb: 79:in `call'
…ms/request_store-1.4.1/lib/request_store/
middleware.rb: 19:in `call'
…ack-5.1.4/lib/action_dispatch/middleware/
request_id.rb: 25:in `call'
…gems/jruby-9.2.5.0/gems/rack-2.0.7/lib/rack/
runtime.rb: 22:in `call'
…ctive_support/cache/strategy/
local_cache_middleware.rb: 27:in `call'
…npack-5.1.4/lib/action_dispatch/middleware/
executor.rb: 12:in `call'
…gems/sentry-raven-2.9.0/lib/raven/integrations/
rack.rb: 51:in `call'
…/jruby-9.2.5.0/gems/railties-5.1.4/lib/rails/
engine.rb: 522:in `call'
….2.5.0/gems/puma-3.12.1-java/lib/puma/
configuration.rb: 227:in `call'
…jruby-9.2.5.0/gems/puma-3.12.1-java/lib/puma/
server.rb: 660:in `handle_request'
…jruby-9.2.5.0/gems/puma-3.12.1-java/lib/puma/
server.rb: 474:in `process_client'
…jruby-9.2.5.0/gems/puma-3.12.1-java/lib/puma/
server.rb: 334:in `block in run'
…-9.2.5.0/gems/puma-3.12.1-java/lib/puma/
thread_pool.rb: 135:in `block in spawn_thread'
@him3010 thanks for opening this issue. Do you have a simple script or code sample that reproduces the issue?
Hi i have been trying to replicate this issue on staging environment but so far no luck, but whenever it is tested on production it reproduces the error. And after applying the fix no connection pool errors were seen
require 'elasticsearch'
class RoundRobinSelector
attr_reader :connections
# @option arguments [Connections::Collection] :connections Collection with connections.
#
def initialize(arguments={})
@connections = arguments[:connections]
@connection_select_mutex = Mutex.new
end
# Returns the next connection from the collection, rotating them in round-robin fashion.
#
# @return [Connections::Connection]
#
def select(options={})
@connection_select_mutex.synchronize do
@current = !defined?(@current) || @current.nil? ? 0 : @current+1
@current = 0 if @current >= connections.size
return connections[@current]
end
end
end
hosts = ["127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200", "127.0.0.1:9200"]
if ARGV[0].eql?('use_selector')
elasticsearch = Elasticsearch::Client.new(hosts: hosts, selector_class: RoundRobinSelector)
else
elasticsearch = Elasticsearch::Client.new(hosts: hosts)
end
connections = elasticsearch.transport.connections
Thread.report_on_exception = false
10.times do |x|
threads = []
20.times do |i|
threads << Thread.new do
conn = connections.get_connection || (raise Exception.new("cannot get connection from pool"))
rescue Exception
puts 'cannot get connection from pool'
end
end
threads.collect {|t| t.join}
puts "Completed #{x} round"
end
this script will reproduce the error
Steps to run this script -
First run jruby script.rb ( you will see an error cannot get connection from pool). try to run it 2-3 times to see the error. (If lucky can get error in first time)
then run jruby script.rb use_selector , no matter how many times you run this script with use_selector the script will not fail
@him3010 Thank you for the detailed info. I was able to reproduce the issue and identify the problem. I've opened a pull request, as you've seen. Would you mind taking a look at the updates I've made today? https://github.com/elastic/elasticsearch-ruby/pull/665/files Thanks!
@him3010 Thank you for the detailed info. I was able to reproduce the issue and identify the problem. I've opened a pull request, as you've seen. Would you mind taking a look at the updates I've made today? https://github.com/elastic/elasticsearch-ruby/pull/665/files Thanks!
Sure will look into the PR
Currently we shifter our search application from ruby to jruby environment, and start getting the error "Cannot get connection from the pool" even if we have provided a large number of hosts to Elasticsearch::Client object which is shared across all the threads. After digging in the code
we found the @current variable is not synchronized. So we rewrote the selector class and issue went away with it.
Fix Code