NoBrainerORM / nobrainer

Ruby ORM for RethinkDB
http://nobrainer.io/
Other
387 stars 49 forks source link

intermittent nobrainer issue in ci #170

Closed modosc closed 9 years ago

modosc commented 9 years ago

we see this error sometimes in our build system:

 /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/query_runner/write_error.rb:23:in `raise_write_error': Cannot perform write: primary replica for shard ["", +inf) not available (NoBrainer::Error::DocumentNotPersisted)
Query was: r.table("nobrainer_locks").get("u704GX8GtnFAOle50XbqA5cyPlk=").replace {|var_1|
  r.branch(
    var_1.eq(nil).or(var_1[:expires_at].lt(r.now)),
    {"key" =>
       "nobrainer:sync_table_config",
     "key_hash" =>
       "u704GX8GtnFAOle50XbqA5cyPlk=",
     "instance_token" =>
       "2XSUM0W28eUWaR",
     "expires_at" =>
       r.now.add(60)},
    var_1
  )
}
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/query_runner/write_error.rb:15:in `rescue in call'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/query_runner/write_error.rb:3:in `call'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/query_runner/profiler.rb:4:in `call'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/query_runner/table_on_demand.rb:3:in `call'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/query_runner/database_on_demand.rb:3:in `call'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/query_runner/missing_index.rb:3:in `call'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/query_runner/run_options.rb:53:in `call'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/middleware-0.1.0/lib/middleware/runner.rb:31:in `call'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/middleware-0.1.0/lib/middleware/builder.rb:102:in `call'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/query_runner.rb:21:in `run'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/nobrainer.rb:26:in `run'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/lock.rb:68:in `try_lock'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/lock.rb:55:in `block in lock'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/lock.rb:54:in `loop'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/lock.rb:54:in `lock'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/lock.rb:40:in `synchronize'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/document/table_config/synchronizer.rb:10:in `sync_table_config'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/document/table_config.rb:98:in `sync_table_config'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/no_brainer/document/table_config.rb:108:in `sync_schema'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/nobrainer-0.30.0/lib/nobrainer.rb:31:in `sync_schema'
  from  /home/ubuntu/repo/db/seeds.rb:43:in `<top (required)>'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.4/lib/active_support/dependencies.rb:268:in `load'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.4/lib/active_support/dependencies.rb:268:in `block in load'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.4/lib/active_support/dependencies.rb:240:in `load_dependency'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.4/lib/active_support/dependencies.rb:268:in `load'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/gems/railties-4.2.4/lib/rails/engine.rb:547:in `load_seed'
  from  /home/ubuntu/repo/spec/rails_helper.rb:173:in `block (2 levels) in <top (required)>'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/example.rb:378:in `instance_exec'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/example.rb:378:in `instance_exec'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/hooks.rb:357:in `run'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/configuration.rb:1660:in `block in run_hooks_with'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/configuration.rb:1660:in `each'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/configuration.rb:1660:in `run_hooks_with'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/configuration.rb:1626:in `with_suite_hooks'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/runner.rb:114:in `block in run_specs'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/reporter.rb:77:in `report'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/runner.rb:113:in `run_specs'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/runner.rb:89:in `run'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/runner.rb:73:in `run'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/lib/rspec/core/runner.rb:41:in `invoke'
  from /home/ubuntu/.rvm/gems/ruby-2.2.3@global/gems/rspec-core-3.3.2/exe/rspec:4:in `<top (required)>'
  from  /home/ubuntu/repo/vendor/bundle/ruby/2.2.0/bin/rspec:23:in `load'

curious what the underlying cause would be? this is intermittent, should we be waiting on something from rethink at the beginning of spinning up a test container before we try to access it? there isn't anything useful in the rethinkdb logs, at least as far as i can tell (maybe i don't have the right logging enabled?).

nviennot commented 9 years ago

I've remove locking around the table_config sync: https://github.com/nviennot/nobrainer/commit/61d0f6dbd21ec4c2abae5fb730c87dab4c02d5a0 You should no longer see intermittent failures. if you do so, please reach back again.

Cause is that it takes some time for tables to be ready for writes. nobrainer was trying to use a table without having waited on it after creation.

modosc commented 9 years ago

so we're using this version of the code and i just saw this error:

/home/ubuntu/repo/vendor/bundle/ruby/2.2.0/bundler/gems/nobrainer-61d0f6dbd21e/lib/no_brainer/query_runner/write_error.rb:23:in `raise_write_error': The server(s) hosting table `repo_test.nobrainer_index_meta` are currently unreachable. The table's configuration was not changed. (NoBrainer::Error::DocumentNotPersisted)
Query was: r.table("nobrainer_index_meta").config.update({"write_acks" => "majority"})

sorry to hitch onto this issue, maybe this is a separate problem? here's the log entries preceding this:

Connected to rethinkdb://localhost:28015/repo_test
RethinkDB (0.7ms) MANAGEMENT r.db_create("repo_test")  
RethinkDB (2.12s) MANAGEMENT r.table_create( "jobs", {"durability" => "hard", "shards" => 1, "replicas" => 1, "primary_key" => "id"})
RethinkDB (1.1ms) WRITE r.table("jobs").config.update({"write_acks" => "majority"})  
RethinkDB (0.5ms) MANAGEMENT r.table("jobs").config
RethinkDB (1.1ms) MANAGEMENT r.table("jobs").reconfigure({"shards" => 1, "replicas" => 1})  
RethinkDB (10.03s) MANAGEMENT r.table_create( "nobrainer_index_meta", {"durability" => "hard", "shards" => 1, "replicas" => 1, "primary_key" => "id"}) 
RethinkDB (9.65s) WRITE r.table("nobrainer_index_meta").config.update({"write_acks" => "majority"}) NoBrainer::Error::DocumentNotPersisted The server(s) hosting table `repo_test.nobrainer_index_meta` are currently unreachable. The table's configuration was not changed. 
nviennot commented 9 years ago

where do these log entries come from?

modosc commented 9 years ago

i wrote a custom logger for nobrainer that copies the activerecord logger look/feel (i have the same thing for memcache and redis).

nviennot commented 9 years ago

This is suspicious: (10.03s) MANAGEMENT r.table_create( "nobrainer_index_meta") 10 seconds feels like something timed out. Not sure why it succeeded, but something is funny with the server.

modosc commented 9 years ago

here are the actual rethink logs from the failing instance:

2015-10-16T22:48:34.344661438 0.007748s notice: Recursively removing directory /var/lib/rethinkdb/instance1/data/tmp
2015-10-16T22:48:34.364684528 0.027771s notice: Initializing directory /var/lib/rethinkdb/instance1/data
2015-10-16T22:48:34.364797108 0.027883s info: Creating a default database for your convenience. (This is because you ran 'rethinkdb' without 'create', 'serve', or '--join', and the directory '/var/lib/rethinkdb/instance1/data' did not already exist or is empty.)
2015-10-16T22:48:34.364832314 0.027918s notice: Running rethinkdb 2.1.5+2~0precise (GCC 4.6.3)...
2015-10-16T22:48:34.367881637 0.030970s notice: Running on Linux 3.14.28-031428-generic x86_64
2015-10-16T22:48:34.367918750 0.031005s notice: Loading data from directory /var/lib/rethinkdb/instance1/data
2015-10-16T22:48:34.397859131 0.060946s info: Automatically using cache size of 113559 MB
2015-10-16T22:48:34.399948397 0.063035s notice: Listening for intracluster connections on port 29015
2015-10-16T22:48:34.401732665 0.064819s notice: Listening for client driver connections on port 28015
2015-10-16T22:48:34.401845926 0.064932s notice: Listening for administrative HTTP connections on port 8080
2015-10-16T22:48:34.401850464 0.064936s notice: Listening on address: 127.0.0.1
2015-10-16T22:48:34.401850942 0.064937s notice: To fully expose RethinkDB on the network, bind to all addresses by adding `bind=all' to the config file (/etc/rethinkdb/instances.d/instance1.conf).
2015-10-16T22:48:34.401852388 0.064938s notice: Server ready, "box240_pyx" 546fac77-d9b5-47c0-935a-bb0fc38e5e61
2015-10-16T22:48:54.445794241 20.108881s info: Table b54de279-1a48-41a1-9c11-0124ee15011f: Added replica on this server.
2015-10-16T22:48:55.803908684 21.466995s info: Table b54de279-1a48-41a1-9c11-0124ee15011f: Starting a new Raft election for term 1.
2015-10-16T22:48:55.805538520 21.468625s info: Table b54de279-1a48-41a1-9c11-0124ee15011f: This server is Raft leader for term 1. Latest log index is 0.
2015-10-16T22:48:56.528167932 22.191254s info: Table b54de279-1a48-41a1-9c11-0124ee15011f: Configuration is changing.
2015-10-16T22:48:56.531226754 22.194313s info: Table b54de279-1a48-41a1-9c11-0124ee15011f: Configuration is changing.
2015-10-16T22:48:56.745424693 22.408511s info: Table 1bdc7b3f-ceb8-48f5-876a-7665b8d8ab93: Added replica on this server.
2015-10-16T22:48:58.448667275 24.111754s info: Table 1bdc7b3f-ceb8-48f5-876a-7665b8d8ab93: Starting a new Raft election for term 1.
2015-10-16T22:49:30.433658544 56.096745s info: Table 1bdc7b3f-ceb8-48f5-876a-7665b8d8ab93: This server is Raft leader for term 1. Latest log index is 0.

on our passing instances we get basically the same logs but they continue on:

2015-10-16T22:49:33.660215857 59.318106s info: Table ccf13a97-ea42-4a0a-bb9b-69518a04a967: Configuration is changing.
2015-10-16T22:49:33.663105826 59.320996s info: Table ccf13a97-ea42-4a0a-bb9b-69518a04a967: Configuration is changing.
2015-10-16T22:49:33.888906778 59.546798s info: Table 0d29653f-2a2d-4bd5-a770-1481f1d3e532: Added replica on this server.
2015-10-16T22:49:35.222377905 60.880269s info: Table 0d29653f-2a2d-4bd5-a770-1481f1d3e532: Starting a new Raft election for term 1.
2015-10-16T22:49:35.224147199 60.882038s info: Table 0d29653f-2a2d-4bd5-a770-1481f1d3e532: This server is Raft leader for term 1. Latest log index is 0.
2015-10-16T22:49:35.971305783 61.629196s info: Table 0d29653f-2a2d-4bd5-a770-1481f1d3e532: Configuration is changing.
2015-10-16T22:49:35.974531433 61.632422s info: Table 0d29653f-2a2d-4bd5-a770-1481f1d3e532: Configuration is changing.

anyway, appreciate your help - i guess i should go over to the rethinkdb issues with this.