brianmario / mysql2

A modern, simple and very fast Mysql library for Ruby - binding to libmysql
http://github.com/brianmario/mysql2
MIT License
2.25k stars 550 forks source link

Various `Mysql2::Error`s due to race conditions with garbage collection #1258

Open burnison opened 2 years ago

burnison commented 2 years ago

Context

Several bugs in this repository, such as #1019, #694, and #1043, seem to relate to a common issue, which the work done 3 years ago in the unmerged #1048 appears to propose a potential solution. As noted in #1048, there is a race condition between the garbage collector and the user code that can lead to several different, unexpected errors in specs and production code. I am opening this issue to try associating them all with the same cause, to provide a meaningful work-around for library users, and to provide test cases for a potential fix.

Bugs

Ambiguous Mysql2::Errors

Initially, we had been experiencing behaviour similar to #1019, wherein seemingly-random rspec tests would fail with an ambiguous, Mysql2::Error with no error message. In most cases, the top of the stack trace would point to affected_rows within ActiveRecord, but could also point to any other functions that perform a query.

Eventually, this specific error was traced down to rb_mysql_client_affected_rows not expecting a return of -1 when calling mysql_affected_rows. However, the return of -1 is being set by cli_advanced_command with command 25, COM_STMT_CLOSE, which is called through a macro (stmt_command) in mysql_stmt_close by nogvl_stmt_close from rb_mysql_stmt_free during garbage collection.

This specific problem can be reproduced deterministically with the following spec:

diff --git a/spec/mysql2/statement_spec.rb b/spec/mysql2/statement_spec.rb
index dbc185e..d4cf310 100644
--- a/spec/mysql2/statement_spec.rb
+++ b/spec/mysql2/statement_spec.rb
@@ -695,6 +695,15 @@ RSpec.describe Mysql2::Statement do
     end
   end

+  context 'garbage collection' do
+    it 'should not cause rows_affected to be reset' do
+      @client.prepare 'SELECT 1'
+      @client.query 'SELECT 1'
+      GC.start(full_mark: true, immediate_sweep: true)
+      expect(@client.affected_rows).to eq 1
+    end
+  end
+
   context 'close' do
     it 'should free server resources' do
       stmt = @client.prepare 'SELECT 1'

Running this spec will produce the described error:

  1) Mysql2::Statement garbage collection should not cause rows_affected to be reset
     Failure/Error: expect(@client.affected_rows).to eq 1
     Mysql2::Error:
     # ./spec/mysql2/statement_spec.rb:715:in `affected_rows'
     # ./spec/mysql2/statement_spec.rb:715:in `block (3 levels) in <top (required)>'

Commands out of sync errors

Similar to #1043 and #694, this same race condition may trigger a CR_COMMANDS_OUT_OF_SYNC (error 2014, Commands out of sync; you can't run this command now) between the invocation of cli_read_query_result and mysql_store_result. Upon the completion of cli_read_query_result, mysql->status is set to MYSQL_STATUS_GET_RESULT, which is what mysql_store_result expects.

However, due to the the closure of the prepared statement, mysql->status is set to MYSQL_STATUS_READY in mysql_stmt_close, preventing the completion of mysql_store_result and triggering error 2014.

This one is harder to write a spec for, but can be deterministically proven with the following code change:

diff --git a/ext/mysql2/client.c b/ext/mysql2/client.c
index 5b699e9..a675cd2 100644
--- a/ext/mysql2/client.c
+++ b/ext/mysql2/client.c
@@ -609,6 +609,8 @@ static VALUE rb_mysql_client_async_result(VALUE self) {
     rb_raise_mysql2_error(wrapper);
   }

+  rb_gc();
+
   is_streaming = rb_hash_aref(rb_ivar_get(self, intern_current_query_options), sym_stream);
   if (is_streaming == Qtrue) {
     result = (MYSQL_RES *)rb_thread_call_without_gvl(nogvl_use_result, wrapper, RUBY_UBF_IO, 0);

Running the test suite will reproduce this bug:

  1) Mysql2::Statement should update a DECIMAL value passing a BigDecimal
     Failure/Error: _query(sql, @query_options.merge(options))

     Mysql2::Error:
       Commands out of sync; you can't run this command now
     # ./lib/mysql2/client.rb:131:in `_query'
     # ./lib/mysql2/client.rb:131:in `block in query'
     # ./lib/mysql2/client.rb:130:in `handle_interrupt'
     # ./lib/mysql2/client.rb:130:in `query'
     # ./spec/mysql2/statement_spec.rb:202:in `block (2 levels) in <top (required)>'

Current work-arounds

Users of mysql2 can stop this bug occurring by ensuring that all prepared statements are explicitly closed once they're done being used --- that is, never letting a prepared statement get closed by the garbage collector.

This prevents the issue due to the guard on stmt_wrapper->stmt and the subsequent nullification during the call to nogvl_stmt_close. Once a statement is closed in user code, the garbage collector does not make another call to MySQL.

tenderlove commented 2 years ago

I was reading through the code, and I can see how this would be an issue. I'm just going to dump some thoughts here, but I don't have any ideas for how to fix this yet.

  1. We can't guarantee the order that anything will get collected. I see some code in the mysql2 codebase that is trying to force things to be collected in a certain order, but it's just not possible (unless we want to muck with finalizers and I certainly don't).
  2. In-flight queries and prepared statements don't know about each other. A prepared statement could be collected at any time, including when a different query is in-flight (as this issue points out).

I only have a couple ideas for dealing with this right now.

  1. Add a queue. When a prepared statement is collected, rather than closing the statement, add it to a queue and have mysql2 drain the queue and close the statements when we know it's "safe" (right after an in-flight query has been executed, or immediately if there are no in-flight queries).
  2. Raise an exception if a statement handler is collected but hasn't been closed. IOW force the end user to manually close the statement when they are done with it.

I think the first option is a lot of work. Second option is probably less work, but forces the end user to change their code. Still trying to think if there is a better 3rd option.