fernandobatels / rsfbclient

Rust Firebird Client
MIT License
76 stars 10 forks source link

Thread safety support and deadlock supression #32

Closed juarezr closed 3 years ago

juarezr commented 4 years ago

While building the CI in PR #31 a deadlock error occurred randomly.

It happened either while running cargo test or with cargo tarpaulin that triggers cargo test for creating coverage.

The backtrace is:

---- tests::params::linking::ints stdout ----
Error: FbError { msg: "deadlock\nupdate conflicts with concurrent update\nconcurrent transaction number is 123", code: -913 }
thread 'tests::params::linking::ints' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/macros.rs:16:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

failures:
    tests::params::linking::ints

I can easily reproduce the problem in my local computer by running:

cargo tarpaulin --ignore-config --out Lcov Xml Html --run-types AllTargets -v --line --count --branch --features 'linking date_time pool'

However, the error goes away when restricting parallel test execution to 1 thread by adding --test-threads 1 to cargo test throught cargo tarpaulin:

cargo tarpaulin --ignore-config --out Lcov Xml Html --run-types AllTargets -v --line --count --branch --features 'linking date_time pool' -- --test-threads 1

Also I saw in the Github Action log that simply using cargo test can trigger the bug:

cargo test --features dynamic_loading date_time pool

I have concluded that is a deadlock bug because:

  1. The error message says deadlock\nupdate conflicts with concurrent update
  2. The fbclient wasn't thread safe sometime ago. I don't know the current status.
  3. The error happened radomnly in anyone of test cases and changed every time command was run.

I don't know what is the resolution that should be taken in this context. Maybe:

  1. this should be fixed because just is a test case bug.
  2. this should be fixed because just is a driver bug.
  3. this should be addressed with some extra checking for handling thread safety.
  4. this should not be handled by the driver, because is a normal database concurrency issue.
  5. the test cases should run with concurrency = 1
jairinhohw commented 4 years ago

This error is occurring when a test tries to create / drop the same table at the same time as another one. I've added a atomic counter to the tests in the ~connection~ statements module, but it seems i've missed the need to add the same in the tests::params module.

This is only a problem when running the tests of multiple implementations at the same time. The dynamic_linking tests in the ci are actually running the linking tests too because the linking feature is in the default feature set. This can be fixed by adding --no-default-features to the tests while i fix the deadlock to allow testing all features at the same time.

Edit: It was the statements module, not connections

juarezr commented 4 years ago

Great!

Just some considerations:

  1. each job instance executes in a separated virtual environment provided by a GitHub-hosted runner in a virtual machine hosted by GitHub.
  2. So each job instance has it's exclusive FirebirdSQL database running in a container inside a virtual machine.
  3. However I don't know if this matters for masking the problem you explained above.
  4. I think --no-default-features may not work because --features is given to cargo test. Anyway `-- --test-threads 1 should work.

For solving this we can temporarily workaround cargo test as you suggested and revert latter after the test cases are improved.

jairinhohw commented 4 years ago

After trying to fix the tests, it seems the problem is actually a deadlock when trying to create or delete a table at the same time another thread is trying to create or delete any other table, not only the same table, so the atomic fix did not work. With this in mind i think the only way to make it work is with the -- --test-threads 1.

I had the ci set to use this before, but the cargo tarpaulin was not working because of that, but it seems to work now.

juarezr commented 4 years ago

After trying to fix the tests, it seems the problem is actually a deadlock when trying to create or delete a table at the same time another thread is trying to create or delete any other table, not only the same table, so the atomic fix did not work. With this in mind i think the only way to make it work is with the -- --test-threads 1.

FirebirdSQL restricts this kind of concurrent table creation?

I had the ci set to use this before, but the cargo tarpaulin was not working because of that, but it seems to work now.

Yes. Last run run test in sequence and it was successful.

juarezr commented 4 years ago

@jairinhohw @fernandobatels ,

I need some help to figure out how to build rsfbclient on Windows.

What I need to do for fixing this failure?

note: LINK : fatal error LNK1181: cannot open input file 'fbclient.lib'
fernandobatels commented 4 years ago

@jairinhohw @fernandobatels ,

I need some help to figure out how to build rsfbclient on Windows.

What I need to do for fixing this failure?

note: LINK : fatal error LNK1181: cannot open input file 'fbclient.lib'

I windows you need provide the fbclient.dll. You have installed all the firebird tool kit? I think this is a problem with the PATH env variable.

juarezr commented 4 years ago

I have installed FirebirdSQL v3.0.5 in a Github Windows server with the command:

choco install firebird -params '/SuperClassic /ClientAndDevTools'

Any guidance from here?

juarezr commented 4 years ago

Not worked copying fbclient.lib to build dir:

Coying C:\Program Files\Firebird\Firebird_3_0\lib\fbclient_ms.lib to D:\a\rsfbclient\rsfbclient\target\debug\deps\fbclient.lib
Coying C:\windows\system32\fbclient.dll to D:\a\rsfbclient\rsfbclient\target\debug\deps\fbclient.dll

= note: LINK : fatal error LNK1181: cannot open input file 'fbclient.lib'

https://github.com/juarezr/rsfbclient/runs/983032932?check_suite_focus=true

fernandobatels commented 3 years ago

Our CI is working now with Windows. I think we can close this issue.