dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/
Other
25.21k stars 908 forks source link

"eval" within "multi" crashes DF #457

Closed boazsade closed 1 year ago

boazsade commented 1 year ago

Describe the bug When running pipeline test from ioredis, we are getting stuck. This is a result of a pipeline test that execute the commands: set, eval and get. The end result is that we are not returning to the client and DF cannot be shutdown as it is stuck on a command that never completed.

To Reproduce Steps to reproduce the behavior: 1.multi

  1. set foo asdf
  2. EVAL "return {KEYS[1],KEYS[2],ARGV[1],ARGV[2]}" 2 key1 key2 first second
  3. get foo
  4. exec This command would not "return", the connection will stay opened even when the connection is closed on the Redis, client side. You would not be able to shutdown DF.

Expected behavior This should return a result for the client:

1) OK
2) 1) "key1"
   2) "key2"
   3) "first"
   4) "second"
3) "asdf"

And we should be able to shutdown DF after that.

Environment (please complete the following information):

Reproducible Code Snippet

it("should be supported in transaction blocks", (done) => {
      redis
        .pipeline()
        .multi()
        .set("foo", "asdf")
        .echo("bar", "baz", "123", "abc")
        .get("foo")
        .exec()
        .exec(function (err, results) {
          expect(err).to.eql(null);
          expect(results[4][1][1]).to.eql(["bar", "baz", "123", "abc"]);
          expect(results[4][1][2]).to.eql("asdf");
          done();
        });
    });
  });

Additional context This code would work:

 it("should support callbacks", (done) => {
      let pending = 1;
      redis
        .pipeline()
        .echo("foo", "bar", "123", "abc", function (err, result) {
          pending -= 1;
          expect(err).to.eql(null);
          expect(result).to.eql(["foo", "bar", "123", "abc"]);
        })
        .exec(function (err, results) {
          expect(err).to.eql(null);
          expect(results).to.eql([[null, ["foo", "bar", "123", "abc"]]]);
          expect(pending).to.eql(0);
          done();
        });
    });
boazsade commented 1 year ago

Once this is resolved, please update the list of supported tests in ioredis

boazsade commented 1 year ago

This is triggered by "transaction.cc:420] Check failed: 0u == txid_ (0 vs. 1)" To reproduce this the only requirement is to do:

  1. MULTI
  2. EVAL "return {KEYS[1],KEYS[2],ARGV[1],ARGV[2]}" 2 key1 key2 first second
  3. EXEC This would trigger the crash. When in release mode, and there are more commands in the transaction it would not crash but hung
romange commented 1 year ago

Thanks Boaz. Can you open a branch Bug457 that adds this test to dragonfly_test?

On Mon, Nov 7, 2022, 10:09 Boaz Sade @.***> wrote:

This is triggered by "transaction.cc:420] Check failed: 0u == txid_ (0 vs. 1)" To reproduce this the only requirement is to do:

  1. MULTI
  2. EVAL "return {KEYS[1],KEYS[2],ARGV[1],ARGV[2]}" 2 key1 key2 first second
  3. EXEC This would trigger the crash. When in release mode, and there are more commands in the transaction it would not crash but hung

— Reply to this email directly, view it on GitHub https://github.com/dragonflydb/dragonfly/issues/457#issuecomment-1305227838, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA4BFCEVAN5ULW72QQI5CN3WHC2KPANCNFSM6AAAAAARVEDIBA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

boazsade commented 1 year ago

yes

boazsade commented 1 year ago

The issue is triggered by running transaction within transaction. This happens because we have:

  1. EXEC is starting a transaction (in the context of multi)
  2. Then from within this transaction we scheduling transaction operation from the script context, This calls the function "ScheduleInternal" which verifies that its the a clean transaction state (transaction id == 0). But this is not true any more, since we already pass through this function at step 1.

The fix for this would require supporting this type of flow - starting a context of transaction and then running a script that requires its own context.

For now I will try to issue a workaround that would fail this operation (would not allow for running script inside "multi"). so the result would not be either a crash or that the context is hung and the the application cannot even shutdown as it is now. I will also add a test to verify for now that we are getting an error and not other issue, and once this is fixed, we would see that this test would fail

romange commented 1 year ago

Fixed

boazsade commented 1 year ago

Just note that this is a temporary fix - we have issue #467 as a flow-up for it. There is also a branch issue-467, to work on it