dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/
Other
25.79k stars 950 forks source link

ZUNION crash on debug builds #1442

Closed kostasrim closed 1 year ago

kostasrim commented 1 year ago

Describe the bug The command ZUNION 0 key causes dragonfly to crash (but works on release).

To Reproduce

  1. Build dragonfly with in debug mode
  2. Start dragonflydb
  3. Connect with redis-cli and type ZUNION 0 key (crashes here)

Expected behavior An error for numkeys being 0

Environment (please complete the following information):

Additional context I would check all the commands with the form COMMAND NUMKEYS [Keys...]

rounaknandanwar commented 1 year ago

I tried this in debug mode.

The crash happens exactly here when it tries to access args_ which is not initialized yet.

unique_shardid = Shard(args_.front(), shard_set->size()); // TODO: Squashed bug

I can also see a TODO here mentioning a fix.

// TODO Fix this for Z family functions. // Examples that crash: ZUNION 0 myset if (name == "ZDIFF" && num_custom_keys == 0) { return OpStatus::INVALID_INT; }

Suggested fix : Add a validation inside DetermineKeys method for zunion family to explicitly return error.

if (name == "ZUNION" && num_custom_keys == 0) { return OpStatus:: SYNTAX_ERR; }

Current redis behaviour -

(error) ERR at least 1 input key is needed for zunion

Additionally,

  1. Would it make sense to have all the command related validations inside their own logic instead of in transactions?
  2. Also, I not sure how the args_ is working without being initialized at transaction.cc:256 since InitShardData gets called after that. Please correct me if I am wrong here.

Do let me know if I can take this up.

kostasrim commented 1 year ago

Hello @rounaknandanwar and thank you for replying.

The crash happens exactly here when it tries to access args_ which is not initialized yet.

That's true, that happens because we expect that there is a third argument when in reality there isn't (ZUNION 0 key_missing) and this leads to UB.

Suggested fix : Add a validation inside DetermineKeys method for zunion family to explicitly return error.

That should work, but there is an extra catch for this issue: find also the commands from the zfamily set of functions that inhibit this bug and add them there -- this should be straightforward by looking on the available commands and testing them one by one (there are not many where the numkeys exists, so it should be quite fast to test).

Would it make sense to have all the command related validations inside their own logic instead of in transactions?

Kind of. We sport a thread-per-core architecture meaning that keys/data are sharded among the available cores in the system. Before we even start a transaction we need to figure out which shard/core the command must be forwarded to which enforces some of the aforementioned validation steps to happen early. Moreover, since multiple transactions can happen simultaneously, we can re-order (based on the reads/writes on the key space affected) the execution of multiple transactions such that there are no clashes (and therefore transactions don't get rejected based on isolation guarantees).

Also, I not sure how the args_ is working without being initialized at transaction.cc:256 since InitShardData gets called after that. Please correct me if I am wrong here.

Look on line:

 248     StoreKeysInArgs(key_index, needs_reverse_mapping);

Do let me know if I can take this up.

For sure and I will review :) Just plz take a look on the other commands of the zfamily (and generally any command that has numkeys in it -- you can count them on your fingers so this won't take long).

P.s. Let me know if you have any questions

rounaknandanwar commented 1 year ago

Commands that use numkeys and are good (have checked for all commands excluding zfamily as well):

  1. zunionstore out 0 mykey
  2. zinterstore out 0 mykey
  3. eval "return ARGV[1]" 0

So its just zunion that has issues.

I do have questions but those are not related to this issue (more of related to the architecture and code structure) so will limit the discussion here :)

Do assign the issue to me and I will push the fix.

kostasrim commented 1 year ago

@rounaknandanwar It's all yours! Thank you for checking and looking forward for your PR.

For PR guidelines read here: https://github.com/dragonflydb/dragonfly/blob/main/CONTRIBUTING.md

rounaknandanwar commented 1 year ago

@kostasrim I am not able to push to a branch. Do I need to get access rights ?

kostasrim commented 1 year ago

@rounaknandanwar no you have to fork dragonfly and then push on your fork. The PR will arrive here as well :)