facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
28.28k stars 6.28k forks source link

Changes to NewObject() function signature. #9365

Open aravind-wdc opened 2 years ago

aravind-wdc commented 2 years ago

Hi,

I tried to send an email to https://groups.google.com/forum/#!forum/rocksdb, as this is more of a discussion/question but my email client could not resolve the recipient, so posting here.

Issue:

db_bench, db_stress and other apps crash when a malformed/erroneous string is passed to --fs_uri or –env_uri option.
============Example=============
./db_bench --benchmarks=fillrandom --num=10000 --fs_uri=zenfs://xyz:nvm0n
Received signal 11 (Segmentation fault)
Segmentation fault
================================

Rootcause: The factory function called by NewObject() do not allow for returning error codes from underlying filesystems and environments, it always returns a object pointer, if the pointer is null then it is returned as Status::NotSupported Since the ConfigOption.ignore_unsupported_options is true by default, underlying layers returns Status::OK instead of the error.

In case of ZenFS plugin, it identifies the error in the string, updates the errormsg and returns a null pointer. But it gets translated to Status::NotSupported("errmsg"), and since the "ignore_unsupported_options" in ConfigOptions is true by default, the error is is ignored and it gets converted to Status::OK in customizable_util.h L68, and results in a crash when the apps(db-bench) subsequently try to make the filesystem calls.

Ideal Solution: Change the signature of the factory function which is called from NewObject, so that it can return error codes (not just a null or a valid pointer) and these error codes could be propagated to upper layers. But this may break things for other users. Would such a change PR be acceptable ? So what are your thoughts on this ? Is there any other way to fix it?

Other Probable Quick Fixes and side-effects:

  1. Setting "ignore_unsupported_options" to false in config_options inside CreateFromUri(), but this is not possible as config_options is passed as a "const", so it is immutable, (so change the prototype of CreateFromUri() and deal with it in all places CreateFromUri() is called)
  2. Setting “ignore_unsupported_options = false” before calling CreateFromUri(), but that means anyone calling CreateFromUri() in future should also be aware of this, which will be unreliable. Setting the "ignore_unsupported_options" to false by default is also a no-go as it breaks a lot of other tests.
  3. Create a new ConfigOptions variable in CreateFromUri(), set “ignore_unsupported_options = false” and pass it on to CreateFromEnv(), so the error is reported back to the application and the failure is graceful and apps don't crash.

I have a patch ready for option 3.

Thanks Aravind

mrambacher commented 2 years ago

This is very similar to #9299. With the change for #9333, the code will no longer return NotSupported if there is a factory but instead return InvalidArgument. This will fix the errors for "ignore_unsupported_options=false" reported by this issue, but not if "ignore_unknown_options=true" (which still will fail).

For the places that call CreateFromUri in the code currently (db_bench_tool, ldb_cmd, sst_dump_tool, testutil), it probably makes sense for them to set ignore_unsupported_options=false in general, making option 3 a reasonable change. I do not know what happens if those calls also set ignore_unknown_options=false, but that might be a reasonable thing to do as well.

Regardless, there should not be a crash. I will attempt to reproduce the crash and see what I can figure out.

aravind-wdc commented 2 years ago

Thanks @mrambacher. Looks like #9333 will solve this issue. If it is going to be pulled in shortly, I think we are ok. Else, I can send a PR based on option 3, as discussed above.