byzhang / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

The C API lacks a binary safe leveldb_get() function. #150

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
My C program that is using LevelDB for storing binary blobs and I ran into the 
following problem with the LevelDB C API:

1)Leveldb_put() allows the programmer to pass a char* and a length for both the 
"key" and "val",  which allows the programmer to store binary data,  which may 
contain null bytes.

2)leveldb_get() returns a char* without a size parameter...  Digging into 
./leveldb/c.cc you can see that it is using a "std:string" to return this 
char*...  Not only is this not binary safe,  but std::string?  Having an 
efficient slice, like the one provided in the C++ api would be much preferred.

So in short; using the C API its impossible to put() items without ever being 
able to get() them back...

The solution is to provide a leveldb_get_binary()  which allows the programmer 
to pass in a reference to char* and size param.  The call convention should be 
consistent with the leveldb_put() function.

What is the expected output? What do you see instead?
I expected to obtain an unmolested blob of binary,  I got a null terminated 
string. 

What version of the product are you using? On what operating system?
libleveldb-dev:amd64 0+20120530.gitdd0d562-2 
installed with apt,  under Ubuntu.

Original issue reported on code.google.com by firealwa...@gmail.com on 26 Feb 2013 at 9:24

GoogleCodeExporter commented 9 years ago
> 2)leveldb_get() returns a char* without a size parameter...

It does return the size in *vallen:

/* Returns NULL if not found.  A malloc()ed array otherwise.
   Stores the length of the array in *vallen. */
extern char* leveldb_get(
    leveldb_t* db,
    const leveldb_readoptions_t* options,
    const char* key, size_t keylen,
    size_t* vallen,
    char** errptr);

> Digging into ./leveldb/c.cc you can see that it is using a "std:string" to 
return this char*...  Not only is this not binary safe,

std::string is binary safe as long as you use the .data() method instead of 
.c_str() to get its contents.  And you don't use strlen() on the pointer, but 
instead use the length returned via the "vallength" out parameter.  So the 
following code should work:

   char* err = NULL;
   size_t length;
   char* ptr = leveldb_get(..., &length, &err);
   Check err...
   Use ptr[0,length-1];
   free(ptr);

If  it doesn't work, it would be great if you could point us at a stand-alone 
test to demonstrate this; or send a patch to db/c_test.cc that demonstrates the 
problem.

> but std::string?  Having an efficient slice, like the one provided in the 
C++ api would be much preferred.

A benchmark or a profile that highlights the cost of the extra copy would be 
helpful.  My guess is that it will be in the noise compared to everything else 
that leveldb does on a get call.
 

Original comment by san...@google.com on 26 Feb 2013 at 9:36

GoogleCodeExporter commented 9 years ago
Your right, however I know that leveldb_get() is not binary safe.   It looks 
like CopyString() on line 208 that is terminating on a null byte.

Why can't you just return the efficient Slice?  That would be a great design!  
Why do I have to rely on a gross std::string in C in less!  Are you expecting 
people to free() this char * in C?  I am pretty sure this is also a memory leak 
because there should be a std::string instance floating around somewhere...

Original comment by firealwa...@gmail.com on 27 Feb 2013 at 12:16

GoogleCodeExporter commented 9 years ago
> Your right, however I know that leveldb_get() is not binary safe.   It looks 
like CopyString() on line 208 that is terminating on a null byte.

CopyString looks like this:
static char* CopyString(const std::string& str) {
  char* result = reinterpret_cast<char*>(malloc(sizeof(char) * str.size()));
  memcpy(result, str.data(), sizeof(char) * str.size());
  return result;
}

Note that it carefully avoids calling .c_str() and strlen().  So it is binary 
safe.  Please send a test or test-patch if you can find a bug here.

> Why can't you just return the efficient Slice?

For the same reason the Get() operator in the C++ API does not return a Slice.  
A Slice has to point to memory owned by somebody else.  A leveldb 
implementation that returns a Slice will not be able to figure out when to free 
that backing memory without interfering with the caller's use of that result.

> That would be a great design!  Why do I have to rely on a gross std::string 
in C in less!

I don't understand your point.  The interface for leveldb_get() says explicitly 
that it returns a char* and the caller should free() it.  std::string does not 
show up in the C interface at all.  It is an implementation detail inside 
leveldb_get() that it uses a *temporary* string so it can call a string based 
C++ function.

> Are you expecting people to free() this char * in C?

Yes.  That's what c.h says (see the code fragments in my earlier message).

> I am pretty sure this is also a memory leak because there should be a 
std::string instance floating around somewhere...

There is no leak as far as I know.  Here are the lifetimes:
  (1) std::string gets allocated inside leveldb_get()
  (2) string is passed to the C++ DB::Get() method
  (3) DB::Get() stores some data in the strng
  (4) leveldb_get() copies string contents into a malloc-ed array
  (5) leveldb_get() returns: this destroys the string
  (6) caller uses the malloc-ed array
  (7) caller frees the malloc-ed array
Note that (1) matches up with (5) and (4) matches up with (7).  No leak.

Original comment by san...@google.com on 27 Feb 2013 at 1:08