arangodb / velocypack

A fast and compact format for serialization and storage

Can I create a Builder from mmap memory directly? #100

Closed QthCN closed 3 years ago

QthCN commented 3 years ago

Hi velocypack team,

In my setup I want to use LMDB to persist VelocyPack data, and LMDB exposes its data through mmap. I can currently copy the data from the mmap'ed region into a VelocyPack Builder with the following code, but that costs one copy. Ideally I want zero-copy access.

So can I create a Builder that uses the mmap'ed memory directly, i.e. a Builder that holds a pointer into the mmap'ed region? Thanks.

int RcBuilder::from_raw_data(char *data, size_t size)
    {
        auto buffer = std::shared_ptr<arangodb::velocypack::Buffer<uint8_t> >(new arangodb::velocypack::Buffer<uint8_t>());
        buffer->append(data, size);
        builder = new Builder(buffer);
        rc++;
        return builder->slice().get(KEY_SIZE).getInt();
    }
jsteemann commented 3 years ago

@QthCN : a Builder has a shared_ptr to a Buffer object (_buffer member), which owns the memory. When creating a new Builder without any arguments, a new Buffer object is also created and any data that should be managed by the Builder needs to be copied into the Buffer/Builder first.

However, if you have existing VelocyPack data that is already owned by some other resource, and simply want to read from that data, you can use a Slice object.

So in your case, you could simply do:

int RcBuilder::from_raw_data(char *data, size_t size)
    {
        return Slice(reinterpret_cast<uint8_t const*>(data)).get(KEY_SIZE).getInt();
    }

Note that Slice does not own the data it points to, and will not validate it. If you want to make sure that the data is actually valid VelocyPack and inside the range of data ... data + size, you can use a Validator to verify the data first.
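The ownership difference can be illustrated without VelocyPack itself. The following is only a minimal sketch: `ByteView`, `OwnedBytes` and `aliases` are hypothetical stand-ins invented for this example, not VelocyPack types. The view keeps a pointer into the caller's memory (zero-copy), while the owning type copies the bytes into its own storage.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a non-owning reader such as Slice:
// it stores only a pointer and never copies or frees the bytes.
struct ByteView {
  std::uint8_t const* start;
  std::uint8_t head() const { return *start; }
};

// Hypothetical stand-in for an owning container such as Buffer:
// constructing it copies the bytes into its own heap storage.
struct OwnedBytes {
  std::vector<std::uint8_t> storage;
  OwnedBytes(std::uint8_t const* p, std::size_t n) : storage(p, p + n) {}
};

// Returns true if the view points directly at the given region (zero-copy).
bool aliases(ByteView v, std::uint8_t const* region) {
  return v.start == region;
}
```

Because the view only aliases the region, later changes to the region are visible through it, and the region has to stay valid for as long as the view is in use. With mmap'ed data that means the mapping must outlive every reader that points into it.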

Please also note that getInt() returns an int64_t, so implicitly casting it into an int on function return may truncate the result.
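One way to avoid silent truncation is to check the range before narrowing. This is a minimal sketch in plain C++; `checked_narrow` is a hypothetical helper name, not part of VelocyPack, and throwing is just one possible policy for out-of-range values.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: converts a 64-bit value to int and throws
// std::out_of_range instead of silently truncating.
int checked_narrow(std::int64_t value) {
  if (value < std::numeric_limits<int>::min() ||
      value > std::numeric_limits<int>::max()) {
    throw std::out_of_range("value does not fit into int");
  }
  return static_cast<int>(value);
}
```

Alternatively, changing the function's return type to int64_t sidesteps the conversion entirely.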

On a side note, instead of

        auto buffer = std::shared_ptr<arangodb::velocypack::Buffer<uint8_t> >(new arangodb::velocypack::Buffer<uint8_t>());
        builder = new Builder(buffer);

you could use

        arangodb::velocypack::Buffer<uint8_t> buffer;
        arangodb::velocypack::Builder builder(buffer);

if you only need a temporary Builder and Buffer, i.e. objects that are used only within the function scope and can be destroyed when it ends. This will avoid 3 heap allocations compared to the original code.

QthCN commented 3 years ago

Thank you @jsteemann . Your answer solved my question perfectly.