aerospike / aerospike-client-csharp

Aerospike C# Client Library

Why does Aerospike not follow msgpack? #103

Closed orange-puff closed 9 months ago

orange-puff commented 9 months ago

Aerospike mostly follows msgpack format, but for certain specific cases, it does not.

After a string is written to Aerospike, subsequent reads from the server return a buffer that is correctly prefixed with the msgpack type and size, but that appears to contain an extra byte. See https://github.com/msgpack/msgpack/blob/master/spec.md#str-format-family

Let's say we have a string whose length is 32. Based on msgpack, we would expect the serialized version of this string to look like [0xd9, 32, data].

When reading this string from Aerospike, the buffer instead looks like [0xd9, 32, 3, data]. We can see this clearly when calling the UnpackString() method in the Unpacker class:

https://github.com/aerospike/aerospike-client-csharp/blob/master/AerospikeClient/Util/Unpacker.cs#L670 -> https://github.com/aerospike/aerospike-client-csharp/blob/master/AerospikeClient/Util/Unpacker.cs#L682-L686 -> https://github.com/aerospike/aerospike-client-csharp/blob/master/AerospikeClient/Util/Unpacker.cs#L717-L721

where ParticleType.STRING == 3 according to https://github.com/aerospike/aerospike-client-csharp/blob/master/AerospikeClient/Command/ParticleType.cs#L25

Can someone explain the purpose of this, and whether the places where the Aerospike buffer format diverges from msgpack are documented anywhere?

The Unpacker class claims to follow msgpack: https://github.com/aerospike/aerospike-client-csharp/blob/master/AerospikeClient/Util/Unpacker.cs#L23-L27

Alb0t commented 9 months ago

https://discuss.aerospike.com/t/why-does-aerospike-not-follow-msgpack/10844/2

BrianNichols commented 9 months ago

When the client and server implemented msgpack 10+ years ago, they used the msgpack specification of the time (see https://github.com/msgpack/msgpack/blob/master/spec-old.md). That older specification had no string type and recommended serializing strings with the raw (byte[]) type. It was therefore necessary to add an Aerospike particle type to the raw payload to distinguish a string from a byte[].
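To make the role of that extra byte concrete, here is a minimal Java sketch of how a reader might use it to tell a string from a byte[]. It is an illustration under stated assumptions, not the client's actual Unpacker: the class and method names are hypothetical, only three header forms are handled, and any particle type other than 3 is simply returned as raw bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of the Aerospike client.
final class ParticleUnpackSketch {
    // ParticleType.STRING == 3, as cited in the issue.
    static final int PARTICLE_TYPE_STRING = 3;

    // Decodes one raw/str value whose payload starts with a particle type byte.
    // Returns a String for particle type 3, otherwise the payload as byte[].
    static Object unpackRaw(byte[] buf) {
        int offset = 0;
        int header = buf[offset++] & 0xff;
        int size;

        if ((header & 0xe0) == 0xa0) {
            size = header & 0x1f;                 // fixraw (old spec) / fixstr (new spec)
        } else if (header == 0xd9) {
            size = buf[offset++] & 0xff;          // str 8 (new spec only)
        } else if (header == 0xda) {
            size = ((buf[offset] & 0xff) << 8) | (buf[offset + 1] & 0xff);
            offset += 2;                          // raw 16 / str 16
        } else {
            throw new IllegalArgumentException("header not handled by this sketch");
        }
        if (size < 1) {
            throw new IllegalArgumentException("payload has no particle type byte");
        }

        // The declared size counts the particle type byte plus the data bytes.
        int particleType = buf[offset++] & 0xff;
        int dataLength = size - 1;

        if (particleType == PARTICLE_TYPE_STRING) {
            return new String(buf, offset, dataLength, StandardCharsets.UTF_8);
        }
        // Assumption for this sketch: every other type is treated as raw bytes.
        return Arrays.copyOfRange(buf, offset, offset + dataLength);
    }
}
```

With this sketch, the bytes [0xa3, 3, 'h', 'i'] decode to the string "hi", while [0xa3, 4, 1, 2] come back as the two-byte array [1, 2].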

When the new msgpack specification later added a string type, the string type identifier was adopted, but the extra particle type was kept for legacy reasons. Note that new functionality (like expressions) packs expression argument strings without the particle type, because that functionality is not bound by legacy code. That's why there are two low-level methods for packing strings:

    // Java client code
    public void packString(String val) {
        int size = Buffer.estimateSizeUtf8(val);
        packStringBegin(size);

        if (buffer == null) {
            offset += size;
            return;
        }
        offset += Buffer.stringToUtf8(val, buffer, offset);
    }

    public void packParticleString(String val) {
        int size = Buffer.estimateSizeUtf8(val) + 1;
        packStringBegin(size);

        if (buffer == null) {
            offset += size;
            return;
        }
        buffer[offset++] = (byte)ParticleType.STRING;
        offset += Buffer.stringToUtf8(val, buffer, offset);
    }
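The difference between the two methods is easiest to see in the bytes they emit. The following self-contained Java sketch mirrors their logic with a ByteArrayOutputStream instead of the client's internal buffer. The class name and helpers are hypothetical, and the header writer is an assumption that follows the msgpack str rules for payloads under 256 bytes only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the Aerospike client.
final class StringPackingSketch {
    // ParticleType.STRING == 3, as cited in the issue.
    static final int PARTICLE_TYPE_STRING = 3;

    // Writes a msgpack str header for a payload of `size` bytes.
    // Assumption: only fixstr and str 8 are covered to keep the sketch short.
    static void packStringBegin(ByteArrayOutputStream out, int size) {
        if (size < 32) {
            out.write(0xa0 | size);   // fixstr: 101xxxxx
        } else if (size < 256) {
            out.write(0xd9);          // str 8
            out.write(size);
        } else {
            throw new IllegalArgumentException("sketch supports payloads under 256 bytes");
        }
    }

    // Plain msgpack string: header, then the UTF-8 bytes.
    static byte[] packString(String val) {
        byte[] utf8 = val.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        packStringBegin(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Legacy layout: the declared size grows by one to cover the particle type byte.
    static byte[] packParticleString(String val) {
        byte[] utf8 = val.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        packStringBegin(out, utf8.length + 1);
        out.write(PARTICLE_TYPE_STRING);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Formats the first `count` bytes as space-separated hex.
    static String toHex(byte[] bytes, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count && i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String val = "0123456789abcdef0123456789abcdef"; // 32 ASCII characters
        System.out.println(toHex(packString(val), 3));         // d9 20 30
        System.out.println(toHex(packParticleString(val), 3)); // d9 21 03
    }
}
```

In this sketch the particle variant is one byte longer, and its length field counts the particle type byte as part of the payload (0x21 rather than 0x20 for a 32-byte string), consistent with the `+ 1` in packParticleString above.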