antirez / RESP3

RESP protocol V3 repository. Contains the specification and other related resources.

Improving aggregated data types for performance #15

Open BridgeAR opened 5 years ago

BridgeAR commented 5 years ago

For me the most challenging type while implementing RESP in JS was arrays. The input arrives as events carrying partial data, without knowing when the array ends. It would be great to know the actual size of the aggregated data up front, as with bulk strings, so it's possible to defer parsing until all necessary data has been received. That would require Redis to actually calculate the size up front, but the receiver could implement parsing more efficiently, especially for nested aggregated types. It's of course a trade-off, but I believe it could improve the overall performance for these types quite a bit (mainly in the case of many entries).
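For illustration, compare how the two types announce their size on the wire today:

```
$6\r\nfoobar\r\n                    bulk string: "6 bytes" announced up front
*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n    array: only "2 elements" announced; the
                                    total byte size is unknown until parsed
```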

I am curious about your thoughts on this, @antirez.

antirez commented 5 years ago

Hello @BridgeAR, once you have the length of the array, isn't that enough to easily have a stop condition in your loop? I guess you mean you would like to know the total size in bytes of the protocol containing the array: that would indeed be more complex on the server side, and I still can't see how it would help the client. Anyway, reading-protocol-first, parsing-later is not a good idea, and tracking the read bytes in the protocol implementation is even harder than just having a loop to process the N items. Is your problem related to implementing RESP in a non-blocking way, so that you need to keep state about how much you parsed so far in order to continue?
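To make the contrast concrete, here is a minimal sketch of the loop-with-stop-condition antirez describes, assuming hypothetical blocking `readLine()`/`readBytes()` helpers that suspend until data is available; the crux of the thread is that JS streams provide no such blocking reads:

```js
// Minimal sketch, not node-redis-parser code. Assumes hypothetical blocking
// helpers readLine() / readBytes(n) that suspend until the data is available.
function parseValue() {
  const line = readLine();                 // e.g. "*2", "$3", ":100" (without \r\n)
  const type = line[0];
  if (type === '*') {                      // array header: element count only
    const count = Number(line.slice(1));
    const items = [];
    for (let i = 0; i < count; i++) {      // the "stop condition" is simply i < count
      items.push(parseValue());            // recursion handles nesting naturally
    }
    return items;
  }
  if (type === '$') {                      // bulk string header: byte length
    const len = Number(line.slice(1));
    const buf = readBytes(len + 2);        // payload plus trailing \r\n
    return buf.slice(0, len).toString();
  }
  if (type === ':') return Number(line.slice(1)); // integer
  throw new Error('unhandled type: ' + type);
}
```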

BridgeAR commented 5 years ago

> Is your problem related to implementing RESP in a non-blocking way, so that you need to keep state about how much you parsed so far in order to continue?

That's exactly the issue. In JS there's no way to implement this in a blocking fashion, so I would have to parse everything again whenever a new chunk of data comes in. To prevent that I added a cache for the already parsed information, and I resume from the last fully parsed entry of the deepest nested array. However, handling arrays like this (they are sparse at that point) is not efficient, and the state handling makes the implementation more complex.

You can look at my implementation here: https://github.com/NodeRedis/node-redis-parser/blob/master/lib/parser.js#L199-L276. https://github.com/NodeRedis/node-redis-parser/blob/master/lib/parser.js#L516-L522 is one of the entry points to get back to the correct entry to continue parsing.
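In rough terms, the resumable state amounts to a stack of in-progress arrays that survives across data chunks; a simplified sketch (not the actual node-redis-parser internals linked above):

```js
// Simplified sketch of resumable array state; the real parser linked above is
// considerably more involved. `stack` is kept across incoming data chunks.
const stack = [];                               // in-progress (possibly nested) arrays

function pushArray(count) {                     // called on an "*<count>" header
  stack.push({ remaining: count, items: [] });  // empty arrays (*0) omitted for brevity
}

// Called for every fully parsed value. Returns the completed top-level array
// once the outermost one is filled, otherwise null (keep waiting for chunks).
function addItem(value) {
  while (stack.length) {
    const top = stack[stack.length - 1];
    top.items.push(value);
    if (--top.remaining > 0) return null;       // still waiting for sibling items
    stack.pop();                                // this array is complete...
    value = top.items;                          // ...and becomes an item of its parent
  }
  return value;                                 // outermost array finished
}
```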

> tracking the read bytes in the protocol implementation is even harder than just having a loop to process the N items

It was not simple to implement that properly, but without it the parser would be very slow.

> reading-protocol-first, parsing-later is not a good idea

That's not really what I meant. It's about being able to stop parsing until there's enough data, e.g. when the bulk string length is bigger than the amount of data received so far. The very same principle could be used for arrays.
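A sketch of that early-out as it works for bulk strings today, and how a hypothetical byte-size prefix on aggregate headers (not part of the spec) would extend it:

```js
// Sketch of the "wait until there's enough data" check that bulk strings
// already allow. `buffer` accumulates chunks; `offset` points just past the
// "$<len>\r\n" header. Returns null to signal "resume when more data arrives".
function tryReadBulkString(buffer, offset, len) {
  if (buffer.length < offset + len + 2) return null;  // payload + \r\n not here yet
  return buffer.toString('utf8', offset, offset + len);
}

// Hypothetical: if aggregate headers also announced their total byte size,
// the very same early-out would work before touching any nested element.
function tryReadAggregate(buffer, offset, byteSize) {
  if (buffer.length < offset + byteSize) return null; // buffer the whole aggregate
  return parseFully(buffer, offset);                  // hypothetical one-pass parse,
}                                                     // no resumable state needed
```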