Remove array allocations

kerams commented 3 years ago

ToArray creates a new array that is the length of the memory stream contents. This is unnecessary since the data is then further copied to combine it with the total payload length. GetBuffer does not allocate and returns the backing buffer, which may be larger than the content, so it's important not to copy more bytes than what the memory stream says its length is.

2 further optimizations should be easy to do.

Use RecyclableMemoryStream instead of newing up a MemoryStream instance for every message. The package is already pulled in as a dependency of FSharp.Control.Websockets, so no extra costs.
Write 5 bytes to the memory stream first (as a reservartion of space for the length value) and then use it serialize the message. Finally, set Position to 0 and overwrite the first 5 bytes with message length. This would require an additional method in Fable.Remoting.MsgPack because the amount of bytes written per integer depends on its size, which is great in general, but here we need to rely on a fixed size to prevent array copying/shifting.

I could not test this locally but I don't expect any issues.

Shmew commented 3 years ago

Thanks! I tried my best to make things performant, but it's not something I have a lot of experience with 😅.

Use RecyclableMemoryStream instead of newing up a MemoryStream instance for every message. The package is already pulled in as a dependency of FSharp.Control.Websockets, so no extra costs.

Cool, I'll check out RecyclableMemoryStream.

Write 5 bytes to the memory stream first (as a reservartion of space for the length value) and then use it serialize the message. Finally, set Position to 0 and overwrite the first 5 bytes with message length. This would require an additional method in Fable.Remoting.MsgPack because the amount of bytes written per integer depends on its size, which is great in general, but here we need to rely on a fixed size to prevent array copying/shifting.

I'd create an issue in Fable.Remoting, although I'm not sure what details I need. If you could create one I'd appreciate it so I can add that optimization when it's implemented.

Shmew commented 3 years ago

On the topic of optimizations currently when deserializing I have this:

let reader = Read.Reader(input.ToArray())
let len = reader.Read typeof<uint32> |> unbox<uint32> |> int64

input <- input.Slice(len + 1L)

if len > MaxPayloadSize then
    message <- null
    false
else
    message <-
        reader.Read typeof<Msg<'ClientStreamFromApi,'ClientApi,'ClientApi,'ClientStreamToApi>>
        |> unbox<Msg<'ClientStreamFromApi,'ClientApi,'ClientApi,'ClientStreamToApi>>
        |> function
        ...

I imagine that input.ToArray() is quite bad here since that input could potentially be 2+ GB in size (although I would expect the vast majority to be much smaller). Do you see any issues with something like this (tests pass with the below code)?

let len = 
    input.Slice(0, 3).ToArray()
    |> Read.Reader
    |> fun lenReader -> lenReader.Read typeof<uint32> 
    |> unbox<uint32>
    |> int64

let reader = 
    let startPos =
        BitConverter.GetBytes(len)
        |> Array.filter (fun n -> n <> 0uy)
        |> Array.longLength

    input.Slice(startPos, len).ToArray()
    |> Read.Reader

input <- input.Slice(len + 1L)

if len > MaxPayloadSize then
    message <- null
    false
else
    message <-
        reader.Read typeof<Msg<'ClientStreamFromApi,'ClientApi,'ClientApi,'ClientStreamToApi>>
        |> unbox<Msg<'ClientStreamFromApi,'ClientApi,'ClientApi,'ClientStreamToApi>>
        |> function
        ...

kerams commented 3 years ago

Creating Spans by slicing is near-zero cost, but when you call ToArray on them, you'll still allocate and copy to an entirely new array, so your change pretty much does the same thing on more lines of code. input.Slice(0, 3) is a problem though, because integers of various sizes take 1-9 bytes and without inspecting the first byte (which is what the reader does) you don't know where to stop reading. That's why I said we need an extra method that will always write 5 bytes, even if the integer in question is very small. If you try to read a message larger than 65KB, you'll probably see a IndexOutOfRangeException when reading the length now.

Your original code works almost by accident. You are able to do reader.Read typeof<uint32>, because the reader only advances as far as it is necessary to get that uint32. Next, input <- input.Slice(len + 1L) has no effect on the array you passed to the reader, so the line really does nothing. Finally, you use the same reader to deserialize the actual content. This works because reading starts where it stopped last time.

I'm not entirely positive, but I think having extra data on input never causes problems (certainly not with integers as I said above), so doing let reader = Read.Reader(input.GetBuffer()) should do the trick and avoid copying. Come to think of it though, do you even need to prepend the length to each message?

Shmew commented 3 years ago

even if the integer in question is very small. If you try to read a message larger than 65KB, you'll probably see a IndexOutOfRangeException when reading the length now.

Since SignalR only allows messages up to 2GB in size we know that the length is a uint32, which is 4 bytes: [|255uy; 255uy; 255uy; 255uy|]. I'm probably misunderstanding something here, but even if it took 9 bytes wouldn't I be able to do the same thing here but by slicing 0 to 8?

Your original code works almost by accident. You are able to do reader.Read typeof, because the reader only advances as far as it is necessary to get that uint32. Finally, you use the same reader to deserialize the actual content. This works because reading starts where it stopped last time.

Yeah I was aware of that behavior, I was digging through the source code of your library when trying to get this all to work.

input <- input.Slice(len + 1L) has no effect on the array you passed to the reader, so the line really does nothing.

Sorry, I should have given some more context. input when deserializing is a byref<ReadOnlySequence<byte>> which contains at least one message (but at times multiple messages). That's why I set input once I determine the length of the message, so I can slice out the message I'm processing and later process the others available. So this change would be more to decrease the amount I have to copy to a new array.

Come to think of it though, do you even need to prepend the length to each message?

I think I may not actually need to do that anymore, I know it was a requirement when using SignalR's JS MsgPack protocol which I ended up rolling my own, but I'm not seeing a reason I need it anymore.

Creating Spans by slicing is near-zero cost, but when you call ToArray on them, you'll still allocate and copy to an entirely new array,

Is it possible to get a byte [] from a ReadOnlySequence<_> without doing that?

kerams commented 3 years ago

I'm probably misunderstanding something here, but even if it took 9 bytes wouldn't I be able to do the same thing here but by slicing 0 to 8?

But you don't know how many bytes the length integer occupies without checking the first Message Pack format byte (0-127 is encoded using a single byte, even if you're writing it as int64). If you blindly took the first 9 bytes, you could very well consume data that is part of the serialized message, corrupting the processing.

Is it possible to get a byte [] from a ReadOnlySequence<_> without doing that?

No, spans/memory are just position and length markers in some underlying sequence of data, and since you're getting a ReadOnlySequence<byte> there's very likely no way to avoid copying here without extending the Reader API.

Shmew commented 3 years ago

Oh I see what you mean. So this would always work then, right?

let len = 
    input.Slice(0, 10).ToArray()
    |> Read.Reader
    |> fun lenReader -> lenReader.Read typeof<uint32> 
    |> unbox<uint32>
    |> int64

let reader = 
    input.Slice(0L, len).ToArray()
    |> Read.Reader

reader.Read typeof<uint32> |> ignore

This is probably not worth it since in many cases it's just a single message and is almost always quite small. Anyways, I appreciate you taking the time to respond to me!

kerams commented 3 years ago

Technically yes, but it's icky and more convoluted than the original in https://github.com/Shmew/Fable.SignalR/pull/23#issuecomment-812609916. The only way of improving this without changes in Fable.Remoting.MsgPack would be using System.Buffers.ArrayPool to requisition a byte array to copy from input to. It's still unnecessary copying, but you save an allocation. It might be a bit of an overkill as you're saying, but could also be a fun simple thing to do if you're unfamiliar with the class.

You're welcome.

Shmew / Fable.SignalR

Remove array allocations #23