Closed kelnos closed 3 weeks ago
After digging this issue a bit, the best solution would be to implement something based on serde
directly. The following are possible ideas that need to be explored:
Have a socketioxide serializer/deserializer which would handle the replacement of binary payloads with placeholders and vice versa and defer the rest of the serde process to serde_json
.
Create a custom derive macro (like serde::{Serialize,Deserialize}
) that would handle the replacement of binary payloads when serializing the data.
For the moment the first idea seems the best (if there are no blocking points) because with the second one, it would require to have a socketioxide-proc-macros lib created.
That makes sense to me. I feel like the first idea does seem possible, and I agree it would be better. My first thought of how to handle this might be:
serde_json::Value
, and leave it as is. The packet type will let us know if it's binary, and how many attachments there are (currently it looks like the attachments count is ignored)._placeholder
with the correct num
. Replace that _placeholder
object with a Value::Array()
, filled with the binary data as a Vec<Value>
(where each element of the vec is a Value::Number
.Value
object to serde, to deserialize it into whatever the user-provided type in the handler is.But there's one potentially-large downside of this: converting a Vec<u8>
coming from the wire into a Vec<Value>
seems pretty wasteful, and for larger binary sizes it may be pretty slow to convert between the two. An alternative that I kinda like is for for socketioxide to have its own (private) wrapper enum:
enum PayloadValue {
Null,
Bool(bool),
Number(serde_json::value::Number),
String(String),
Array(Vec<PayloadValue>),
Object(serde_json::Map<String, PayloadValue>),
PendingBinary(u32), // u32 is the 'num' field
Binary(Vec<u8>), // once the binary payload is received, PendingBinary gets replaced with Binary
}
It's of course annoying to duplicate serde_json::Value
like that, but maybe it's worth it to do so. I think #[derive(Serialize, Deserialize)]
will even almost work on that struct, if we provide a custom #[serde(with = "...")]
for the PendingBinary
and Object
variants.
How does this sound to you? I'm happy to start work on it if you think it's a good approach.
Also curious as to what you think about backward compatibility. I'm not sure how tricky it will be to preserve the existing behavior for the cases that are supported (single-level array that can contain one or more binary blobs, single-level object that just holds a binary blob).
Some more regarding back-compat: I didn't realize that BinaryPacket::incoming()
was public, as are its .data
and .bin
members. With the new way of doing things, .bin
is no longer needed (or possible, really, without making data copies), and .data
is now the private PayloadValue
type.
A few options:
PayloadValue
should actually be public? Then the types of ::incoming()
and .data
change. I'm not feeling too great about this, though.PayloadValue
stays private, and:
BinaryPacket::incoming()
keeps its type signature, and converts the passed Value
to a PayloadValue
(and internally, we can use a new, private, BinaryPacket::incoming_payload_value()
to avoid an extra data type conversion). BinaryPacket::data
becomes private, but there's a new BinaryPacket::data()
method that converts the internal PayloadValue
into a serde_json::Value
.BinaryPacket::bin
becomes private, but there's a new BinaryPacket::bin()
method that returns a Vec<&Vec<u8>>
by walking the PayloadValue
and pulling out the PayloadValue::Binary
bits..data
and .bin
are retained, and are populated as best as possible, copying the data if needed. The PayloadValue
version of .data
is a new private member.I think I like the second option the best: it retains the previous functionality if crate users need it, even if they will have to update their code. The first option would require users to make bigger changes (and would expose what, to me, is an internal implementation detail). The third option would allow users to upgrade without making any changes, but would mean more work and higher memory usage per binary packet, for usage of the library that would hopefully turn into the minority usage.
Also, as an aside, I guess this means we have to figure out the outgoing binary packet story too.
There is no problem to break backward compatibility. The library is in its beta state and it is explicitely stated in the readme that API will change over the time.
Appart from that, in fact incoming
and outgoing
should be private.
Your approach with this might be really good:
enum PayloadValue {
Null,
Bool(bool),
Number(serde_json::value::Number),
String(String),
Array(Vec<PayloadValue>),
Object(serde_json::Map<String, PayloadValue>),
PendingBinary(u32), // u32 is the 'num' field
Binary(Vec<u8>), // once the binary payload is received, PendingBinary gets replaced with Binary
}
If I understand want you want to do. After getting this PayloadValue
, a custom Deserializer
implementation will convert the PayloadValue
to the user provided struct?
There is no problem to break backward compatibility. The library is in its beta state and it is explicitely stated in the readme that API will change over the time.
That makes things a bit easier :)
If I understand want you want to do. After getting this
PayloadValue
, a customDeserializer
implementation will convert thePayloadValue
to the user provided struct?
Yes, that's basically it. I played around with this a bit, and it doesn't even require much of a custom Deserializer
. It just needs a little "help" with the Object
variant.
For outgoing packets, I'm thinking of doing something similar, but in reverse. If the user provides some payload struct that looks like this:
struct SomePayload {
id: u32,
data: Vec<u8>,
}
... then I can probably "deserialize" that into:
PayloadValue::Object(HashMap{
"id" -> PayloadValue::Number(32),
"data" -> PayloadValue::PendingBinary(0),
})
And then when serializing it out into a packet to send, that'll get converted to a placeholder.
Anyhow, I'll push a PR when I have something for you to look at... sounds good?
Sure ! Don't hesitate to add benchmarks and unit tests to check how it behaves compared to the current packet deserialization.
Argh, I had a whole long message typed up here yesterday, but looks like it didn't post, and I lost it. I guess this is an opportunity to be less verbose :)
Got the PayloadValue
concept working fairly well in the context of packet.rs
. Moving outside it, I'm running into some trouble. FromMessage
/FromMessageParts
and MessageHandler
are problems. Right now they take a serde_json::Value
and a Vec<Vec<u8>>
for the non-binary and binary parts respectively. In the new way of doing things, there's no separation. So what do we pass here? serde_json::Value
no longer works: you can't embed binary data in there. (Well, you could, as a Value::Array
of Value::Number
, but then you make it essentially impossible to distinguish between binary data and something that's just an array of numbers.)
My next thought was to just get rid of the idea of an intermediate representation, and have it pass a T: Deserialize
into those trait methods. The problem there is that I'm not sure how to actually store those message handlers in Client
. Right now you're using a type-erasure trick, which works now because the Client
doesn't really need to know the type the Data
extractor is deserializing to in any given MessageHandler
. But if the handler exposes the T
directly, Client
needs to do that deserialization now, so of course it would to know what T
is, which means I don't think you can erase the type... which means I'm not really sure how you'd store the handler in that map. It's possible my imagination about how to (ab)use Rust's type system is failing me, and there is actually a way to do this, but I just don't know how. If you think this is doable and have any pointers (and think it's the right path to take), please let me know.
Another option would be to actually just go ahead and expose PayloadValue
publicly. Initially I didn't really like that idea, but I'm starting to warm to it. One thing I'd probably do is remove the BinaryPlaceholder
variant (what I used to call PendingBinary
), and just have PayloadValue::Binary(usize, Vec<u8>)
. When the full binary event hasn't yet been fully received, the vec will just be empty. That would also makes incorporating received binary attachments a little easier and less expensive, so I'm going to do that even if we don't want to make it public.
What are your thoughts here?
Now looking at ack.rs
, I see instances of serde_json::Value
there exposed publicly, like in AckInnerStream
, which according to docs needs to be public for people who implement Adapter
.
So I'm feeling more and more like making PayloadValue
public is the simplest way forward.
My next thought was to just get rid of the idea of an intermediate representation, and have it pass a T: Deserialize into those trait methods. The problem there is that I'm not sure how to actually store those message handlers in Client.
It is impossible to store a per-handler generic T, or to erase it (because of the necessity to have it to Deserialize it).
However I don't understand why you can't simply replace the serde_json::Value
that is passed to the type erased handler by your custom Value
enum? And then call from_socketio::<T>(Value)
in the data extractor (the same way than currently).
Now looking at ack.rs, I see instances of serde_json::Value there exposed publicly, like in AckInnerStream, which according to docs needs to be public for people who implement Adapter.
Indeed ack.rs is published but it is only because it is required to implement an adapter. And this use case is really rare. For me directly passing your custom Value type to the handler would break the current design with extractors (because it would be more complex for the user to manage it's incoming data).
Btw, don't hesitate to publish a draft PR so that I can check the code and that we can discuss implementation on it rather than on the issue :).
However I don't understand why you can't simply replace the
serde_json::Value
that is passed to the type erased handler by your customValue
enum?
Yes, can do that, but then that would make my custom Value
enum public, since FromMessageParts
/FromMessage
/MessageHandler
are public. If we think that's ok after all (as I'm starting to think), then there's no problem.
Btw, don't hesitate to publish a draft PR so that I can check the code and that we can discuss implementation on it rather than on the issue :).
Will do! Gonna see if I can at least get things compiling, but either way I'll push something later today.
Yes, can do that, but then that would make my custom Value enum public, since FromMessageParts/FromMessage/MessageHandler are public. If we think that's ok after all (as I'm starting to think), then there's no problem.
Yes it is not a problem, because even if it is public it is not meant to be used by the user. We can even make it #[doc(hidden)]
.
Hm one thing -- the public methods on Packet currently accept a serde_json::Value
. When I was thinking the new PayloadValue
would be private, I changed those to take a T: Serialize
instead.
Now that we're making PayloadValue
public (though "semi-hidden"), I'm realizing T: Serialize
doesn't work so well in some situations, and things would be simpler if I could just accept a PayloadValue
, in place of the current serde_json::Value
they currently take.
But that would make PayloadValue
more prominent in the API. Is that ok?
When will this issue be resolved?
Describe the bug
Related to #275, if socketioxide is fixed to support complex binary data formats, the handler API as-is becomes ambiguous and difficult to work with.
The handler presents the JSON structure as
Data
, and the binary payloads asBin
, essentially an array of payloads. Right now the API works fine in the supported cases, where the payload is a single-level array with some elements as binary, or a simple object where the root object itself is a single binary payload. That is, it's relatively unambiguous what parts of the data payload the binary bits correspond to. However, consider a more complex payload like this (currently unsupported by socketioxide, but I would like to fix that):From this structure, it might be obvious that the first payload in
Bin
is "biff", and the second is "zip", but that's only clear to someone who knows the protocol: a user of, say, the JavaScript socketio client is just passing around JSON objects with binary blobs in them. They don't know that the client is going to assign "biff" to the first payload and "zip" to the second. In fact, I'm not sure there's any requirement it do so; it might decide to do a "breadth-first" search for binary bits, and assign "zip" as the first payload and "biff" as the second.To Reproduce
(There's no reproduction, as socketio doesn't currently handle data structures like this at all.)
Expected behavior
socketioxide should support some sort of "combined" view of the data by default. For example, I should be able to define some Rust structs for the above payload like so:
And then I should be able to create a socketioxide message handler that can convert a binary packet into a
Payload
struct. I shouldn't need to deal with separateData
andBin
arguments, and try to figure out how the elements ofBin
match up with the structure of `Data.Versions (please complete the following information):
Additional context
I don't believe it's possible to implement this myself today, using
FromMessage
orFromMessageParts
, asPacket.incoming()
intentionally discards the information needed to do this.