Closed raffopazzo closed 2 years ago
Hey Raff! I'm great, thanks, hope you are as well.
Target types are specified by the caller when you decode, so it's not so much about the complexity or performance of that. I'm more concerned with losing type information (the benefit of BSON being that it's self-describing), which may cause issues when you process the data in another context, or with runtime errors during decode (e.g., if you decode a `UInt32` from an `Int64` or `Timestamp` that is outside `UInt32` range), which may occur if your data is processed and written in a context that isn't aware of your original type mapping.
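The out-of-range hazard described above can be seen in plain Julia, without any BSON involved: narrowing an out-of-range `Int64` to `UInt32` throws, which is the kind of failure a decoder would hit if a stored value no longer fits the requested target type.

```julia
# 2^32 is one past typemax(UInt32), so the narrowing conversion throws.
big = Int64(0x1_0000_0000)
try
    UInt32(big)                       # throws InexactError
catch err
    @assert err isa InexactError
end

# In-range values convert fine, which is why such bugs only surface once
# data written elsewhere drifts outside the original type's range.
@assert UInt32(Int64(123)) === 0x0000007b
```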
What I've done in my own code is write custom `bson_read` and `bson_write` methods for structs where I want some fields to have custom conversions (this could be better documented). Another extension point is the representations API, also not yet documented, but you can see examples of built-in conversions here. With a custom type representing a convertible primitive field, you could extend those methods so that the type is readable and writable to/from BSON in any struct context.
Happy to consider the pros and cons further, but want to be careful about going down the slippery slope of trying to implement a generic serialization format.
Yeah, I would have your very same concerns. It was more of an excuse to get in touch 😄 With Romain we're considering using user-defined subtypes, which you handled with `BSONBinary` and `UnsafeBSONBinary`, I think.
That's right, yes, I've used that mechanism as well. What kind of data are you intending to serialize as binary? Extensions to the binary subtypes for typed arrays, for example, have been discussed a few times, but never seemed to get enough traction with the community to make it into the spec.
I'm intending to use your library to implement a Julia logger that passes a user's message and data from a guest script to the host C++ logger. So I don't have control over what data the user might want to serialize along with their log message, and thus it may contain these primitive types.
I see. At some level then you need to require every message to be BSON representable and impose some constraints on that usage I think, otherwise you're left to make a fully generalized Julia to BSON serialization, which this package won't support.
In my case it's enough that C++ and Julia agree on the meaning of custom binary subtypes, so that the values can be pretty-printed in the logs. I think I found the solution now, thanks to your link earlier. Essentially, I can define my own type `UInt64Binary` that wraps the actual value to represent, tell `bson_representation_{type,convert}` how to do the transformation, and finally implement `write_field_` to encode a binary subtype with an appropriate tag whose meaning C++ understands. All of this is internal to Julia and C++, so I can rely on both sides agreeing on the meaning of the different subtypes I need.
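The wrapper approach described above might look roughly like the sketch below. The type name `UInt64Binary`, the subtype tag `0x80`, the `BSONBinary` constructor arguments, and the exact extension-method signatures are all assumptions for illustration; the real `bson_representation_{type,convert}` and `write_field_` hooks in LightBSON.jl may differ.

```julia
using LightBSON

# Hypothetical wrapper for a value that should travel as a custom binary
# subtype agreed between the Julia and C++ sides.
struct UInt64Binary
    value::UInt64
end

# Assumed shape of the representations-API hooks: declare that
# UInt64Binary is represented as BSON binary data, and define the
# conversions in both directions. 0x80 is the first user-defined
# binary subtype in the BSON spec.
LightBSON.bson_representation_type(::Type{UInt64Binary}) = BSONBinary
LightBSON.bson_representation_convert(::Type{BSONBinary}, x::UInt64Binary) =
    BSONBinary(reinterpret(UInt8, [x.value]), 0x80)
LightBSON.bson_representation_convert(::Type{UInt64Binary}, x::BSONBinary) =
    UInt64Binary(only(reinterpret(UInt64, Vector(x.data))))
```

The C++ side would then recognize subtype `0x80` and decode the 8-byte payload as a `uint64_t` when pretty-printing.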
This should now be fixed in #21 (https://github.com/ancapdev/LightBSON.jl/commit/d964d326c9dafbc7fd40620d8c12adae44b77840)
You can set up a reader and writer with the new `NumericBSONConversions` rules. Haven't written docs, but you can see the tests for usage:
```julia
@testset "Unsigned integers" begin
    buf = empty!(fill(0xff, 1000))
    writer = BSONWriter(buf, NumericBSONConversions())
    x1 = UInt64(0xffff_ffff_ffff_ffff)
    x2 = UInt64(123)
    y1 = UInt32(0xffff_ffff)
    y2 = UInt32(123)
    writer["x1"] = x1
    writer["x2"] = x2
    writer["y1"] = y1
    writer["y2"] = y2
    close(writer)
    @test BSONReader(buf, StrictBSONValidator(), NumericBSONConversions())["x1"][UInt64] == x1
    @test BSONReader(buf, StrictBSONValidator(), NumericBSONConversions())["x2"][UInt64] == x2
    @test BSONReader(buf, StrictBSONValidator(), NumericBSONConversions())["y1"][UInt32] == y1
    @test BSONReader(buf, StrictBSONValidator(), NumericBSONConversions())["y2"][UInt32] == y2
end

@testset "Float32" begin
    buf = empty!(fill(0xff, 1000))
    writer = BSONWriter(buf, NumericBSONConversions())
    x = 1.25f0
    writer["x"] = x
    close(writer)
    @test BSONReader(buf, StrictBSONValidator(), NumericBSONConversions())["x"][Float32] == x
end
```
You should be able to write your own conversion rules for the specifics of your context by adding a new `BSONConversionRules` type. The way I set up `NumericBSONConversions` was to default to `DefaultBSONConversions`, to effectively inherit those rules, with these lines:
```julia
struct NumericBSONConversions <: BSONConversionRules end

@inline function bson_representation_type(::NumericBSONConversions, ::Type{T}) where T
    bson_representation_type(DefaultBSONConversions(), T)
end

@inline function bson_representation_convert(::NumericBSONConversions, ::Type{T}, x) where T
    bson_representation_convert(DefaultBSONConversions(), T, x)
end
```
And from there add new rules.
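Following that delegation pattern, a context-specific rule set might look like the sketch below. The rule-set name `MyConversions` and the specific `UInt32`-as-`Int64` mapping are illustrative assumptions, not LightBSON's built-in behaviour; only the delegation shape is taken from the lines above.

```julia
using LightBSON

# Delegate everything to DefaultBSONConversions first, as shown above.
struct MyConversions <: BSONConversionRules end

@inline bson_representation_type(::MyConversions, ::Type{T}) where T =
    bson_representation_type(DefaultBSONConversions(), T)
@inline bson_representation_convert(::MyConversions, ::Type{T}, x) where T =
    bson_representation_convert(DefaultBSONConversions(), T, x)

# Then override individual types: here, store UInt32 on the wire as
# Int64 (which always has room for it), and narrow back on read.
@inline bson_representation_type(::MyConversions, ::Type{UInt32}) = Int64
@inline bson_representation_convert(::MyConversions, ::Type{Int64}, x::UInt32) = Int64(x)
@inline bson_representation_convert(::MyConversions, ::Type{UInt32}, x::Int64) = UInt32(x)
```

Usage would mirror the tests above: pass `MyConversions()` to `BSONWriter` and `BSONReader` in place of `NumericBSONConversions()`.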
Thanks, this looks promising. At the moment I have it working, but technically I'm committing type piracy. With this I can avoid the type piracy.
Hi Christian! Hope you're doing well :) I have been playing around with your library and noticed it doesn't support some of the basic primitives. This is somewhat related to #8, and I imagine the reason is that they are not, strictly speaking, part of the BSON spec. I also imagine the issue is that during deserialization to a target type `T` you'd need to search all possible source primitive types that map to the encoded type: say you can store a `UInt32` as a `Timestamp`, but then when decoding a `Timestamp` you'd need to check whether the target field is really `UInt32`, `UInt64`, or `Timestamp`. Do you think adding support for all "standard" primitives is out of scope for your library?