ancapdev / LightBSON.jl

High performance encoding and decoding of BSON data in Julia
MIT License
20 stars 4 forks

UInt32, UInt64 and Float32 #13

Closed raffopazzo closed 2 years ago

raffopazzo commented 2 years ago

Hi Christian! Hope you're doing well :) I have been playing around with your library and noticed it doesn't support some of the basic primitives. This is somewhat related to #8, and I imagine the reason is that they are not, strictly speaking, part of the BSON spec. I also imagine the issue is that, when deserializing to a target type T, you'd need to search all the possible source primitive types that map to the encoded type. Say you can store a UInt32 in a "timestamp"; when decoding a "timestamp" you'd then need to check whether the target field is a UInt32, a UInt64 or a timestamp. Do you think adding support for all "standard" primitives is out of scope for your library?

ancapdev commented 2 years ago

Hey Raff! I'm great, thanks, hope you are as well.

Target types are specified by the caller when you decode, so it's not so much about the complexity or performance of that. I'm more concerned with losing type information (the benefit of BSON being that it's self-describing), which may cause issues when you process the data in another context, and with runtime errors during decode (e.g., if you decode a UInt32 from an Int64 or Timestamp that is outside the UInt32 range), which may occur if your data is processed and written in a context that isn't aware of your original type mapping.
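The range concern can be seen in plain Julia, independent of LightBSON. This is a minimal sketch, assuming the stored value is an Int64 produced by some other writer; `try_decode` is a hypothetical helper, not part of the library.

```julia
# Value written as Int64 by a producer that does not know the UInt32 mapping.
stored = Int64(5_000_000_000)

# Checked integer conversion throws InexactError when the value does not fit
# in the target type; this hypothetical helper turns that into `nothing`.
function try_decode(::Type{T}, x::Integer) where T<:Integer
    try
        return T(x)
    catch err
        err isa InexactError || rethrow()
        return nothing
    end
end

try_decode(UInt32, stored)      # nothing: the value exceeds typemax(UInt32)
try_decode(UInt32, Int64(123))  # fits, so the conversion succeeds
```

Whether a decode succeeds therefore depends on the data rather than on the schema, which is why the failure only shows up at runtime.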

What I've done in my own code is write custom bson_read and bson_write methods for structs where I want some fields to have custom conversions (this could be better documented). Another extension point is the representations API, also not yet documented, but you can see examples of built-in conversions here. With a custom type that represents a convertible primitive field, you could extend those methods to make that type readable from and writable to BSON in any struct context.

Happy to consider the pros and cons further, but want to be careful about going down the slippery slope of trying to implement a generic serialization format.

raffopazzo commented 2 years ago

Yeah, I would have the very same concerns. It was more of an excuse to get in touch 😄 With Romain we're considering using user-defined subtypes, which you handled with BSONBinary and UnsafeBSONBinary, I think.

ancapdev commented 2 years ago

That's right, yes, I've used that mechanism as well. What kind of data are you intending to serialize as binary? Extensions to the binary subtypes, for typed arrays for example, have been discussed a few times, but never seemed to get enough traction with the community to make it into the spec.

raffopazzo commented 2 years ago

I'm intending to use your library to implement a Julia logger that passes the user message and data from a guest script to the host C++ logger. I don't control what data the user might want to serialize along with their log message, so it may contain these primitive types.

ancapdev commented 2 years ago

I see. At some level then you need to require every message to be BSON representable and impose some constraints on that usage I think, otherwise you're left to make a fully generalized Julia to BSON serialization, which this package won't support.

raffopazzo commented 2 years ago

In my case it's enough that C++ and Julia agree on the meaning of custom binary subtypes, so that the values can be pretty-printed in the logs. I think I've found the solution now, thanks to your link earlier. Essentially, I can define my own type UInt64Binary that wraps the actual value, tell bson_representation_{type,convert} how to do the transformation, and finally implement write_field_ to encode a binary subtype with an appropriate tag whose meaning C++ understands. All of this is internal to Julia and C++, so I can rely on both sides agreeing on the meaning of the different subtypes I need.
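The payload side of that idea can be sketched in plain Julia, without touching LightBSON's API. Everything here is an assumption for illustration: the subtype value 0x80 (BSON reserves 0x80 to 0xFF for user-defined binary subtypes), the little-endian 8-byte layout, and the helper names `to_payload` and `from_payload`. Hooking the result into bson_representation_{type,convert} and write_field_ is not shown.

```julia
# Hypothetical wrapper type, named after the comment above.
struct UInt64Binary
    value::UInt64
end

# Hypothetical tag agreed between Julia and C++ (user-defined subtype range).
const UINT64_SUBTYPE = 0x80

# Pack the value as 8 little-endian bytes, matching BSON's own byte order.
function to_payload(x::UInt64Binary)
    bytes = Vector{UInt8}(undef, 8)
    for i in 1:8
        bytes[i] = (x.value >> (8 * (i - 1))) % UInt8
    end
    return UINT64_SUBTYPE, bytes
end

# Rebuild the wrapper, rejecting payloads with the wrong tag or length.
function from_payload(subtype::UInt8, bytes::AbstractVector{UInt8})
    subtype == UINT64_SUBTYPE || throw(ArgumentError("unexpected binary subtype"))
    length(bytes) == 8 || throw(ArgumentError("expected 8 bytes"))
    v = UInt64(0)
    for i in 1:8
        v |= UInt64(bytes[i]) << (8 * (i - 1))
    end
    return UInt64Binary(v)
end

tag, bytes = to_payload(UInt64Binary(typemax(UInt64)))
from_payload(tag, bytes).value == typemax(UInt64)  # round trip preserves the value
```

Writing the bytes explicitly, rather than reinterpreting memory, keeps the layout independent of the host's endianness, which matters when the C++ side decodes the same payload.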

ancapdev commented 2 years ago

This should now be fixed in #21 (https://github.com/ancapdev/LightBSON.jl/commit/d964d326c9dafbc7fd40620d8c12adae44b77840)

You can set up a reader and writer with the new NumericBSONConversions rules. Haven't written docs, but you can see tests for usage:

@testset "Unsigned integers" begin
    buf = empty!(fill(0xff, 1000))
    writer = BSONWriter(buf, NumericBSONConversions())
    x1 = UInt64(0xffff_ffff_ffff_ffff)
    x2 = UInt64(123)
    y1 = UInt32(0xffff_ffff)
    y2 = UInt32(123)
    writer["x1"] = x1
    writer["x2"] = x2
    writer["y1"] = y1
    writer["y2"] = y2
    close(writer)
    @test BSONReader(buf, StrictBSONValidator(), NumericBSONConversions())["x1"][UInt64] == x1
    @test BSONReader(buf, StrictBSONValidator(), NumericBSONConversions())["x2"][UInt64] == x2
    @test BSONReader(buf, StrictBSONValidator(), NumericBSONConversions())["y1"][UInt32] == y1
    @test BSONReader(buf, StrictBSONValidator(), NumericBSONConversions())["y2"][UInt32] == y2
end

@testset "Float32" begin
    buf = empty!(fill(0xff, 1000))
    writer = BSONWriter(buf, NumericBSONConversions())
    x = 1.25f0
    writer["x"] = x
    close(writer)
    @test BSONReader(buf, StrictBSONValidator(), NumericBSONConversions())["x"][Float32] == x
end

ancapdev commented 2 years ago

You should be able to write your own conversion rules for the specifics of your context by adding a new BSONConversionRules type. The way I set up NumericBSONConversions was to default them to the DefaultBSONConversions to effectively inherit those rules, with these lines:

struct NumericBSONConversions <: BSONConversionRules end

@inline function bson_representation_type(::NumericBSONConversions, ::Type{T}) where T
    bson_representation_type(DefaultBSONConversions(), T)
end

@inline function bson_representation_convert(::NumericBSONConversions, ::Type{T}, x) where T
    bson_representation_convert(DefaultBSONConversions(), T, x)
end

And from there add new rules.
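As a rough sketch of what "add new rules" could look like, the block below follows the same pattern for a hypothetical LoggerBSONConversions type that stores UInt16 as Int32. The first four definitions are stand-ins so the sketch runs without the package; their identity behaviour is an assumption, and with LightBSON loaded you would extend the package's own functions instead of defining these.

```julia
# Stand-ins for the LightBSON names (assumed behaviour, illustration only).
abstract type BSONConversionRules end
struct DefaultBSONConversions <: BSONConversionRules end
bson_representation_type(::DefaultBSONConversions, ::Type{T}) where T = T
bson_representation_convert(::DefaultBSONConversions, ::Type{T}, x) where T = x

# Hypothetical rules type for one application.
struct LoggerBSONConversions <: BSONConversionRules end

# Fall back to the default rules for anything not overridden below.
bson_representation_type(::LoggerBSONConversions, ::Type{T}) where T =
    bson_representation_type(DefaultBSONConversions(), T)
bson_representation_convert(::LoggerBSONConversions, ::Type{T}, x) where T =
    bson_representation_convert(DefaultBSONConversions(), T, x)

# New rule: represent UInt16 as Int32, which holds every UInt16 value.
bson_representation_type(::LoggerBSONConversions, ::Type{UInt16}) = Int32
# Conversion used when writing.
bson_representation_convert(::LoggerBSONConversions, ::Type{Int32}, x::UInt16) = Int32(x)
# Conversion used when reading; throws InexactError for out-of-range values.
bson_representation_convert(::LoggerBSONConversions, ::Type{UInt16}, x::Int32) = UInt16(x)

rules = LoggerBSONConversions()
bson_representation_type(rules, UInt16)  # Int32
bson_representation_type(rules, Int64)   # Int64, via the fallback
```

Because the new methods dispatch on a rules type you own, they do not change how UInt16 behaves for other users of the package.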

raffopazzo commented 1 year ago

Thanks, this looks promising. At the moment I have it working but, technically, I'm committing type piracy. With this I can avoid that.