Support types that are specific to some formats

liuzicheng1987 commented 7 months ago

So far we have only supported data types that every serialization format can represent. However, some data types are only available in certain formats, and we should support those as well.

For instance:

Binary formats like CBOR, BSON, etc. also support bytestrings. A common format for bytestrings would be a good idea.
Some binary formats also have specific types for datetime and timestamps.
BSON also supports OID types.

liuzicheng1987 commented 7 months ago

@Lazrius , I have opened a feature branch for this issue. OID support is already implemented:

https://github.com/getml/reflect-cpp/tree/f/specific_types

There is a test for it as well:

https://github.com/getml/reflect-cpp/blob/f/specific_types/tests/bson/test_oid.cpp

liuzicheng1987 commented 7 months ago

@Lazrius , here is a list of the types supported by the BSON C API:

https://mongoc.org/libbson/current/bson_type_t.html

I think the obvious ones that we should support here are BSON_TYPE_BINARY, BSON_TYPE_DATE_TIME and BSON_TYPE_TIMESTAMP. Possibly also BSON_TYPE_REGEX.

I think that, in addition to the ones that we support already, that should give us pretty good coverage. The way I see it, we wouldn't be supporting BSON_TYPE_EOD, BSON_TYPE_UNDEFINED, BSON_TYPE_DBPOINTER, BSON_TYPE_MAXKEY, BSON_TYPE_MINKEY...for most of these I am not even sure why anyone would need them.

But you are more experienced with BSON than I am...is there anything that I am missing here which you would regard to be very important?

Lazrius commented 7 months ago

@liuzicheng1987,

Off the top of my head, I think that covers most things; checking the list, I don't see anything missed. After testing the BSON changes from before, my code mostly works, but I do have a couple of notes on the usage of bson_oid_t. When specifying a bson_oid_t _id field on an object, it will automatically get a value assigned from the underlying uint8_t bytes[12]. When piping that through the system it doesn't convert cleanly, and I end up with a document within the Mongo database that has an _id: { bytes: [...] } rather than an _id: ObjectId("hex-string").

I am not entirely sure what the proper way to handle that conversion is, to be truthful. I might need to write my own parser template to handle the conversion, unless you have any ideas. Handling a default or otherwise empty value without the use of std::optional would be good though: every Mongo document requires an _id, but if none is provided Mongo will populate one for you, which I feel is the most common use case.

Otherwise I was able to provide a very large and complex document through the bson reader and it seemed to just handle it without issues!

liuzicheng1987 commented 7 months ago

@Lazrius, I get the problem: OID is implemented as a struct with a byte array inside, so the serializer treats it as exactly that.

I have to think of a way to deal with that...
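
To make the mismatch concrete: the ObjectId("hex-string") form shown by Mongo is the 12 raw bytes written as 24 hex digits, whereas generic struct reflection emits the bytes member as an array. The following is only a minimal sketch of that byte-to-hex mapping. The helper oid_to_hex is a hypothetical name used for illustration; it is not part of reflect-cpp or libbson, and it is not the fix that went into the library.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical helper (not a reflect-cpp or libbson function):
// renders the 12 raw OID bytes as a 24-character lowercase hex string.
std::string oid_to_hex(const std::array<std::uint8_t, 12>& bytes) {
  static constexpr char digits[] = "0123456789abcdef";
  std::string hex;
  hex.reserve(bytes.size() * 2);
  for (const std::uint8_t b : bytes) {
    hex.push_back(digits[b >> 4]);    // high nibble
    hex.push_back(digits[b & 0x0F]);  // low nibble
  }
  return hex;
}
```

A writer that special-cases the OID type can hand the 12 bytes to the BSON library as an OID value instead of descending into the struct's fields.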

liuzicheng1987 commented 7 months ago

@Lazrius , this commit should fix the issue with the OID.

Lazrius commented 7 months ago

Just tested @liuzicheng1987, everything is working fine. As my code was able to proceed to the next phase, I've noticed that one of my types, which I had listed as a std::vector<byte> in my C++ struct, is in fact listed as Binary.createFromBase64('lJfYU049HowiTOr4f3cKrNuP/WE=', 0) within the database. I assume this correlates to a BSON Binary data type, and the aforementioned BSON_TYPE_BINARY C type. What is the plan to specify this type within C++ and correctly distinguish it from an array, given that logically both are acceptable conversions from/to a std::vector<char> or std::vector<unsigned char>?

Lazrius commented 7 months ago

@liuzicheng1987 I have also had another consideration. BSON has a type of Decimal128. What internal C/C++ type would we use to represent this?

liuzicheng1987 commented 7 months ago

@Lazrius for the decimal type we can just use this one:

https://mongoc.org/libbson/current/bson_decimal128_t.html

For binary strings, I think my approach is to define a type called rfl::Bytestring which contains a std::vector and some convenience methods. Bytestrings are supported by various other formats as well (most notably CBOR), and that is why I want a solution that isn't specific to any particular format.
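
The idea of a dedicated type can be sketched as follows: once binary data has its own C++ type, a writer can tell it apart from an ordinary vector through overload resolution. Everything here is an assumption made for illustration. ByteBlob, WireKind and wire_kind are hypothetical names and do not reflect how rfl::Bytestring is actually defined or dispatched inside reflect-cpp.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dedicated bytestring type.
struct ByteBlob {
  std::vector<std::byte> bytes;
};

// How a writer would represent a value on the wire.
enum class WireKind { Array, Binary };

// Any plain std::vector<T> is written as an array of elements...
template <class T>
WireKind wire_kind(const std::vector<T>&) {
  return WireKind::Array;
}

// ...while the dedicated wrapper selects the format's binary type.
inline WireKind wire_kind(const ByteBlob&) {
  return WireKind::Binary;
}
```

With this split, std::vector<unsigned char> keeps its array meaning, and the user opts into the binary representation by choosing the wrapper type in the struct definition.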

dcorbeil commented 5 months ago

I would be interested in having support for bytestrings for cbor. Any updates on that?

liuzicheng1987 commented 5 months ago

@dcorbeil , I’ll give this higher priority

liuzicheng1987 commented 4 months ago

@dcorbeil we now officially have support for bytestrings.

You can use rfl::Bytestring or std::basic_string. It's the same thing.

It is supported by BSON, CBOR, flexbuffers and msgpack.
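
To show why the distinction matters on the wire, here is a small hand-rolled sketch of how the same bytes look in CBOR as a byte string (major type 2) versus an array of unsigned integers (major type 4). The function names are hypothetical and this is not reflect-cpp's encoder; it only handles lengths and values up to 255 to stay short.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Appends a CBOR head: the major type sits in the top 3 bits, followed by
// the argument. Arguments below 24 fit into the initial byte; arguments up
// to 255 use additional-information value 24 plus one extra byte.
inline void append_head(std::vector<std::uint8_t>& out, std::uint8_t major,
                        std::uint8_t arg) {
  if (arg < 24) {
    out.push_back(static_cast<std::uint8_t>((major << 5) | arg));
  } else {
    out.push_back(static_cast<std::uint8_t>((major << 5) | 24));
    out.push_back(arg);
  }
}

// Byte string: one head (major type 2), then the raw payload.
inline std::vector<std::uint8_t> encode_as_bytestring(
    const std::vector<std::uint8_t>& data) {
  if (data.size() > 255) throw std::length_error("sketch handles <= 255 bytes");
  std::vector<std::uint8_t> out;
  append_head(out, 2, static_cast<std::uint8_t>(data.size()));
  out.insert(out.end(), data.begin(), data.end());
  return out;
}

// Array: one head (major type 4), then every element encoded separately
// as an unsigned integer (major type 0).
inline std::vector<std::uint8_t> encode_as_array(
    const std::vector<std::uint8_t>& data) {
  if (data.size() > 255) throw std::length_error("sketch handles <= 255 items");
  std::vector<std::uint8_t> out;
  append_head(out, 4, static_cast<std::uint8_t>(data.size()));
  for (const std::uint8_t b : data) append_head(out, 0, b);
  return out;
}
```

The byte string form costs a single header for the whole payload, while the array form needs up to two bytes per element, which is one reason a dedicated bytestring type is worth having.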

dcorbeil commented 4 months ago

@liuzicheng1987 Great, thank you! I'll give that a try.

getml / reflect-cpp

Support types that are specific to some formats #69