Open trepetti opened 3 years ago
@trepetti Basically, I find my current approach to the different types to be not something I want to stabilize yet. I see two approaches:
We can clean up the Serializer
and Deserializer
types and expose them as stable.
We can add a feature flag like unstableapis
which exposes them. This would allows us to get the types public ASAP but without committing to any stability guarantee.
@npmccallum Thank you for laying out these options. For the first option, would making an API that focuses on analogy to serde_json
be a direction you would be interested in going in? I implemented a minimally modified, proof of concept public Serializer
that still keeps the encoding provided courtesy of ciborium-ll
inaccessible from the top-level Serializer
again by analogy to serde_json
. My other goal was to make sure that all the signatures in serde
trait implementations match serde
1:1 as written (even if it doesn't affect the implementation in practice): https://github.com/trepetti/ciborium/commit/1c01b99647be3ea4f112458eee537d5ff301ee06
If this isn't the direction you wanted to go in towards a stable API, I completely understand. And regardless, I think putting it initially behind a feature flag would be prudent (have not done that in this POC, though). It looks like there is no way to have the CollectionSerializer
of ciborium
as pub(crate)
, but it appears that serde_json
had this same issue with their corresponding Compound
type and just opted for #[doc(hidden)]
as well.
Let me know what you think. I would implement the corresponding Deserializer
API changes and open a draft PR based on your feedback.
erased_serde also depend on Serializer
and Deserializer
, exposing it would be useful.
In looking at the Deserializer
side of things, I definitely get why Enarx would not want to make assumptions about how/where scratch
is allocated for embedded scenarios, although it does present a challenge for implementing Deserializer::{new, from_bytes, from_str, from_reader}
in exactly the same way as serde_json
. I see a few ways to approach this:
We could maintain the flexibility in the choice of allocation strategy by modifying these functions with an extra argument for scratch
, which I think would be the cleanest option.
Alternatively, we could define
enum Scratch {
Preallocated(&'b mut [u8]),
Dynamic(Box<[u8]>),
}
and modify everywhere that references scratch
accordingly to match on the enum
.
We could convert scratch
to a Vec
as serde_json
does since de/mod.rs
is already pulling in std::alloc::Box
and std::alloc::String
anyway. We could default to a 4 kiB with_capacity
starting out to reduce runtime allocation overhead.
Let me know what you think. Since Serializer
doesn't need any additional memory beyond a few conversions to String
, we can keep it 1:1 with serde_json
as in the draft above, but for Deserializer there is that large amount of scratch memory needed, so it is worth thinking through what you'd want it to end up looking like in terms of the allocation strategy.
@trepetti Apologies for the lack of response. I was on vacation. :)
I'm fine with the first approach too. The scratch
argument definitely needs to be carefully documented.
@npmccallum Hope you had a good vacation! One last pair of questions before I put together a PR: if we expose scratch, is it also worth exposing the recursion limit, too? For either or both we could have pub const
s DEFAULT_SCRATCH_SIZE
and DEFAULT_RECURSION_LIMIT
exported, if you think that makes sense, based on the values used in de::from_reader()
. That way in addition to explaining the purposes of both arguments in the documentation, users could stick with sensible defaults without having to think about it too much if they don't want to.
@trepetti We should probably just work through these details in the PR review.
A couple of serde helper libraries also require the Serializer
/Deserializer
types to be exposed:
serde_stacker
: Serde adapter that avoids stack overflow by dynamically growing the stack.serde-transcode
: Transcode from one Serde format to another.serde_path_to_error
:
Find out the path at which a deserialization error occurred.
- We can clean up the
Serializer
andDeserializer
types and expose them as stable.
What would be required to clean them up? I did not see that yet in the thread.
- We can clean up the
Serializer
andDeserializer
types and expose them as stable.What would be required to clean them up? I did not see that yet in the thread.
They currently leverage some internal generic types which are intended to promote code sharing, but probably aren't a good public interface. These would have to be reworked to something more reasonable.
Thanks for the info. The internal generic types could still remain private and only the Serializer
and Deserializer
types made public. I imagine something like this: https://github.com/jonasbb/ciborium/commit/4324dc3791fa638ada4c6d059b42e7d98aca34e1
For the Serializer the only constraint is to also expose CollectionSerializer.
For the Deserializer the scratch space "leaks" because of the lifetime. You can abstract it away and provide a constructor for a Deserializer<'static, R>
, without requiring the user to know anything about the scratch space. I made a Scratch
struct, which abstracts over borrowed or owned scratch space. This is like the 2. option mentioned by @trepetti. The 1. option can always be added by providing new construction functions for Deserializer.
I apologize for dropping the ball on this.
I have been using this initial attempt at exposing Serializer
and have been using SerializeSeq
for streaming serialization in one of my own projects successfully, but it needed more synthetic tests done in-tree before opening it as a PR: https://github.com/trepetti/ciborium/commits/public-deserializer-and-serializer. I can take a break from some prose writing tonight to add those tests tonight and possibly break it into a separate PR for the simpler Serializer
case, if that would be better. I modified some of the naming conventions in the existing unexposed traits to make side-by-side comparison with serde and serde_json easier but can also revert those if you want a cleaner commit history.
On the deserialization side, this branch is a reasonably well fleshed-out attempt at what going with option 1 for Deserializer
might look like: forcing any scratch space to be preallocated and forcing a recursion limit to be known ahead-of-time in the constructor, which are possibly nice to be explicit about in the resource constrained use case.I expose some sensible defaults for these as constants in the top-level API as de::DEFAULT_SCRATCH_SIZE
and de::DEFAULT_RECURSION_LIMIT
. Again, I will take a stab at finishing some more tests tonight.
In going further to flesh out the equivalent of StreamDeserializer
to allow feature parity with serde_json, I ran into the fact that there would need to be some changes in ciborium-ll
in order to support the equivalent of serde_json's Deserializer::end()
. We would have to expose part of the decoder in the ciborium-ll
to report the impending end of an I/O stream. This is where it gets tricky because I am a security novice and wasn't sure how much would/could/should be exposed in a potential interface to the inner workings of ciborium-ll
or if this is even a valid concern in the first place. The larger project is sensitive about this, so I thought before moving forward with StreamDeserializer
and modifying ciborium-ll
, it would be best to submit a PR omitting Deserializer::end()
and consequently Deserializer::into_iter()
and StreamDeserializer
to flesh it out further at a later time if that is something there is also interest in: https://github.com/trepetti/ciborium/blob/dffbe59a2afc0ae67715be45b12dd52a9ce5779e/ciborium/src/de/mod.rs#L653-L659
@trepetti I'm not sure where the best place to post feedback is, so just place it here to keep the discussion in one place. Your changes already look quite complete. I didn't spot any obvious problems in them. I do have two comments:
ciborium::de::from_reader
exists without taking any scratch or recursion arguments, I think it might be worthwhile to have a ciborium::de::Deserializer::???
which also does not require those. To achieve that 2. might need to be implemented.enum Scratch
above.I am also looking into using the public Serializer
and Deserializer
for https://github.com/pickfire/babelfish which uses serde-transcode
given that serde_cbor
is now unmaintained, can't really do a switch from serde_cbor
since it does not have Deserializer::from_reader
.
erased_serde
depends on having access to the Serializer
Hi all,
I saw that Nathaniel mentioned the possibility of exposing the
Serializer
in the public API in https://github.com/pyfisch/cbor/issues/179#issuecomment-738930258. I was wondering what steps or contributions you would be looking for getting it into a state where that would be a possibility. Not sure how many other people share my use-case, but I have been usingSerializer::serialize_seq()
for outputting logs to disk (with some intervening compression) withserde_json
and was curious about moving to CBOR. I know there is more involved than just droppingpub
in front of https://github.com/enarx/ciborium/blob/06cafa7130be35b4c8b56d170d11b73b9b858ce5/ciborium/src/ser/mod.rs#L16 I can help by coming up with doc strings forrustdoc
to put it on par withserde_json
's treatment ofSerializer
if that would help.Let me know what you think and thanks for all your work on this crate.
-Tom