Closed · dishmaker closed this issue 2 months ago
This is an extremely immaturely phrased issue. Please act like an adult when reporting issues to this project. Knock off the caps lock.
My .exe file is too large (12 MB for a simple DER encoder/decoder)
What `.exe`? I can't help you without seeing the code. Please post a complete example that illustrates the issue.
I'll also note that on Linux/Mac, the binaries produced by the test cases in this repo are around 1–1.5 MB, even with debug symbols. This includes particularly complicated cases like `x509-cert`.

I have no idea what's bloating your particular binary to 12 MB.
Thanks for responding!
> This is an extremely immaturely phrased issue.
Ok, I've rephrased the issue.
The problem is simple: `NestedReader<NestedReader<NestedReader<SliceReader>>>` should become at most `NestedReader<SliceReader>` and never nest itself deeper. I know it might be hard to solve without `dyn Reader`.
My main app lists every possible sequence in an enum, for ease of use in the GUI and for encoding/decoding to and from JSON.
But let's use your example, `x509-cert`:

```rust
#[derive(Choice)]
pub enum AnyX509Sequence {
    Certificate(Certificate),
    TbsCertificateInner(TbsCertificateInner),
    PkiPath(PkiPath),
    CertReqInfo(CertReqInfo),
    CertReq(CertReq),
    TrustAnchorInfo(TrustAnchorInfo),
    CertPathControls(CertPathControls),
    // and so on...
}
```
Therefore, every single sub-sequence gets the `DecodeValue` trait implemented for:

```rust
NestedReader<NestedReader<NestedReader<SliceReader>>>
NestedReader<NestedReader<SliceReader>>
NestedReader<SliceReader>
SliceReader
```
Another huge problem with nesting is error propagation with offsets. Regarding `.at(inner.offset()))`: is `.nested()` in the right place?

https://github.com/RustCrypto/formats/blob/086ecd7d27aa3f6c5c97dc2a33adc7159008086b/der/src/reader.rs#L59

It should be called in `.read_nested()`, since that's where the nesting gets done. I don't think it works correctly, since I'm sometimes seeing an error offset two or even three times larger than the input binary file itself. Maybe that's because context-specific decoding calls `.nested()` twice.
To solve this, I'd suggest always using a wrapped `NestedReader<PemReader>` with a stack (`stack: Vec<NestedOffsetLength>`). Then offset propagation won't be needed, since `self.position` is always valid.
```rust
pub struct NestedReader<'i, R> {
    /// Inner reader type.
    inner: &'i mut R,
    /// Current nest bounds are at: stack.pop()
    stack: Vec<NestedOffsetLength>,
}

pub struct NestedOffsetLength {
    /// Allowed read bounds.
    /// Actually, only `bounds.end` gets used.
    bounds: Range<Length>,
}
```
And the entire trickery with nesting becomes a bounds check:

```rust
/// Checks if the current nest range allows reading at least `len` bytes.
///
/// Returns an error if there's not enough remaining data in the nested input.
fn advance_position(&mut self, len: Length) -> Result<()> {
    let nest = self.stack.last().unwrap();
    let new_position = (self.inner.position() + len)?;

    // Just a simple bounds check; no need to do offset math.
    if new_position > nest.bounds.end {
        Err(ErrorKind::Incomplete {
            expected_len: len,
            actual_len: self.remaining_len(),
        }
        .at(self.inner.position())
        .into())
    } else {
        Ok(())
    }
}
```
We can look into improving `NestedReader` in the next breaking release to reduce the number of combinations that are monomorphized; however, your originally reported issue was it bloating your binary to 12 MB, and I think it's unlikely to cause that.

Yes, it monomorphizes `NestedReader` for each nesting level, and that's multiplied by the number of underlying reader types you're using, but the total combinatorial explosion isn't that large. Even in a program which uses both `SliceReader` and `PemReader`, and nests to 10 levels, it would be monomorphizing 20 times. The input LOC for the entire `nested.rs` module is 96. It's not a lot of code.
Regarding this suggestion:

> i'd suggest using always a wrapped NestedReader with a stack: stack: Vec<NestedOffsetLength>

The core functionality of `der` (including `read_nested`/`NestedReader`) supports "heapless" `no_std` targets which don't link liballoc, so it can't depend on `Vec`.
> The input LOC for the entire nested.rs module is 96. It's not a lot of code.

It's not many LOC, but `#[derive(Sequence)]` generates a lot of code: an impl of the `Decode` trait, which is generic over that 10-times-monomorphized reader.
I think the best way to address this would be to get rid of `NestedReader` and make handling nested length boundaries an internal concern of each of the reader types, i.e. instead of `Reader::read_nested` taking an `F: FnOnce(&mut NestedReader<'n, Self>)`, it can just take `F: FnOnce(&mut Self)`. This would duplicate some logic, but it would strictly limit monomorphization to bytes versus PEM.
Consider the following structure:

```rust
#[derive(Choice)]
pub enum Example {
    ChoiceA(Null),
    ChoiceB(Vec<Example>),
    // ... may contain more choices
}
```
It won't compile:

```text
error[E0275]: overflow evaluating the requirement `SliceReader<'_>: Sized`
  |
  = help: consider increasing the recursion limit by adding a `#![recursion_limit = "256"]` attribute to your crate (`derdemo`)
  = note: required for `NestedReader<'_, SliceReader<'_>>` to implement `der::Reader<'_>`
  = note: 128 redundant requirements hidden
  = note: required for `NestedReader<'_, NestedReader<'_, NestedReader<'_, NestedReader<'_, ...>>>>` to implement `der::Reader<'_>`
```
I've made a refactor of the `der` crate:

https://github.com/dishmaker/formats/blob/0fd6ddfc1a3000b704eb6fe994de9ef49d3a470d/der/src/reader/nested.rs#L133-L148
```rust
/// Method of NestedDecoder, which no longer implements Reader.
pub fn read_nested<T, F>(&mut self, len: Length, f: F) -> Result<T>
where
    F: FnOnce(&mut Self) -> Result<T>,
{
    // Save the current end bound
    let old_end: Length = self.end_pos;
    // Swap the end boundary with the current nest's
    self.end_pos = self.check_out_of_bounds(len)?;
    let ret = f(self);
    // Restore the outer end bound
    self.end_pos = old_end;
    // Return errors after resetting the nested position
    self.finish(ret?)
}
```
It works (tests pass), but I had to use lifetimes everywhere. The lifetimes especially polluted the `Decode` and `DecodeValue` impls.
I've just found out that `&mut NestedDecoder` is a double pointer. That's suboptimal for the CPU, which has to dereference the inner `&mut R`. With the current `der` v0.7.8, `&mut NestedReader<'_, NestedReader<'_, SliceReader<'_>>>` would be a triple pointer, just like `&mut &mut &mut R`.
I suggest changing the definition from:

```rust
pub struct NestedDecoder<'i, R> {
    /// Inner reader type.
    inner: &'i mut R,
}
```

to:

```rust
pub struct NestedDecoder<R> {
    /// Inner reader type.
    inner: R,
}
```
That would also simplify lifetimes after my refactoring.
@dishmaker I already proposed simply getting rid of `NestedReader`, instead changing `read_nested` from a provided method to one each reader must impl (though they may be able to reuse parts of a common implementation). It would be my preferred solution to this issue.

Unfortunately it's a breaking change, and we're dealing with much more critical issues elsewhere in the project. I was planning on working on it in 2024.
> I already proposed simply getting rid of NestedReader

Well, how would you track the outer nested boundary when entering another nest? Then the `Reader` trait needs to have something like:

```rust
fn set_nest_end(&mut self, end: Length);
fn get_nest_end(&self) -> Length;
```

to keep track of the outer bounds. I use the function call stack to remember the outer one in my `NestedDecoder`.
Also, I've just removed the lifetimes from `NestedDecoder` and implemented:

```rust
#[derive(BitString)]
pub struct MyFlags {
    pub flag1: bool,
    pub flag2: bool,
}
```
> Well, how would you track the outer nested boundary when entering another nest?

In the reader type itself.
> Well, how would you track the outer nested boundary when entering another nest? [...] I use the function call stack

That was my plan as well. It can store some state internally, and keep other state on the call stack in the `read_nested` method.
> Then the Reader trait needs to have something like: set_nest_end/get_nest_end

Nothing like that is necessary, at least in the public API. It can all be an implementation detail of `read_nested`.
> Nothing like that is necessary, at least in the public API. It can all be an implementation detail of read_nested.

It is necessary, because `read_slice()` and `remaining_len()` rely on the current nest bounds. Without wrapping the reader inside another bounds-enforcing structure, the reader may consume too many bytes.
Also, I've correctly implemented peeking by simplifying `peek_byte` into `peek_bytes`:

https://github.com/dishmaker/formats/blob/c5409e5a8d77a528b8810a43af1c3ed2545a1aa2/der/src/reader/nested.rs#L209
> It is necessary, because read_slice() and remaining_len() relies upon current nest bounds.

Again, that can be stored in the reader type, which those methods are defined on.
> Also, i've correctly implemented peeking by simplifying peek_byte into peek_bytes

I'm not sure your implementation is going to work on `PemReader`?
> Again, that can be stored in the reader type

You can't store a `u32` in a trait. Forcing users to implement it on their own is painful. E.g. what if I want to implement my own `HexReader`? Where do all the nesting checks in `read_slice()` go then?
> I'm not sure your implementation is going to work on PemReader?

I make sure there are always 8 bytes remaining to peek: https://github.com/dishmaker/formats/blob/c5409e5a8d77a528b8810a43af1c3ed2545a1aa2/der/src/reader/pem.rs#L119
> You can't store u32 in a trait.

You store a `u32` (or rather, a `Length`) in a type which impls the `Reader` trait, and query it via the accessor methods you named (e.g. `remaining_len`), which as trait methods have access to the type's internal state. I'm not sure why this is confusing?
> Forcing users to implement it on their own is painful.

This isn't a trait which is really meant to have a whole lot of downstream impls by other users.
> Eg. what if I want to implement my HexReader?

It would be good to add `HexReader` to the `der` crate itself. Otherwise, I'm not sure what other `Reader` impls would even make sense.
> I make sure there's always 8 bytes remaining to peek

That's a similar approach to the one used by `peek_header`.
> Otherwise, I'm not sure what other Reader impls would even make sense.

`GzipReader`, `ZlibReader`, `ZstdReader`, `ZstdPemReader`, `AesDeflateReader`, etc. And most importantly, my `DebugSliceReader`, which tracks a lot of things to make debugging huge structs/choices easier.
> That's a similar approach used by peek_header

But having both `peek_byte` and `peek_header` isn't elegant: it's another function forced onto the trait implementor.
> GzipReader, ZlibReader, ZstdReader, ZstdPemReader, AesDeflateReader etc..

If you actually have legitimate use cases for these, feel free to open a separate issue, but it seems more like you are inventing use cases for the sake of argument.
> But having both peek_byte and peek_header isn't elegant. Another function forced onto the trait user's impl.

They accomplish two different purposes. I'm not sure why you would even say this. You want to foist complexity onto downstream consumers for no reason?
> for the sake of argument

Of course `AesDeflateReader` is my imaginary reader, but someone might want to implement it in their crate.
> They accomplish two different purposes.

Edit: moved to #1279

@dishmaker please open a separate issue for that, because it's irrelevant to this problem.
Hi!

My .exe file using the `der` crate for a huge protocol is too large (12 MB for a simple DER encoder/decoder). I've just used `cargo bloat` and `cargo llvm-lines` on my protocol crate and was surprised that the `der` crate produces a lot of generic functions. `read_nested` is actually the main cause, because it nests itself recursively! You can easily check by adding a field `test: FailAsn1` inside your `#[derive(Sequence)]`.

Not only does the `Reader` get generic-bloated, but so do all the `Decode` and `DecodeValue` traits. Really! Just look above: `DecodeValue` is generic over `R: Reader`, so all objects implementing `Decode` also get bloated.

But the main cause is that `NestedReader` takes `Self`, which can be another `NestedReader`:

https://github.com/RustCrypto/formats/blob/086ecd7d27aa3f6c5c97dc2a33adc7159008086b/der/src/reader.rs#L127-L131