pkcs7+der: mixed BER/DER encoding with undefined length

smndtrl commented 1 year ago

Hi,

while looking into the pkcs7 crate for a CMS usecase around the Apple world I discovered that their detached signatures use the BER indefinite length encoding for some of the SEQUENCE which of course is not supported by the der crate.

Has there been any thoughts/discussions around if that's something the formats repo should address or if that is out of scope (for now) and left to other crates.

I started with adding the signed-data content to pkcs7 validating against a DER encoded signature and discovered the problem afterwards with the ones Apple generates :( Link to example

Simon

tarcieri commented 1 year ago

We've talked about introducing a lax mode to the der crate which supports a limited number of BER productions necessary to interop with real-world use cases like this. So far there hasn't been one until now.

smndtrl commented 1 year ago

I gave it a shot with my limited experience with Rust and also this codebase. It is meant as a starting point.

I used it successfully with changes to pkcs7/CMS crate that are not yet ready/

tarcieri commented 1 year ago

@smndtrl rather than trying to change how the existing parsing works in the der crate, you can write your own "lax" implementation of Decode for a specific type in the pkcs7 crate

smndtrl commented 1 year ago

@tarcieri Wouldn't that require me changing how DecodeValue is implemented on Sequence? My PKCS7 data starts with 30 80 and that already makes the Length::decode fail without reaching the first DecodeValue code for ContentInfo in pkcs7.

For making indefinite length work a solution for Length and Readers is required. The ASN1 I encountered had indefinite lengths for SEQUENCE and OCTET STRING but also can be other fields that can be constructed in BER. For OctetStringRef I'm not sure how to handle those cases as it then is non-contiguous memory (or at least broken up with Tags/Lengths values. For Sequence it via the Reader.

I'm fine with carrying this patch on my own as it works well for my use case. I might revisit the suggested approach later on.

tarcieri commented 1 year ago

@smndtrl you'll need to write all the impls by hand if you go that route, and you will have to avoid impl'ing one of DecodeValue or FixedTag (I would suggest DecodeValue unless it needs to be in an IMPLICIT and context-specific), as otherwise it will receive a blanket impl of Decode.

If you can't figure it out I can take a look at some point based on your lax branch. It should be possible to impl the "lax" logic for just a single type.

bkstein commented 1 year ago

We've talked about introducing a lax mode to the der crate which supports a limited number of BER productions necessary to interop with real-world use cases like this. So far there hasn't been one until now.

We've found a real-world use case: EJBCA sends (degenerate certificates-only) CMS messages encoded with BER. Unfortunately, that's in line with RFC 5652:

   ...
   The CMS values are generated using ASN.1 [X.208-88], using BER-
   encoding (Basic Encoding Rules) [X.209-88].

carl-wallace commented 1 year ago

BER was an issue working on PKCS #12 as well. Exports from Firefox are BER. In a cert collection I use for testing, there are several several AIAs hosting P7s that are BER encoded.

tarcieri commented 1 year ago

We're aware this is a real issue. It's something I've been meaning to address.

tarcieri commented 9 months ago

I think we can probably use an approach like https://github.com/RustCrypto/formats/pull/810 but we need some way to control the granularity so enabling it doesn't cause der to revert to BER parsing globally

bkstein commented 9 months ago

FYI, I have implemented a simple parser/converter (Berder), that replaces all indefinite length specifications in a BER message by definite lengths. The parser recurses through the complete message (though there is a limit for recursion depth 🙂). It doesn't fix ordering, but as RustCrypto's der crate fixes it, I think it should work reliably. I tested it against EJBCA messages. The usage is simple. Just prepare your BER data before you try to parse it, e.g.:

    ...
    let der = Berder::ber_to_der(ber.as_slice())?; // Indefinite -> definite lengths
    let ci = ContentInfo::from_der(der.as_slice())?;
    let pki_message = SignedData::from_der(ci.content.to_der()?;

This could be implemented as Decode::from_ber(), but could also be integrated into Decode::from_der(). However, the latter would introduce some performance loss. What do you think?

tarcieri commented 9 months ago

I think we can probably implement Decode::from_ber in a way that doesn't require transcoding the entire document to DER first, using an approach similar to #810 (we have since added an IndefiniteLength type), and perhaps carrying the flag for using BER-like rules on the Reader, for lack of a better option.

bkstein commented 9 months ago

I think we can probably implement Decode::from_ber in a way that doesn't require transcoding the entire document to DER first, using an approach similar to #810 (we have since added an IndefiniteLength type), and perhaps carrying the flag for using BER-like rules on the Reader, for lack of a better option.

@tarcieri I have started doing that based on #810. I think I got pretty far with this implementation, but there are one or two issues left (correct handling of the end-of-content markers). The main idea was, to modify the der parser to do look-ahead evaluation of indefinite lengths. Whenever the parser encounters an indefinite length, it stores the current position and makes a "fast-forward" until the end of the current TLV is reached. Then, the length is "definite"ly known and we can resume parsing with the definite length at the stored position. This look-ahead must be recursive, even if e.g. an OctetString is parsed, because it might contain other (nested) TLVs with indefinite lengths. I have uploaded this to my repo, if you want to have a look at it. And as you mentioned IndefiniteLength: I changed der::Header:

pub struct Header {
    /// Tag representing the type of the encoded value
    pub tag: Tag,

    /// Length of the encoded value
    pub length: IndefiniteLength,
}

This implementation is not complete, but I think it is already more complete, than the current #810.

tarcieri commented 9 months ago

@bkstein I'd like to avoid making changes to the existing APIs like that. The idea would be to use IndefiniteLength as an encoding type to parse lengths in "BER mode", but then convert back to a Length in the public API.

However, I'd definitely like to also preserve the way the existing implementation works so indefinite lengths are rejected when parsing DER.

bkstein commented 9 months ago

Yes, the Header change would break compatibility. I saw, that signature's crates dsa and ecdsa access Header::length, so this would actually affect compatibility here. However, I introduced a flag is_parsing_ber in Reader. Is that approximately, what you intend with "BER mode"? In "DER mode", parsing is unchanged.

tarcieri commented 9 months ago

I was thinking an enum like Encoding::Ber which would improve clarity, I think, but yes that's the basic idea

tarcieri commented 7 months ago

I've bumped der's version to v0.8.0-pre.0 so we're now allowed to make breaking changes, which should make this easier.

I opened #1321 which adds an EncodingRules enum with Ber and Der variants, as well as a Reader::encoding_rules method to interrogate them.

Firstyear commented 4 months ago

@tarcieri For reference, we were interested in the BER capability as we want to unify our LDAP and Kerberos parsers on one trusted encoding/decoding stack. Currently we have a "ber only" parser, but we would like to move to this crate. I'll do some testing soon to see what we need for extending this to support BER in our cases.

RustCrypto / formats

pkcs7+der: mixed BER/DER encoding with undefined length #779